Sunday, August 16, 2015

Why version bounds cannot be inferred retroactively (using dates)

In past debates about Haskell's Package Versioning Policy (PVP), some have suggested that package developers don't need to put upper bounds on their version constraints because those bounds can be inferred by looking at what versions were available on the date the package was uploaded. This strategy cannot work in practice, and here's why.

Imagine someone creates a small new package called foo. It's a simple package, say something along the lines of the formattable package that I recently released. One of the dependencies for foo is errors, a popular package supplying frequently used error handling infrastructure. The developer happens to already have errors-1.4.7 installed on their system, so this new package gets built against that version. The author uploads it to hackage on August 16, 2015 with no upper bounds on its dependencies. Let's for simplicity imagine that errors is the only dependency, so the .cabal file looks like this:

name: foo

If we come back through at some point in the future and try to infer upper bounds by date, we'll see that on August 16, the most recent version of errors was 2.0.0. Here's an abbreviated illustration of the picture we can see from release dates:

If we look only at release dates, and assume that packages were building against the most recent version, we will try to build foo with errors-2.0.0. But that is incorrect! Building foo with errors-2.0.0 will fail because errors had a major breaking change in that version. Bottom line: dates are irrelevant--all that matters is what dependency versions the author happened to be building against! You cannot assume that package authors will always be building against the most recent versions of their dependencies. This is especially true if our developer was using the Haskell Platform or LTS Haskell because those package collections lag the bleeding edge even more. So this scenario is not at all unlikely.

It is also possible for packages to be maintaining multiple major versions simultaneously. Consider large projects like the linux kernel. Developers routinely do maintenance releases on 4.1 and 4.0 even though 4.2 is the latest version. This means that version numbers are not always monotonically increasing as a function of time.

I should also mention another point on the meaning of version bounds. When a package specifies version bounds like this...

name: foo
  errors >= 1.4 && < 1.5 is not saying "my package will not work with errors-1.5 and above". It is actually saying, "I warrant that my package does work with those versions of errors (provided errors complies with the PVP)". So the idea that "< 1.5" is a "preemptive upper bound" is wrong. The package author is not preempting anything. Bounds are simply information. The upper and lower bounds are important things that developers need to tell you about their packages to improve the overall health of the ecosystem. Build tools are free to do whatever they want with that information. Indeed, cabal-install has a flag --allow-newer that lets you ignore those upper bounds and step outside the version ranges that the package authors have verified to work.

In summary, the important point here is that you cannot use dates to infer version bounds. You cannot assume that package authors will always be building against the most recent versions of their dependencies. The only reliable thing to do is for the package maintainer to tell you explicitly what versions the package is expected to work with. And that means lower and upper bounds.

Update: Here is a situation that illustrates this point perfectly: cryptonite issue #96. cryptonite-0.19 was released on August 12, 2016. But cryptonite-0.15.1 was released on August 22, 2016. Any library published after August 22, 2016 that depends on cryptonite-0.15.1 would not be able to build if the solver used dates instead of explicit version bounds.


asjo said...

I've never understood how you can set an upper bound if you are developing using the newest versions of your dependencies.

How can you know if the next version - which doesn't exist yet - of a dependency breaks your code?

What if it doesn't, and the only thing keeping your code from building is an upper-bound set too tight, predicting the future erroneously?

I don't understand why the Haskell community seem so keen on upper bounds on dependencies.

(I don't understand much Haskell, mind you, so maybe I'm missing something obvious :-))

mightybyte said...

As I mentioned in the post, a bound isn't there to categorically prevent people from building your library against newer versions of dependencies. (The --allow-newer option for cabal-install lets you do just this.) It's there to document what versions you have tested against.

mightybyte said...

For more information check out the PVP FAQ.