Tuesday, August 18, 2015

"cabal gen-bounds": easy generation of dependency version bounds

In my last post I showed how release dates are not a good way of inferring version bounds. The package repository should not make assumptions about what versions you have tested against. You need to tell it. But from what I've seen there are two problems with specifying version bounds:

  1. Lack of knowledge about how to specify proper bounds
  2. Unwillingness to take the time to do so

Early in my Haskell days, the first time I wrote a cabal file I distinctly remember getting to the dependencies section and having no idea what to put for the version bounds. So I just ignored them and moved on. The result of that decision is that I can no longer build that app today. I would really like to, but it's just not worth the effort to try.

It wasn't until much later that I learned about the PVP and how to properly set bounds. But even then, there was still an obstacle. It can take some time to add appropriate version bounds to all of a package's dependencies. So even if you know the correct scheme to use, you might not want to take the time to do it.

Both of these problems are surmountable. And in the spirit of doing that, I would like to propose a "cabal gen-bounds" command. It would check all dependencies to see which ones are missing upper bounds and output correct bounds for them. I have implemented this feature and it is available at https://github.com/mightybyte/cabal/tree/gen-bounds. Here is what it looks like to use this command on the cabal-install package:

$ cabal gen-bounds
Resolving dependencies...

The following packages need bounds and here is a suggested starting point.
You can copy and paste this into the build-depends section in your .cabal
file and it should work (with the appropriate removal of commas).

Note that version bounds are a statement that you've successfully built and
tested your package and expect it to work with any of the specified package
versions (PROVIDED that those packages continue to conform with the PVP).
Therefore, the version bounds generated here are the most conservative
based on the versions that you are currently building with.  If you know
your package will work with versions outside the ranges generated here,
feel free to widen them.

network      >= 2.6.2 && < 2.7,
network-uri  >= 2.6.0 && < 2.7,

The user can then paste these lines into their build-depends file. They are formatted in a way that facilitates easy editing as the user finds more versions (either newer or older) that the package builds with. This serves to both educate users and automate the process. I think this removes one of the main frustrations people have about upper bounds and is a step in the right direction of getting more hackage packages to supply them. Hopefully it will be merged upstream and be available in cabal-install in the future.

Sunday, August 16, 2015

Why version bounds cannot be inferred retroactively (using dates)

In past debates about Haskell's Package Versioning Policy (PVP), some have suggested that package developers don't need to put upper bounds on their version constraints because those bounds can be inferred by looking at what versions were available on the date the package was uploaded. This strategy cannot work in practice, and here's why.

Imagine someone creates a small new package called foo. It's a simple package, say something along the lines of the formattable package that I recently released. One of the dependencies for foo is errors, a popular package supplying frequently used error handling infrastructure. The developer happens to already have errors-1.4.7 installed on their system, so this new package gets built against that version. The author uploads it to hackage on August 16, 2015 with no upper bounds on its dependencies. Let's for simplicity imagine that errors is the only dependency, so the .cabal file looks like this:

name: foo

If we come back through at some point in the future and try to infer upper bounds by date, we'll see that on August 16, the most recent version of errors was 2.0.0. Here's an abbreviated illustration of the picture we can see from release dates:

If we look only at release dates, and assume that packages were building against the most recent version, we will try to build foo with errors-2.0.0. But that is incorrect! Building foo with errors-2.0.0 will fail because errors had a major breaking change in that version. Bottom line: dates are irrelevant--all that matters is what dependency versions the author happened to be building against! You cannot assume that package authors will always be building against the most recent versions of their dependencies. This is especially true if our developer was using the Haskell Platform or LTS Haskell because those package collections lag the bleeding edge even more. So this scenario is not at all unlikely.

It is also possible for packages to be maintaining multiple major versions simultaneously. Consider large projects like the linux kernel. Developers routinely do maintenance releases on 4.1 and 4.0 even though 4.2 is the latest version. This means that version numbers are not always monotonically increasing as a function of time.

I should also mention another point on the meaning of version bounds. When a package specifies version bounds like this...

name: foo
  errors >= 1.4 && < 1.5

...it is not saying "my package will not work with errors-1.5 and above". It is actually saying, "I warrant that my package does work with those versions of errors (provided errors complies with the PVP)". So the idea that "< 1.5" is a "preemptive upper bound" is wrong. The package author is not preempting anything. Bounds are simply information. The upper and lower bounds are important things that developers need to tell you about their packages to improve the overall health of the ecosystem. Build tools are free to do whatever they want with that information. Indeed, cabal-install has a flag --allow-newer that lets you ignore those upper bounds and step outside the version ranges that the package authors have verified to work.

In summary, the important point here is that you cannot use dates to infer version bounds. You cannot assume that package authors will always be building against the most recent versions of their dependencies. The only reliable thing to do is for the package maintainer to tell you explicitly what versions the package is expected to work with. And that means lower and upper bounds.

Update: Here is a situation that illustrates this point perfectly: cryptonite issue #96. cryptonite-0.19 was released on August 12, 2016. But cryptonite-0.15.1 was released on August 22, 2016. Any library published after August 22, 2016 that depends on cryptonite-0.15.1 would not be able to build if the solver used dates instead of explicit version bounds.