Friday, November 2, 2012

A Practical Cabal Primer

I've been doing full-time Haskell development for almost three years now, and while I recognize that Cabal has been painful to use at times, the current reality is that Cabal does what I need it to do and for the most part stays out of my way. In this post, I'll describe the Cabal best practices I've settled on for my Haskell development.

First, some terminology. GHC is the de facto Haskell compiler, Hackage is the package database, Cabal is a library providing package infrastructure, and cabal-install is a command line program (confusingly called "cabal") for building and installing packages, downloading them from Hackage, and uploading them to it. This isn't a tutorial for installing Haskell, so I'll assume that you at least have GHC and cabal-install's "cabal" binary. If you have a very recent release of GHC, then you're asking for problems. At the time of this writing GHC 7.6 is a few months old, so don't use it unless you know what you're doing. Stick to 7.4 until maintainers have updated their packages. But do make sure you have the most recent versions of Cabal and cabal-install, because both have improved significantly.

cabal-install can install packages either globally or per user. You usually need root privileges to install globally. Installing as user puts packages in your home directory. Executable binaries go in $HOME/.cabal/bin. Libraries go in $HOME/.cabal/lib and are registered in the user package database under $HOME/.ghc. Other than the packages that come with GHC, I install everything as user. This means that when I upgrade cabal-install with "cabal install cabal-install", the new binary won't take effect unless $HOME/.cabal/bin is at the front of my path.
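As a minimal sketch of the path setup, assuming a POSIX-style shell and the default per-user install location mentioned above, something like this could go in your shell startup file:

```shell
# Put the per-user cabal bin directory ahead of everything else on PATH,
# so a freshly built "cabal" shadows any older system-wide copy.
PATH="$HOME/.cabal/bin:$PATH"
export PATH

# Show which "cabal" binary the shell will now pick up, if one is installed.
command -v cabal || echo "no cabal binary found on PATH"
```

If the last command prints a path outside $HOME/.cabal/bin after you have upgraded cabal-install, something else is still shadowing the new binary.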

Now I need to get the bad news over with up front. Over time your local Cabal package database will grow until it starts to cause problems. Whenever I'm having trouble building packages, I'll tinker with things a little to see if I can isolate the problem, but if that doesn't work, then I clean out my package repository and start fresh. On Linux this can be done very simply with rm -fr ~/.ghc. Yes, this feels icky. Yes, it's suboptimal. But it's simple and straightforward, so either deal with it, or quit complaining and help us fix it.
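If deleting outright makes you nervous, one variation (my own, not something cabal-install provides) is to move the directory aside first. This is a sketch only; the helper name stash_ghc_dir and the backup naming scheme are hypothetical choices for illustration.

```shell
# Hypothetical helper: move the user package database aside instead of
# deleting it. The argument defaults to ~/.ghc; the ".bak.<timestamp>"
# suffix is an arbitrary naming choice.
stash_ghc_dir() {
  dir=${1:-"$HOME/.ghc"}
  if [ -d "$dir" ]; then
    mv "$dir" "$dir.bak.$(date +%Y%m%d%H%M%S)"
  fi
}
```

Calling stash_ghc_dir with no arguments has the same effect on your builds as rm -fr ~/.ghc, but you can move the backup into place again if the fresh start doesn't help, and delete it once you're satisfied.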

I've also seen people say that you should delete the ~/.cabal directory too. Most of the time that is bad advice. If you delete .cabal, you'll probably lose your most recent version of cabal-install, and that will make life more difficult. Deleting .ghc completely clears out your user package repository, and in my experience is almost always sufficient. If you really need to delete .cabal, then I would highly recommend copying the "cabal" binary somewhere safe and restoring it after you're done.

Sometimes you don't need to go quite so far as to delete everything in ~/.ghc. For more granular control over things, use the "ghc-pkg" program. "ghc-pkg list" shows you a list of all the installed packages. "ghc-pkg unregister foo-2.3" removes a package from the list. You can also use unregister without the trailing version number to remove every installed version of that package. If there are other packages that depend on the package you're removing, you'll get an error. If you really want to remove it, use the --force flag.

If you force unregister a package, then "ghc-pkg list" will show you all the broken packages. If I know that there's a particular hierarchy of packages that I need to remove, then I'll force remove the top one, and then use ghc-pkg to tell me all the others that I need to remove. This is an annoying process, so I only do it when I think it will be quicker than deleting everything and rebuilding it all.
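The force-remove-then-clean-up loop can be scripted. The following is a rough sketch, assuming that "ghc-pkg check --simple-output" prints just the names of broken packages; the function name remove_broken_packages is my own and the --user flag is there so it never touches the global database.

```shell
# Hypothetical helper: unregister every package that ghc-pkg reports as
# broken. It repeats until nothing is broken, because each removal can
# break further packages that depended on the one just removed.
remove_broken_packages() {
  while :; do
    # "ghc-pkg check" exits non-zero when it finds problems, so ignore
    # the status and look only at the list it prints.
    broken=$(ghc-pkg check --simple-output 2>/dev/null) || true
    [ -n "$broken" ] || break
    for pkg in $broken; do
      # Stop rather than loop forever if a package cannot be removed,
      # for example because it lives in the global database.
      ghc-pkg unregister --user --force "$pkg" || return 1
    done
  done
}
```

A typical session would be "ghc-pkg unregister --force foo-2.3" on the package at the top of the hierarchy, followed by remove_broken_packages. It removes everything that depended on foo-2.3, so expect to reinstall whatever you still need.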

So when do you need to use ghc-pkg? Typically I only use it when something breaks that I think should build properly. However, I've also found that having multiple versions of a package installed at the same time can sometimes cause problems. This can show up when the package I'm working on uses one version of a library, but when I'm experimenting in ghci a different version gets loaded. When this happens you may get perplexing error messages for code that is actually correct. In this situation, I've been able to fix the problem by using ghc-pkg to remove all but one version of the library in question.
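To spot packages with more than one installed version, you can filter the output of "ghc-pkg list --simple-output", which prints name-version pairs separated by spaces. This is a small sketch; the function name duplicate_packages is hypothetical, and it assumes every entry ends in a numeric version.

```shell
# Hypothetical helper: read name-version entries on stdin and print the
# names that appear with more than one version.
# Usage: ghc-pkg list --simple-output | duplicate_packages
duplicate_packages() {
  tr -s ' \n' '\n' |              # one entry per line
    sed '/^$/d' |                 # drop empty lines
    sed 's/-[0-9][0-9.]*$//' |    # strip the trailing version number
    sort |
    uniq -d                       # keep names seen more than once
}
```

Each name it prints is a candidate for "ghc-pkg unregister" on the versions you don't want ghci to pick up.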

If you've used all these tips and you still cannot install a package even after blowing away ~/.ghc, then there is probably a dependency issue in the package you're using. Haskell development is moving at a very rapid pace, so the upstream package maintainers may not be aware of the problem or may not have had time to fix it. You can help by alerting them to the problem, or better yet, including a patch to fix it.

Often the fix may be a simple dependency bump. These are pretty simple to do yourself. Use "cabal unpack foo-package-0.0.1" to download the package source and unpack it into a foo-package-0.0.1 directory under the current directory. Then edit the .cabal file, change the version bounds on the offending dependency, and build the local package with "cabal install". Sometimes I will also bump the version of the package itself and then use that as the lower bound in the local package that I'm working on. That way I know it will be using my fixed version of foo-package. Don't be afraid to get your hands dirty. You're literally one command away from hacking on upstream source.
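Put together, the unpack-edit-install cycle looks roughly like this. It's a sketch, not a polished tool: patch_upstream is a hypothetical name, and it assumes you pass the full name-version string (as in foo-package-0.0.1) so that the directory "cabal unpack" creates has the same name.

```shell
# Hypothetical helper: fetch a package's source, open its .cabal file in
# an editor so you can adjust the bounds, then install the patched copy.
# The body runs in a subshell so your current directory is left alone.
patch_upstream() (
  pkg=$1                              # e.g. foo-package-0.0.1
  cabal unpack "$pkg" || exit 1       # creates ./foo-package-0.0.1
  cd "$pkg" || exit 1
  ${EDITOR:-vi} ./*.cabal || exit 1   # edit the version bounds by hand
  cabal install                       # build and install the local copy
)
```

Run it as "patch_upstream foo-package-0.0.1" from a scratch directory. The unpacked source stays around afterwards, which is handy if you want to turn your change into a patch for the maintainer.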

For the impatient, here's a summary of my tips for basic cabal use:

  1. Install the most recent version of cabal-install
  2. Don't install things with --global
  3. Make sure $HOME/.cabal/bin is at the front of your path
  4. Don't be afraid to use rm -fr ~/.ghc
  5. Use ghc-pkg for fine-grained package control
  6. Use "cabal unpack" to download upstream code so you can fix things yourself

Using these techniques, I've found that Cabal actually works extremely well for small scale Haskell development--development where you're only working on a single package at a time and everything else is on Hackage. Large scale development where you're developing more than one local package requires another set of tools. But fortunately we already have some that work reasonably well. I'll discuss those in my next post.

4 comments:

Alexander Dorofeev said...

Update cabal now - not the best idea. A bunch of libraries depends on it. Hoogle for example.

mightybyte said...

I don't understand your point. cabal-install itself tells you to upgrade when there is a newer version available. And the point is not to have the newest version of Cabal so much as to have the newest version of cabal-install. Once you have that built, you don't need to keep that version of Cabal around.

Anonymous said...

Sandboxing lets you have compiled package directory per project. This largely eliminates the problems of conflicting package versions, and hence the need to ever clean the .ghc directory from ~/.cabal.

I've been using cabal-dev to add sandboxing to cabal-install, and it works pretty well. But I look forward to having these features built in to cabal-install itself, as described here:

http://blog.johantibell.com/2012/08/you-can-soon-play-in-cabal-sandbox.html


Brent said...

By the way, the 'cab' tool ( http://hackage.haskell.org/package/cab ) can unregister packages recursively...