Thursday, December 20, 2012

Haskell Web Framework Matrix

A comparison of the big three Haskell web frameworks on the most informative two axes I can think of.

image/svg+xml DSLs Combinators Snap Happstack Yesod NS Some thingsdynamic Everythingtype-safe

Note that this is not intended to be a definitive statement of what is and isn't possible in each of these frameworks. As I've written elsewhere, most of the features of each of the frameworks are interchangeable and can be mixed and matched. The idea of this matrix is to reflect the general attitude each of the frameworks seem to be taking, because sometimes generalizations are useful.

Thursday, November 8, 2012

Using Cabal With Large Projects

In the last post we talked about basic cabal usage. That all works fine as long as you're working on a single project and all your dependencies are in hackage. When Cabal is aware of everything that you want to build, it's actually pretty good at dependency resolution. But if you have several packages that depend on each other and you're working on development versions of these packages that have not yet been released to hackage, then life becomes more difficult. In this post I'll describe my workflow for handling the development of multiple local packages. I make no claim that this is the best way to do it. But it works pretty well for me, and hopefully others will find this information helpful.

Consider a situation where package B depends on package A and both of them depend on bytestring. Package A has wide version bounds for its bytestring dependency while package B has narrower bounds. Because you're working on improving both packages you can't just do "cabal install" in package B's directory because the correct version of package A isn't on hackage. But if you install package A first, Cabal might choose a version of bytestring that won't work with package B. It's a frustrating situation because eventually you'll have to end up worrying about dependencies issues that Cabal should be handling for you.

The best solution I've found to the above problem is cabal-meta. It lets you specify a sources.txt file in your project root directory with paths to other projects that you want included in the package's build environment. For example, I maintain the snap package, which depends on several other packages that are part of the Snap Framework. Here's what my sources.txt file looks like for the snap package:

./
../xmlhtml
../heist
../snap-core
../snap-server

My development versions of the other four packages reside in the parent directory on my local machine. When I build the snap package with cabal-meta install, cabal-meta tells Cabal to look in these directories in addition to whatever is in hackage. If you do this initially for the top-level package, it will correctly take into consideration all your local packages when resolving dependencies. Once you have all the dependencies installed, you can go back to using Cabal and ghci to build and test your packages. In my experience this takes most of the pain out of building large-scale Haskell applications.

Another tool that is frequently recommended for handling this large-scale package development problem is cabal-dev. cabal-dev allows you to sandbox builds so that differing build configurations of libraries can coexist without causing problems like they do with plain Cabal. It also has a mechanism for handling this local package problem above. I personally tend to avoid cabal-dev because in my experience it hasn't played nicely with ghci. It tries to solve the problem by giving you the cabal-dev ghci command to execute ghci using the sandboxed environment, but I found that it made my ghci workflow difficult, so I prefer using cabal-meta which doesn't have these problems.

I should note that cabal-dev does solve another problem that cabal-meta does not. There may be cases where two different packages may be completely unable to coexist in the same Cabal "sandbox" if their set of dependencies are not compatible. In that case, you'll need cabal-dev's sandboxes instead of the single user-level package repository used by Cabal. I am usually only working on one major project at a time, so this problem has never been an issue for me. My understanding is that people are currently working on adding this kind of local sandboxing to Cabal/cabal-install. Hopefully this will fix my complaints about ghci integration and should make cabal-dev unnecessary.

There are definitely things that need to be done to improve the cabal tool chain. But in my experience working on several different large Haskell projects both open and proprietary I have found that the current state of Cabal combined with cabal-meta (and maybe cabal-dev) does a reasonable job at handling large project development within a very fast moving ecosystem.

Friday, November 2, 2012

A Practical Cabal Primer

I've been doing full-time Haskell development for almost three years now, and while I recognize that Cabal has been painful to use at times, the current reality is that Cabal does what I need it to do and for the most part stays out of my way. In this post, I'll describe the Cabal best practices I've settled on for my Haskell development.

First, some terminology. GHC is the de facto Haskell compiler, Hackage is the package database, Cabal is a library providing package infrastructure, and cabal-install is a command line program (confusingly called "cabal") for building and installing packages, and downloading and uploading them from Hackage. This isn't a tutorial for installing Haskell, so I'll assume that you at least have GHC and cabal-install's "cabal" binary. If you have a very recent release of GHC, then you're asking for problems. At the time of this writing GHC 7.6 is a few months old, so don't use it unless you know what you're doing. Stick to 7.4 until maintainers have updated their packages. But do make sure you have the most recent version of Cabal and cabal-install because it has improved significantly.

cabal-install can install things as global or user. You usually have to have root privileges to install globally. Installing locally will put packages in your user's home directory. Executable binaries go in $HOME/.cabal/bin. Libraries go in $HOME/.ghc. Other than the packages that come with GHC, I install everything as user. This means that when I upgrade cabal-install with "cabal install cabal-install", the new binary won't take effect unless $HOME/.cabal/bin is at the front of my path.

Now I need to get the bad news over with up front. Over time your local Cabal package database will grow until it starts to cause problems. Whenever I'm having trouble building packages, I'll tinker with things a little to see if I can isolate the problem, but if that doesn't work, then I clean out my package repository and start fresh. On linux this can be done very simply with rm -fr ~/.ghc. Yes, this feels icky. Yes, it's suboptimal. But it's simple and straightforward, so either deal with it, or quit complaining and help us fix it.

I've seen people also say that you should delete the ~/.cabal directory as well. Most of the time that is bad advice. If you delete .cabal, you'll probably lose your most recent version of cabal-install, and that will make life more difficult. Deleting .ghc completely clears out your user package repository, and in my experience is almost always sufficient. If you really need to delete .cabal, then I would highly recommend copying the "cabal" binary somewhere safe and restoring it after you're done.

Sometimes you don't need to go quite so far as to delete everything in ~/.ghc. For more granular control over things, use the "ghc-pkg" program. "ghc-pkg list" shows you a list of all the installed packages. "ghc-pkg unregister foo-2.3" removes a package from the list. You can also use unregister without the trailing version number to remove every installed version of that package. If there are other packages that depend on the package you're removing, you'll get an error. If you really want to remove it, use the --force flag.

If you force unregister a package, then "ghc-pkg list" will show you all the broken packages. If I know that there's a particular hierarchy of packages that I need to remove, then I'll force remove the top one, and then use ghc-pkg to tell me all the others that I need to remove. This is an annoying process, so I only do it when I think it will be quicker than deleting everything and rebuilding it all.

So when do you need to use ghc-pkg? Typically I only use it when something breaks that I think should build properly. However, I've also found that having multiple versions of a package installed at the same time can sometimes cause problems. This can show up when the package I'm working on uses one version of a library, but when I'm experimenting in ghci a different version gets loaded. When this happens you may get perplexing error messages for code that is actually correct. In this situation, I've been able to fix the problem by using ghc-pkg to remove all but one version of the library in question.

If you've used all these tips and you still cannot install a package even after blowing away ~/.ghc, then there is probably a dependency issue in the package you're using. Haskell development is moving at a very rapid pace, so the upstream package maintainers may not be aware or have had time to fix the problem. You can help by alerting them to the problem, or better yet, including a patch to fix it.

Often the fix may be a simple dependency bump. These are pretty simple to do yourself. Use "cabal unpack foo-package-0.0.1" to download the package source and unzip it into the current directory. Then edit the .cabal file, change the bounds, and build the local package with "cabal install". Sometimes I will also bump the version of the package itself and then use that as the lower bound in the local package that I'm working on. That way I know it will be using my fixed version of foo-package. Don't be afraid to get your hands dirty. You're literally one command a way from hacking on upstream source.

For the impatient, here's a summary of my tips for basic cabal use:

  1. Install the most recent versions of cabal-install
  2. Don't install things with --global
  3. Make sure $HOME/.cabal/bin is at the front of your path
  4. Don't be afraid to use rm -fr ~/.ghc
  5. Use ghc-pkg for fine-grained package control
  6. User "cabal unpack" to download upstream code so you can fix things yourself

Using these techniques, I've found that Cabal actually works extremely well for small scale Haskell development--development where you're only working on a single package at a time and everything else is on hackage. Large scale development where you're developing more than one local package requires another set of tools. But fortunately we've already have some that work reasonably well. I'll discuss those in my next post.

Thursday, November 1, 2012

Why Cabal Has Problems

Haskell's package system, henceforth just "Cabal" for simplicity, has gotten some harsh press in the tech world recently. I want to emphasize a few points that I think are important to keep in mind in the discussion.

First, this is a hard problem. There's a reason the term "DLL hell" existed long before Cabal. I can't think of any package management system I've used that didn't generate quite a bit of frustration at some point.

Second, the Haskell ecosystem is also moving very quickly. There's the ongoing iteratees/conduits/pipes debate of how to do IO in an efficient and scalable way. Lenses have recently seen major advances in the state of the art. There is tons of web framework activity. I could go on and on. So while Hackage may not be the largest database of reusable code, the larger ones like CPAN that have been around for a long time are probably not moving as fast (in terms of advances in core libraries).

Third, I think Haskell has a unique ability to facilitate code reuse even for relatively small amounts of code. The web framework scene demonstrates this fairly well. As I've said before, even though there are three main competing frameworks, libraries in each of the frameworks can be mixed and matched easily. For example, web-routes-happstack provides convenience code for gluing together the web-routes package with happstack. It is 82 lines of code. web-routes-wai does the same thing for wai with 81 lines of code. The same thing could be done for Snap with a similar amount of code.

The languages with larger package repositories like Ruby and Python might also have small glue packages like this, but they don't have the powerful strong type system. This means that when a Cabal build fails because of dependency issues, you're catching an interaction much earlier than you would have caught it in the other languages. This is what I'm getting at when I say "unique ability to facilitate code reuse".

When you add Haskell's use of cross-module compiler optimizations to all these previous points, I think it makes a compelling case that the Haskell community is at or near the frontier of what has been done before even though we may be a ways away in terms of raw number of packages and developers. Thus, it should not be surprising that there are problems. When you're at the edge of the explored space, there's going to be some stumbling around in the dark and you might go down some dead end paths. But that's not a sign that there's something wrong with the community.

Note: The first published version of this article made some incorrect claims based on incorrect information about the number of Haskell packages compared to the number of packages in other languages. I've removed the incorrect numbers and adjusted my point.

Tuesday, April 17, 2012

LTMT Part 2: Monads

In part 1 of this tutorial we talked about types and kinds. Knowledge of kinds will help to orient yourself in today's discussion of monads.

What is a monad? When you type "monad" into Hayoo the first result takes you to the documentation for the type class Monad. If you don't already have a basic familiarity with type classes, you can think of a type class as roughly equivalent to a Java interface. A type class defines a set of functions involving a certain data type. When a data type defines all the functions required by the type class, we say that it is an instance of that type class. When a type Foo is an instance of the Monad type class, you'll commonly hear people say "Foo is a monad". Here is a version of the Monad type class.

class Monad m where
    return :: a -> m a
    (=<<) :: (a -> m b) -> m a -> m b

(Note: If you're the untrusting type and looked up the real definition to verify that mine is accurate, you'll find that my version is slightly different. Don't worry about that right now. I did it intentionally, and there is a method to my madness.)

This basically says that in order for a data type to be an instance of the Monad type class, it has to define the two functions return and (=<<) (pronounced "bind") that have the above type signatures. What do these type signatures tell us? Let's look at return first. We see that it returns a value of type m a. This tells us that m has the kind signature m :: * -> *. So whenever we hear someone say "Foo is a monad" we immediately know that Foo :: * -> *.

In part 1, you probably got tired of me emphasizing that a type is a context. When we look at return and bind, this starts to make more sense. The type m a is just the type a in the context m. The type signature return :: a -> m a tells us that the return function takes a plain value a and puts that value into the context m. So when we say something is a monad, we immediately know that we have a function called return that lets us put arbitrary other values into that context.

Now, what about bind? It looks much more complicated and scary, but it's really pretty simple. To see this, let's get rid of all the m's in the type signature. Here's the before and after.

before :: (a -> m b) -> m a -> m b
after  :: (a -> b) -> a -> b

The type signature for after might look familiar. It's exactly the same as the type signature for the ($) function! If you're not familiar with it, Haskell's $ function is just syntax sugar for function application. (f $ a) is exactly the same as (f a). It applies the function f to its argument a. It is useful because it has very low precedence is right associative, so it is a nice syntax sugar that allows us to eliminate parenthesis in certain situations. When you realize that (=<<) is roughly analogous to the concept of function application (modulo the addition of a context m), it suddenly makes a lot more sense.

So now what happens when we look at bind's type signature with the m's back in? (f =<< k) applies the function f to the value k. However, the crucial point is that k is a value wrapped in the context m, but f's parameter is an unwrapped value a. From this we see that the bind function's main purpose is to pull a value out of the context m and apply the function f, which does some computation, and returns the result back in the context m again.

The monad type class does not provide any mechanism for unconditionally pulling a value out of the context. The only way to get access to the unwrapped value is with the bind function, but bind does this in a controlled way and requires the function to wrap things up again before the result is returned. This behavior, enabled by Haskell's strong static type system, provides complete control over side effects and mutability.

Some monads do provide a way to get a value out of the context, but the choice of whether to do so is completely up to the author of said monad. It is not something inherent in the concept of a monad.

Monads wouldn't be very fun to use if all you had was return, bind, and derived functions. To make them more usable, Haskell has a special syntax called "do notation". The basic idea behind do notation is that there's a bind between every line, and you can do a <- func to unwrap the return value of func and make it available to later lines with the identifier 'a'.

You can find a more detailed treatment of do notation elsewhere. I hear that Learn You a Haskell and Real World Haskell are good.

In summary, a monad is a certain type of context that provides two things: a way to put things into the context, and function application within the context. There is no way to get things out. To get things out, you have to use bind to take yourself into the context. Once you have these two operations, there are lots of other more complicated operations built on the basic primitives that are provided by the API. Much of this is provided in Control.Monad. You probably won't learn all this stuff in a day. Just dive in and use these concepts in real code. Eventually you'll find that the patterns are sinking in and becoming clearer.

Monday, April 16, 2012

The Less Travelled Monad Tutorial: Understanding Kinds

This is part 1 of a monad tutorial (but as we will see, it's more than your average monad tutorial). If you already have a strong grasp of types, kinds, monads, and monad transformers, and type signatures like newtype RST r s m a = RST { runRST :: r -> s -> m (a, s) } don't make your eyes glaze over, then reading this won't change your life. If you don't, then maybe it will.

More seriously, when I was learning Haskell I got the impression that some topics were "more advanced" and should wait until later. Now, a few years in, I feel that understanding some of these topics earlier would have significantly sped up the learning process for me. If there are other people out there whose brains work somewhat like mine, then maybe they will be able to benefit from this tutorial. I can't say that everything I say here will be new, but I haven't seen these concepts organized in this way before.

This tutorial is not for absolute beginners. It assumes a basic knowledge of Haskell including the basics of data types, type signatures, and type classes. If you've been programming Haskell for a little bit, but are getting stuck on monads or monad transformers, then you might find some help here.

data Distance = Dist Double
data Mass = Mass Double

This code defines data types called Distance and Mass. The name on the left side of the equals sign is called a type constructor (or sometimes shortened to just type). The Haskell compiler automatically creates functions from the names just to the right of the equals sign. These functions are called data constructors because they construct the types Distance and Mass. Since they are functions, they are also first-class values, which means they have types as seen in the following ghci session.

$ ghci ltmt.hs
GHCi, version 7.0.3: http://www.haskell.org/ghc/  :? for helpghci> :t Dist
Dist :: Double -> Distance
ghci> :t Mass
Mass :: Double -> Mass
ghci> :t Distance
<interactive>:1:1: Not in scope: data constructor `Distance'

We see here that Dist and Mass are functions that return the types Distance and Mass respectively. Frequently you'll encounter code where the type and the constructor have the same name (as we have here with Mass). Distance, however, illustrates that these are really two separate entities. Distance is the type and Dist is the constructor. Types don't have types, so the ":t" command fails for Distance.

Now, we need to pause for a moment and think about the meaning of these things. What is the Distance type? Well, when we look at the constructor, we can see that a value of type Distance contains a single Double. The constructor function doesn't actually do anything to the Double value in the process of constructing the Distance value. All it does is create a new context for thinking about a Double, specifically the context of a Double that we intend to represent a distance quantity. (Well, that's not completely true, but for the purposes of this tutorial we'll ignore those details.) Let me repeat that. A type is just a context. This probably seems so obvious that you're starting to wonder about me. But I'm saying it because I think that keeping it in mind will help when talking about monads later.

Now let's look at another data declaration.

data Pair a = MkPair a a

This one is more interesting. The type constructor Pair takes an argument. The argument is some other type a and that is used in some part of the data constructor. When we look to the right side, we see that the data constructor is called MkPair and it constructs a value of type "Pair a" from two values of type "a".

MkPair :: a -> a -> Pair a

The same thing we said for the above types Distance and Mass applies here. The type constructor Pair represents a context. It's a context representing a pair of values. The type Pair Int represents a pair of Ints. The type Pair String represents a pair of strings. And on and on for whatever concrete type we use in the place of a.

Again, this is all very straightforward. But there is a significant distinction between the two type constructors Pair and Distance. Pair requires a type parameter, while Distance does not. This brings us to the topic of kinds. (Most people postpone this topic until later, but it's not hard to understand and I think it helps to clarify things later.) You know those analogy questions they use on standardized tests? Here's a completed one for you:

values : types    ::   types : kinds

Just as we categorize values by type, we categorize type constructors by kind. GHCi lets us look up a type constructor's kind with the ":k" command.

ghci> :k Distance
Distance :: *
ghci> :k Mass
Mass :: *
ghci> :k Dist
<interactive>:1:1: Not in scope: type constructor or class `Dist'
ghci> :k Pair
Pair :: * -> *
ghci> :k Pair Mass
Pair Mass :: *

In English we would say "Distance has kind *", and "Pair has kind * -> *". Kind signatures look similar to type signatures because they are. When we use Mass as Pair's first type argument, the result has kind *. The Haskell report defines kind signatures with the following two rules.

  • The symbol * represents the kind of all nullary type constructors (constructors that don't take any parameters).
  • If k1 and k2 are kinds, then k1->k2 is the kind of types that take one parameter that is a type of kind k1 and return a type of kind k2.

As an exercise, see if you can work out the kind signatures for the following type constructors. You can check your work with GHCi.

data Tuple a b = Tuple a b
data HardA a = HardA (a Int)
data HardB a b c = HardB (a b) (c a Int)

Before reading further, I suggest attempting to figure out the kind signatures for HardA and HardB because they involve a key pattern that will come up later.


Welcome back. The first example is just a simple extension of what we've already seen. The type constructor Tuple has two arguments, so it's kind signature is Tuple :: * -> * -> *. Also if you try :t you'll see that the data constructor's type signature is Tuple :: a -> b -> Tuple a b.

In the case of the last two types, it may be a little less obvious. But they build on each other in fairly small, manageable steps. For HardA, in the part to the left of the equals sign we see that there is one type parameter 'a'. From this, we can deduce that HardA's kind signature is something of the form ? -> *, but we don't know exactly what to put at the question mark. On the right side, all the individual arguments to the data constructor must have kind *. If (a Int) :: *, then the type 'a' must be a type constructor that takes one parameter. That is, it has kind * -> *, which is what we must substitute for the question mark. Therefore, we get the final kind signature HardA :: (* -> *) -> *.

HardB is a very contrived and much more complex case that exercises all the above simple principles. From HardB a b c we see that HardB has three type parameters, so it's kind signature has the form HardB :: a -> b -> c -> *. On the right side the (a b) tells us that b :: * and a :: * -> *. The second part (c a Int) means that c is a type constructor with two parameters where its first parameter is a, which has the type signature we described above. So this gives us c :: (* -> *) -> * -> *. Now, substituting all these in, we get HardB :: (* -> *) -> * -> ((* -> *) -> * -> *) -> *.

The point of all this is to show that when you see juxtaposition of type constructors (something of the form (a b) in a type signature), it is telling you that the context a is a non-nullary type constructor and b is its first parameter.

Continue on to Part 2 of the Less Travelled Monad Tutorial

Sunday, April 15, 2012

Four Tips for New Haskell Programmers

The Haskell programming language is widely considered to have a fairly steep learning curve--at least compared with mainstream languages.  In my experience with Haskell and specifically helping newcomers I've noticed a few common issues that seem to come up again and again.  Some of these issues might be more avoidable if the Haskell community did a better job communicating them.  Four points I have noticed are:
  1. Read Haddock API docs
  2. Pay attention to type class instances
  3. Learn about kinds
  4. Learn monad transformers

Read Haddock API Docs

I mention this point at the risk of stating the obvious.  If you are going to become a proficient Haskell programmer, it's absolutely essential that you get used to reading the API docs for the packages you use.  I often hear newcomers ask for tutorials demonstrating how to use packages.  Our community would definitely be better off with tutorials for every package, but it would also be better if newcomers would pay more attention to the API docs.  Now I know you're probably thinking, "yeah, but most packages are poorly documented."  That is true, but Haskell's type signatures tell you a lot more about a function than other languages.  For instance, consider the following example:

readChan :: Chan a -> IO a

A Chan is essentially a queue.  You can put values in and get them out in FIFO order.  When you are trying to understand the above function, one of the first things you might wonder about it is whether it blocks or not.  But if you think about it a little more, you'll realize that this function has to block.  If it didn't block, then there might not be a value and there would be no way to return something of type a (because Haskell has no null pointers).  A non-blocking version would have to have a type signature like this:

tryReadChan :: Chan -> IO (Maybe a)

So even with no prose documentation added at all, you can still learn a lot from the type signatures.  This is more significant in Haskell than other languages because of Haskell's purity and its strong type system.  Also, I would recommend that you bookmark the url "http://hackage.haskell.org/package/".  Actually, better yet, type it into your web browser frequently so it is the first thing given to you by the autocorrect when you start typing "hack...".  Once you auto-complete that url, you can just type the name of the package you are looking for and you'll immediately get to the most recent API docs for that package.  It's way faster than hitting control-f and searching through the package list on that page.

Also, bookmark http://www.haskell.org/ghc/docs/latest/html/libraries/index.html because it has links to documentation for the core libraries.

Pay Attention to Type Class Instances

I can't emphasize this point enough.  One of the most common questions I get from beginners is how to run an IO function in some random monad.  This information is trivially visible in the API docs, but for some reason beginners never seem to notice.  Take for example the Snap monad.  Go ahead, click that link and look at the documentation.  The clue that you can run IO actions inside that monad is tucked away near the end of the "Instances" block.  It's one little innocuous line MonadIO Snap.  Newcomers might not be familiar with MonadIO, but if they click the link they'll see that it defines one function liftIO :: IO a -> m a that converts any function in the IO monad to a function in the current m monad.  Those instance lines contain a treasure trove of information.  Don't neglect them.

Learn About Kinds

In Haskell all values have a type.  Analogously, all types have a kind.  This topic is often not brought up until later, but I feel that understanding kinds gives beginners a big advantage in understanding type signatures.  I'm not going to go into it in detail here, but I think it's an important concept that is too often ignored.

Learn About Monad Transformers

This is another one of those topics that is often put off until later.  When I was learning monads, I distinctly remember getting the impression that monad transformers were a much more complicated beast and that I didn't need to learn about them at that time.  But when I finally did learn about monad transformers a lot of things became clearer for me.  Also, monad transformers are used all the time in real world applications.  Transformers are a much simpler concept than monads, especially if you know about kinds.  There's no reason not to learn them at the same time.