Friday, December 19, 2014

LTMT Part 3: The Monad Cookbook

Introduction

The previous two posts in my Less Traveled Monad Tutorial series have not had much in the way of directly practical content. In other words, if you only read those posts and nothing else about monads, you probably wouldn't be able to use monads in real code. This was intentional because I felt that the practical stuff (like do notation) had adequate treatment in other resources. In this post I'm still not going to talk about the details of do notation--you should definitely read about that elsewhere--but I am going to talk about some of the most common things I have seen beginners struggle with and give you cookbook-style patterns that you can use to solve these issues.

Problem: Getting at the pure value inside the monad

This is perhaps the most common problem for Haskell newcomers. It usually manifests itself as something like this:

main = do
    lineList <- lines $ readFile "myfile.txt"
    -- ... do something with lineList here

That code generates the following error from GHC:

    Couldn't match type `IO String' with `[Char]'
    Expected type: String
      Actual type: IO String
    In the return type of a call of `readFile'

Many newcomers seem puzzled by this error message, but it tells you EXACTLY what the problem is: the call to readFile returns an IO String, but the thing expected in that spot is a String. (Note: String is a synonym for [Char].) The trouble is, that by itself isn't very helpful. You could understand the error completely and still not know how to solve the problem. First, let's look at the types involved.

readFile :: FilePath -> IO String
lines :: String -> [String]

Both of these functions are defined in Prelude. These two type signatures show the problem very clearly. readFile returns an IO String, but the lines function is expecting a String as its first argument. IO String != String. Somehow we need to extract the String out of the IO in order to pass it to the lines function. This is exactly what do notation was designed to help you with.

Solution #1

main :: IO ()
main = do
    contents <- readFile "myfile.txt"
    let lineList = lines contents
    -- ... do something with lineList here

This solution demonstrates two things about do notation. First, the left arrow lets you pull things out of the monad. Second, if you're not pulling something out of a monad, use "let foo =". One metaphor that might help you remember this is to think of "IO String" as a computation in the IO monad that returns a String. A do block lets you run these computations and assign names to the resulting pure values.
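
If it helps to see what the left arrow is doing, here is roughly how that do block translates into the underlying >>= operator (a sketch; the print call is just a stand-in for doing something with lineList):

main :: IO ()
main =
    readFile "myfile.txt" >>= \contents ->
    let lineList = lines contents
    in  print (length lineList)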

Solution #2

We could also attack the problem a different way. Instead of pulling the result of readFile out of the monad, we can lift the lines function into the monad. The function we use to do that is called liftM.

liftM :: Monad m => (a -> b) -> m a -> m b
liftM :: Monad m => (a -> b) -> (m a -> m b)

The associativity of the -> operator is such that these two type signatures are equivalent. If you've ever heard Haskell people saying that all functions are single argument functions, this is what they are talking about. You can think of liftM as a function that takes one argument, a function (a -> b), and returns another function, a function (m a -> m b). When you think about it this way, you see that the liftM function converts a function of pure values into a function of monadic values. This is exactly what we were looking for.
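
To make that partial application concrete, here is a tiny sketch before we apply the idea to our program (linesM is just a name made up for illustration):

import Control.Monad (liftM)

-- Partially applying liftM to the pure function 'lines' yields a
-- function that works on monadic values.
linesM :: Monad m => m String -> m [String]
linesM = liftM lines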

import Control.Monad

main :: IO ()
main = do
    lineList <- liftM lines (readFile "myfile.txt")
    -- ... do something with lineList here

This is more concise than our previous solution, so in this simple example it is probably what we would use. But if we needed to use contents in more than one place, then the first solution would be better.
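
For instance, if we also wanted to report the total character count, the first solution lets us reuse contents directly (a sketch; the output format is arbitrary):

main :: IO ()
main = do
    contents <- readFile "myfile.txt"
    let lineList = lines contents
    putStrLn ("lines: " ++ show (length lineList))
    putStrLn ("chars: " ++ show (length contents))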

Problem: Making pure values monadic

Consider the following program:

import Control.Monad
import System.Environment
main :: IO ()
main = do
    args <- getArgs
    output <- case args of
                [] -> "cat: must specify some files"
                fs -> liftM concat (mapM readFile fs)
    putStrLn output

This program also has an error. GHC actually gives you three errors here because there's no way for it to know exactly what you meant. But the first error is the one we're interested in.

    Couldn't match type `[]' with `IO'
    Expected type: IO Char
      Actual type: [Char]
    In the expression: "cat: must specify some files"

Just like before, this error tells us exactly what's wrong. We're supposed to have an IO something, but we only have a String (remember, String is the same as [Char]). It's not convenient for us to get the pure result out of the readFile calls like we did before because of the structure of what we're trying to do. The two branches of the case expression must have the same type, so that means we need to somehow convert our String into an IO String. This is exactly what the return function is for.

Solution: return

return :: Monad m => a -> m a

This type signature tells us that return takes any type a as input and returns "m a". So all we have to do is use the return function.

import Control.Monad
import System.Environment
main :: IO ()
main = do
    args <- getArgs
    output <- case args of
                [] -> return "cat: must specify some files"
                fs -> liftM concat (mapM readFile fs)
    putStrLn output

The 'm' that the return function wraps its argument in is determined by the context. In this case, main is in the IO monad, so that's what return uses.
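
To see that, here is a small sketch showing the very same return call used in two different monads (the names defaultMessage and ioMessage are made up for illustration):

defaultMessage :: Maybe String
defaultMessage = return "cat: must specify some files"
-- Here return means Just, because the type signature asks for a Maybe.

ioMessage :: IO String
ioMessage = return "cat: must specify some files"
-- Here the exact same expression builds an IO action instead.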

Problem: Chaining multiple monadic operations

import System.Environment
main :: IO ()
main = do
    [from,to] <- getArgs
    writeFile to $ readFile from

As you probably guessed, this program also has an error. Hopefully you have an idea of what it might be. It's the same problem of needing a pure value when we actually have a monadic one. You could solve it like we did in solution #1 on the first problem (you might want to go ahead and give that a try before reading further). But this particular case has a pattern that makes a different solution work nicely. Unlike the first problem, you can't use liftM here.

Solution: bind

When we used liftM, we had a pure function lines :: String -> [String]. But here we have writeFile :: FilePath -> String -> IO (). We've already supplied the first argument, so what we actually have is writeFile to :: String -> IO (). And again, readFile returns IO String instead of the pure String that we need. To solve this we can use another function that you've probably heard about when people talk about monads...the bind function.

(=<<) :: Monad m => (a -> m b) -> m a -> m b
(=<<) :: Monad m => (a -> m b) -> (m a -> m b)

Notice how the pattern here is different from the first example. In that example we had (a -> b) and we needed to convert it to (m a -> m b). Here we have (a -> m b) and we need to convert it to (m a -> m b). In other words, we're only adding an 'm' onto the 'a', which is exactly the pattern we need here. Here are the two patterns next to each other to show the correspondence.

writeFile to :: String -> IO ()
                     a ->  m b

From this we see that "writeFile to" is the first argument to the =<< function. readFile from :: IO String fits perfectly as the second argument to =<<, and then the return value is the result of the writeFile. It all fits together like this:

import System.Environment
main :: IO ()
main = do
    [from,to] <- getArgs
    writeFile to =<< readFile from

Some might point out that this third problem is really the same as the first problem. That is true, but I think it's useful to see the varying patterns laid out in this cookbook style so you can figure out what you need to use when you encounter these patterns as you're writing code. Everything I've said here can be discovered by carefully studying the Control.Monad module. There are lots of other convenience functions there that make working with monads easier. In fact, I already used one of them: mapM.
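
For reference, here is mapM's type and a small stand-alone sketch of how it was used above (readAll is a name I made up; it is just the fs branch of the earlier case expression pulled out into its own function):

import Control.Monad (liftM)

-- mapM :: Monad m => (a -> m b) -> [a] -> m [b]
--
-- With readFile :: FilePath -> IO String, mapM readFile has type
-- [FilePath] -> IO [String], so lifting concat over the result gives IO String.
readAll :: [FilePath] -> IO String
readAll fs = liftM concat (mapM readFile fs)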

When you're first learning Haskell, I would recommend that you keep the documentation for Control.Monad close by at all times. Whenever you need to do something new involving monadic values, odds are good that there's a function in there to help you. I would not recommend spending 10 hours studying Control.Monad all at once. You'll probably be better off writing lots of code and referring to it whenever you think there should be an easier way to do what you want to do. Over time the patterns will sink in as you form new connections between different concepts in your brain.

It takes effort. Some people do pick these things up more quickly than others, but I don't know anyone who just read through Control.Monad and then immediately had a working knowledge of everything in there. The patterns you're grappling with here will almost certainly be foreign to you because no other mainstream language enforces this distinction between pure values and side-effecting values. But I think the payoff of being able to separate pure and impure code is well worth the effort.

Tuesday, November 25, 2014

Announcing C◦mp◦se :: Conference

Since most of my content is about Haskell, I would like to take this opportunity to inform my readers of a new conference that I and the other co-organizers of the New York Haskell Meetup are hosting at the end of January. It's called C◦mp◦se, and it's a conference for typed functional programmers. Check out the website at http://www.composeconference.org/. We recently issued a call for papers. I know it's short notice, but the deadline is November 30. If you have something that you think would be interesting to typed functional programmers, we'd love to hear from you. Along with the conference, one day will be a less formal hackathon/unconference. If you would like to give a tutorial/demo at the unconference, email us at info@composeconference.org.

Sunday, August 10, 2014

Field Accessors Considered Harmful

It's pretty well known these days that Haskell's field accessors are rather cumbersome syntactically and not composable.  The lens abstraction that has gotten much more popular recently (thanks in part to Edward Kmett's lens package) solves these problems.  But I recently ran into a bigger problem with field accessors that I had not thought about before.  Consider the following scenario.  You have a package with code something like this:

data Config = Config { configFieldA :: [Text] }

So your Config data structure gives your users getters and setters for field A (and any other fields you might have).  Your users are happy and life is good.  Then one day you decide to add a new feature and that feature requires expanding and restructuring Config.  Now you have this:

data MConfig = MConfig { mconfigFieldA :: [Text] }
data Config = Config { configMC :: MConfig
                     , configFieldX :: Text
                     , configFieldY :: Bool }

This is a nice solution because your users get to keep the functionality over the portion of the Config that they are used to, and they still get the new functionality.  But now there's a problem.  You're still breaking your users because configFieldA changed its name to mconfigFieldA and now refers to the MConfig structure instead of Config.  If configFieldA were an ordinary function, you could preserve backwards compatibility by defining a new one:

configFieldA = mconfigFieldA . configMC

But alas, that won't work here because configFieldA is not a normal function.  It is a special field accessor generated by GHC and you know that your users are using it as a setter.  It seems to me that we are at an impasse.  It is completely impossible to deliver your new feature without breaking backwards compatibility somehow.  No amount of deprecation cycles can ease the transition.  The sad thing is that it seems like it should have been totally doable.  Obviously there are some kinds of changes that understandably will break backwards compatibility.  But this doesn't seem like one of them since it is an additive change.  Yes, yes, I know...it's impossible to do this change without changing the type of the Config constructor, so that means that at least that function will break.  But we should be able to minimize the breakage to the field accessor functions, and field accessors prevent us from doing that.

However, we could have avoided this problem.  If we had a bit more foresight, we could have done this.

module Foo (mkConfig, configFieldA) where

import Control.Lens (Lens', lens)
import Data.Text (Text)

data Config = Config { _configFieldA :: [Text] }

mkConfig :: [Text] -> Config
mkConfig = Config

configFieldA :: Lens' Config [Text]
configFieldA = lens _configFieldA (\c a -> c { _configFieldA = a })

This would allow us to avoid breaking backwards compatibility by continuing to export appropriate versions of these symbols.  It would look something like this.

{-# LANGUAGE OverloadedStrings #-}
module Foo
  ( MConfig
  , mkMConfig
  , mconfigFieldA
  , Config
  , mkConfig
  , configFieldA
  , configMC
  -- ...
  ) where

import Control.Lens (Lens', lens)
import Data.Text (Text)

data MConfig = MConfig { _mconfigFieldA :: [Text] }
data Config = Config { _configMC :: MConfig
                     , _configFieldX :: Text
                     , _configFieldY :: Bool }

mkMConfig :: [Text] -> MConfig
mkMConfig = MConfig

mkConfig :: [Text] -> Config
mkConfig a = Config (mkMConfig a) "" False

mconfigFieldA :: Lens' MConfig [Text]
mconfigFieldA = lens _mconfigFieldA (\c a -> c { _mconfigFieldA = a })

configMC :: Lens' Config MConfig
configMC = lens _configMC (\c mc -> c { _configMC = mc })

-- The rest of the field lenses here

configFieldA :: Lens' Config [Text]
configFieldA = configMC . mconfigFieldA

Note that the type signatures for mkConfig and configFieldA stay exactly the same.  We weren't able to do this with field accessors because they are not composable.  Lenses solve this problem for us because they are composable and we have complete control over their definition.
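
From the user's point of view, code like the following keeps working after the restructuring, because configFieldA is still a lens from Config to [Text] (a sketch assuming the lens package's view and set; clearFieldA and fieldACount are made-up names):

import Control.Lens (set, view)

import Foo

-- These compile against the restructured module exactly as they did
-- against the original, since configFieldA still behaves as a
-- getter/setter for the [Text] field.
clearFieldA :: Config -> Config
clearFieldA = set configFieldA []

fieldACount :: Config -> Int
fieldACount = length . view configFieldA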

For quite some time now I have thought that I understood the advantage that lenses give you over field accessors.  Discovering this added ability of lenses in helping us preserve backwards compatibility came as a pleasant surprise.  I'll refrain from opining on how this should affect your development practices, but I think it makes the case for using lenses in your code a bit stronger than it was before.

Sunday, July 13, 2014

Haskell Best Practices for Avoiding "Cabal Hell"

I posted this as a reddit comment and it was really well received, so I thought I'd post it here so it would be more linkable.  A lot of people complain about "cabal hell" and ask what they can do to solve it.  There are definitely things about the cabal/hackage ecosystem that can be improved, but on the whole it serves me quite well.  I think a significant amount of the difficulty is a result of how fast things move in the Haskell community and how much more reusable Haskell is than other languages.

With that preface, here are my best practices that seem to make Cabal work pretty well for me in my development.

1. I make sure that I have no more than the absolute minimum number of packages installed as --global.  This means that I don't use the Haskell Platform or any OS Haskell packages.  I install GHC directly.  Some might think this casts too much of a negative light on the Haskell Platform.  But everyone will agree that having multiple versions of a package installed at the same time is a significant cause of build problems.  And that is exactly what the Haskell Platform does for you--it installs specific versions of packages.  If you use Haskell heavily enough, you will invariably encounter a situation where you want to use a different version of a package than the one the Haskell Platform gives you.

2. Make sure ~/.cabal/bin is at the front of your path.  Hopefully you already knew this, but I see this problem a lot, so it's worth mentioning for completeness.

3. Install happy and alex manually.  These two packages generate binary executables that you need to have in ~/.cabal/bin.  They don't get picked up automatically because they are executables and not package dependencies.

4. Make sure you have the most recent version of cabal-install.  There is a lot of work going on to improve these tools.  The latest version is significantly better than it used to be, so you should definitely be using it.

5. Become friends with "rm -fr ~/.ghc".  This command cleans out your --user repository, which is where you should install packages if you're not using a sandbox.  It sounds bad, but right now this is simply a fact of life.  The Haskell ecosystem is moving so fast that packages you install today will be out of date in a few months if not weeks or days.  We don't have purely functional nix-style package management yet, so removing the old ones is the pragmatic approach.  Note that sandboxes accomplish effectively the same thing for you.  Creating a new sandbox is the same as "rm -fr ~/.ghc" and then installing to --user, but has the benefit of not deleting everything else you had in --user.

6. If you're not working on a single project with one harmonious dependency tree, then use sandboxes for separate projects or one-off package compiles.

7. Learn to use --allow-newer.  Again, things move fast in Haskell land.  If a package gives you dependency errors, then try --allow-newer and see if the package will just work with newer versions of dependencies.

8. Don't be afraid to dive into other people's packages.  "cabal unpack" makes it trivial to download the code for any package.  From there it's often trivial to make manual changes to version bounds or even small code changes.  If you make local changes to a package, then you can either install it to --user so other packages use it, or you can do "cabal sandbox add-source /path/to/project" to ensure that your other projects use the locally modified version.  If you've made code changes, then help out the community by sending a pull request to the package maintainer.  Edit: bergmark mentions that unpack is now "cabal get" and "cabal get -s" lets you clone the project's source repository.

9. If you can't make any progress from the build messages cabal gives you, then try building with -v3.  I have encountered situations where cabal's normal dependency errors are not helpful.  Using -v3 usually gives me a much better picture of what's going on and I can usually figure out the root of the problem pretty quickly.

Friday, May 16, 2014

Implicit Blacklisting for Cabal

I've been thinking about all the Haskell PVP discussion that's been going on lately. It should be no secret by now that I am a PVP proponent. I'm not here to debate the PVP in this post, so for this discussion let's assume that the PVP is a good thing and should be adopted by all packages published on Hackage. More specifically, let's assume this to mean that every package should specify upper bounds on all dependencies, and that most of the time these bounds will be of the form "< a.b".

Recently there has been discussion about problems encountered when packages that have not been using upper bounds change and start using them. The recent issue with the HTTP package is a good example of this. Roughly speaking the problem is that if foo-1.2 does not provide upper bounds on its dependency bar, the constraint solver is perpetually "poisoned" because foo-1.2 will always be a candidate even long after bar has become incompatible with foo-1.2. If later foo-3.9 specifies a bound of bar < 0.5, then when bar-0.5 comes out the solver will try to build with foo-1.2 even though it is hopelessly old. This will result in build errors since bar has long since changed its API.

This is a difficult problem. There are several immediately obvious approaches to solving the problem.

  1. Remove the offending old versions (the ones missing upper bounds) from Hackage.
  2. Leave them on Hackage, but mark them as deprecated/blacklisted so they will not be chosen by the solver.
  3. Go back and retroactively add upper bounds to the offending versions.
  4. Start a new empty Hackage server that requires packages to specify upper bounds on all dependencies.
  5. Start a new Hackage mirror that infers upper bounds based on package upload dates.

All of these approaches have problems. The first three are problematic because they mess with build reproducibility. The fourth approach fragments the community and in the very best case would take a lot of time and effort before gaining adoption. The fifth approach has problems because correct upper bounds cannot always be inferred by upload dates.

I would like to propose a solution I call implicit blacklisting. The basic idea is that for each set of versions with the prefix a.b.c Cabal will only consider a single one: the last one. This effectively means that all the lower versions with the prefix a.b.c will be implicitly blacklisted. This approach should also allow maintainers to modify this behavior by specifying more granular version bounds.

In our previous example, suppose there were a number of 0.4 versions of the bar package, with 0.4.3.3 being the last one. In this case, if foo specified a bound of bar < 0.5, the solver would only consider 0.4.3.3; versions 0.4.3.2 and 0.4.3.1 would not be considered. This would allow us to completely hide a lack of version bounds by making a new patch release that only bumps the d number. If that release had problems, we could address them with more patch releases.

Now imagine that for some crazy reason foo worked with 0.4.3.2, but 0.4.3.3 broke it somehow. Note that if bar is following the PVP, that should not be the case. But there are some well-known cases where the PVP can miss things and there is always the possibility of human error. In that situation, foo should specify a bound of bar < 0.4.3.3. The solver would then respect that bound and only consider 0.4.3.2, while 0.4.3.1 would still be ignored as before.
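
To make the selection rule concrete, here is a minimal sketch of the candidate filtering in Haskell (Version and the bounds test are simplified stand-ins for what cabal's solver actually works with):

import Data.Function (on)
import Data.List (groupBy, sort)

-- A version is just its numeric components, e.g. [0,4,3,3] for 0.4.3.3.
type Version = [Int]

-- Within each a.b.c group, keep only the newest version that satisfies
-- the package's explicit bounds; everything older is implicitly blacklisted.
implicitBlacklist :: (Version -> Bool) -> [Version] -> [Version]
implicitBlacklist inBounds =
      map maximum
    . groupBy ((==) `on` take 3)
    . sort
    . filter inBounds

-- implicitBlacklist (< [0,5]) [[0,4,3,1],[0,4,3,2],[0,4,3,3],[0,5,0]]
--   == [[0,4,3,3]]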

Implicit blacklisting has the advantage that we don't need any special support for explicitly marking versions as deprecated/blacklisted. Another advantage is that it does not cause any problems for people who had locked their code down to using specific versions. If foo specified an exact version of bar == 0.4.3.0, then that will continue to be chosen. Implicit blacklisting also allows us to leave everything in hackage untouched and fix issues incrementally as they arise with the minimum amount of work. In the above issue with HTTP-4000.0.7, we could trivially address it by downloading that version, adding version bounds, and uploading it as HTTP-4000.0.7.1.

All in all, I think this implicit blacklisting idea has a number of desirable properties and very few downsides. It fixes the problem using nothing but our existing infrastructure: version numbers. It doesn’t require us to add new concepts like blacklisted/deprecated flags, out-of-band “revision” markers to denote packages modified after the fact, etc. But since this is a complicated problem I may very well have missed something, so I'd like to hear what the community thinks about this idea.

Tuesday, January 28, 2014

Ember.js is driving me crazy

For the past few months I've been working on a project with a fairly complex interactive web interface. This required me to venture into the wild and unpredictable jungle of Javascript development. I was totally unprepared for what I would find. Soon after starting the project it became clear that just using JQuery would not be sufficient for my project. I needed a higher level Javascript framework. After doing a little research I settled on Ember.js.

The Zombie Code Apocalypse

Ember was definitely a big improvement over straight JQuery, and allowed me to get some fairly complex UI behavior working very quickly. But recently I've run into some problems. The other day I had a UI widget defined like this:

App.FooController = Ember.ObjectController.extend({
    // ...
});
App.FooView = Ember.View.extend({
    // ...
});

It was used somewhere on the page, but at some point I decided that the widget was no longer needed, so I commented out the widget's markup. I wasn't sure whether we would ultimately keep the widget or not, so I opted to keep the above Javascript code for the controller and view around for a while so it would be easily available if I later decided to re-enable that UI element.

Everything seemed to work fine until a few days later when I noticed that another one of my controls, Bar, was not being populated with data. After spending hours trying to figure out the problem, I finally happened to comment out the unused code for the Foo widget and the problem went away. WTF?!? Why should this have anything to do with the functioning of a completely unrelated widget? This makes absolutely no sense to me, and it completely violates the natural assumption that the controller and view for two completely unrelated controls would have no impact on each other. I would have liked to know the underlying cause, but I didn't want to waste time with it, so I just removed the code and moved on.

Spontaneously Changing Values

Maybe a week later I ran into another problem. Some data was changing when I didn't expect it to. I looked everywhere I could think of that might affect the data, but couldn't find anything. Again, I spent the better part of a day trying to track down the source of this problem. After a while I was getting desperate, so I started putting print statements all over the place. I discovered that the data was changing in one particular function. I examined it carefully but couldn't find any hint of this data being impacted. Eventually I isolated the problem to the following snippet:

console.log(this.get('foo'));
this.set('bar', ...);
console.log(this.get('foo'));

The first log line showed foo with a value of 25. The second log line showed foo with a value of 0. This is utter madness! I set one field, and a completely different one gets changed! In what world does this make any shred of sense? This time, even when I actually figured out where the problem was happening I still couldn't figure out how to solve it. At least the first time I could just comment out the offending innocuous lines. Here I narrowed down the exact line that's causing the problem, but still couldn't figure out how to fix it. Finally I got on the #emberjs IRC channel and learned that Ember's set function has special behavior for values in the content field, which foo was a part of. I was able to fix this problem by initializing the bar field to null. WAT?!?

I was in shock. This seemed like one of the most absurd behaviors I've encountered in all my years of programming. Back in the C days you could see some crazy things, but at least you knew that array updates and pointer arithmetic could be dangerous and possibly overwrite other parts of memory. Here there's no hint. No dynamic index that might overflow. Just what we thought was a straightforward getter and setter for a static field in a data type.

Blaming Systems, Not People

Before you start jumping all over me for all the things I did wrong, hear me out. I'm not blaming the Ember developers or trying to disparage Ember. Ember.js is an amazing library and my application wouldn't exist without it or something like it. I'm just a feeble-minded Haskell programmer and not well-versed in the ways of Javascript. I'm sure I was doing things that contributed to the problem. But that's not the point. I've been around long enough to realize that there are probably good justifications for why the above behaviors exist. The Ember developers are clearly way better Javascript programmers than I will ever be. There's got to be a better explanation.

Peter Senge, in his book The Fifth Discipline, talks about the beer distribution game. It's a game that has been played thousands of times with diverse groups of people in management classes all over the world. The vast majority of people who play it perform very poorly. Peter points out that we're too quick to attribute a bad outcome to individual people when it should instead be attributed to the structure of the system in which those people were operating. This situation is no different.

Like the beer distribution game, Javascript is a complex system. The above anecdotes demonstrate how localized well-intentioned decisions by different players resulted in a bad outcome. The root of the problem is the system we were operating in: an impure programming language with weak dynamic typing. In a different system, say the one we get with Haskell, I can conclusively say that I never would have had these problems. Haskell's purity and strong static type system provide a level of safety that is simply unavailable in Javascript (or any other mainstream programming language for that matter).

The Godlike Refactoring

In fact, this same project gave us another anecdote supporting this claim. The project's back end is several thousand lines of Haskell code. I wrote all of the back end code, and since we have a pretty aggressive roadmap with ambitious deadlines the code isn't exactly all that pretty. There are a couple places with some pretty hairy logic. A few weeks ago we needed to do a major refactoring of the back end to support a new feature. I was too busy with other important features, so another member of the team worked on the refactoring. He had not touched a single line of the back end code before that point, but thanks to Haskell's purity and strong static type system he was able to pull off the entire refactoring single-handedly in just a couple of hours. And once he got it compiling, the application worked the first time. We are both convinced that this feat would have been impossible without strong static types.

Conclusion

I think there are a couple of interesting points worth thinking about here. First of all, the API chosen by Ember only hid the complexity, it didn't reduce it. What seemed to be a simple get() method was actually a more complex system with some special cases. The system was more complex than the API indicated. It's useful to think about the true complexity of a problem compared to the complexity of the exposed API.

The second point is that having the ability to make categorical statements about API behavior is very important. We use this kind of reasoning all the time, and the more of it we can do, the fewer the number of assumptions we will have to question when something isn't behaving as we expect. In this case, I made the seemingly reasonable categorical assumption that unused class definitions would have no effect on my program. But for some reason that I still don't understand, it was violated. I also made the categorical assumption that Ember's get() and set() methods worked like they would work in a map. But that assumption didn't hold up either. I encounter assumptions that don't hold up all the time. Every programmer does. But rarely are they so deeply and universally held as these.

So what can we learn from this? In The Fifth Discipline, Senge goes on to talk about the importance of thinking with a systems perspective; about how we need to stop blaming people and focus more on the systems involved. I think it's telling how in my 5 or 6 years of Haskell programming I've never seen a bug as crazy as these two that I encountered after working only a few months on a significant Javascript project. Haskell with its purity and strong static type system allows me to make significantly more confident categorical statements about what my code can and cannot do. That allows me to more easily build better abstractions that actually reduce complexity for the end user instead of just hiding it away in a less frequented area.

Thursday, December 20, 2012

Haskell Web Framework Matrix

A comparison of the big three Haskell web frameworks on the most informative two axes I can think of.

[Figure: a matrix whose axes run from DSLs to Combinators and from "Some things dynamic" to "Everything type-safe", with Snap, Happstack, and Yesod positioned along them.]

Note that this is not intended to be a definitive statement of what is and isn't possible in each of these frameworks. As I've written elsewhere, most of the features of each of the frameworks are interchangeable and can be mixed and matched. The idea of this matrix is to reflect the general attitude each of the frameworks seem to be taking, because sometimes generalizations are useful.