Friday, February 29, 2008

How I Learned to Stop Worrying And Love Haskell's Type Inference

In developing the HAppS example that I have been posting here, I came upon a problem that gave me new insight to Haskell. I don't think it was a particularly deep insight, but it is significant to me as someone new to the language, and new to type inference.

Consider a user authentication function that retrieves a map from the reader monad, looks up a username, and compares the password retrieved with a passed in parameter. I first implemented this function something like this:

data User = User {
  username :: String,
  password :: String
}

authUser name pass = do
                     u <- liftM (M.lookup name) ask
                     liftM2 (==) (liftM password u) (return pass)

In the process of getting there, I stumbled around rearranging and inserting various liftM calls until I finally got it to work. Many of the reasons behind the type errors still seemed like voodoo to me. And my haphazard approach to fixing them is evident looking at the code. The problem with this function is how it behaves when the username does not exist in the map. A look at the lookup function in Data.Map reveals that it has the following type:

lookup :: (Monad m, Ord k) => k -> Map k a -> m a

The documentation informs us that the function will return the result in a monad or fail if the key isn't in the map. When I originally wrote the function, the significance of this was lost on me. So I didn't pay much attention to it and after trial and error, finally got authUser to compile. Now I needed to figure out how to properly handle the failure of the monad.

The documentation for lookup suggests using the Maybe monad, where fail is defined to be Nothing. So how to get lookup into the Maybe monad? I could have used another function and specified a type declaration, but there had to be a better way. I'm not a fan of type annotations. They seem to clutter up the code too much. After thinking about it for awhile, I finally realized that the call to lookup was being put into the reader monad because that's the monad being used by authUser.

When the lightbulb came on, it was blinding. I just needed to use the result of lookup in a way that forced the Haskell's type inference to put it into the Maybe monad! To do this, all you have to do is compare to Just pass instead of pass. We'll have to lift the password function into the Maybe monad first, but that's not a problem. So the code simplifies to this:

authUser2 name pass = do
  users <- ask
  return $ (Just pass) == liftM password (M.lookup name users)

...which has exactly the desired behavior. If the user isn't found in the map, then the lookup returns Nothing. The lifted password function also returns Nothing, which is then compared to "Just pass". This comparison fails just like we want.

In retrospect, this doesn't seem like a particularly difficult concept to understand. I knew that the type inference engine did this sort of thing with type variables. But I think the many ways of applying this behavior is something not immediately appreciated by programmers coming to Haskell from an imperative background. Conclusion: Haskell's type inference engine is your friend. Instead of viewing it as something getting in the way that must be worked around, try to figure out how you can make it work to your advantage.

Thursday, February 28, 2008

Transactional Integrity Problem

An astute reader pointed out that there is a transactional integrity problem with the HAppS application built over the last 4 posts. The function checkAndAdd in Finished HAppS Application contains a call to "query $ IsUser" as well as a call to "update $ AddUser". This violates that ACID guarantee that was desired from the checkAndAdd function. If two people simultaneously try to create the same username, it's possible that both of them could get past the "query" and "if exists" statements before either of the "update AddUser" statements are executed. In this case, both of the AddUser updates would succeed and both users would think their account was created. But if they had the same username, then first one would be overwritten by the second one. The second user wouldn't notice a problem, but the first user would not be able to log in to the newly created account because his password would probably be different from the password chosen by the second user. This wouldn't be the end of the world, but it would certainly create frustration for the first user.

The problem exists because HAppS gives us transactional guarantees at the query and update level only. I just didn't think about it when I originally wrote the code. I could just give the fix, but I'll outline incorrect attempts I made before I got to the fix. Hopefully it will be more beneficial to see some wrong solutions and how they got corrected. Since I'm still trying to overcome tendencies learned from years of imperative programming, maybe this will be useful to other people in the same position. First we need to modify the addUser function to check for the existence of the user first. What we want is something like the following:

addUser name u = do
                 exists <- isUser name
                 if not exists
                   then modUsers $ M.insert name u

Haskell's if statement requires an else clause. The else clause must be the same type as modUsers, so that means it has to be "else return ()". Then I found out that Haskell's "unless" function does exactly the same thing. So you can replace the whole if statement with "unless exists $ modUsers $ M.insert name u".

For those who are still trying to understand monads, I should point out that it won't work to avoid exists binding with "if not (isUser name)". The problem here is that isUser is of type "m Bool", and the not function needs a Bool. The bind operator is the mechanism responsible for allowing us to effectively pull a result out of the monad to be passed to another function. Information never actually comes out of the monad though, because the function has to return a monad-encapsulated value.

The old type signature for addUser was:

addUser :: MonadState State m => String -> User -> m ()

The new one (now inferred by the compiler) is:

addUser :: (MonadReader State m, MonadState State m) =>
            String -> User -> m ()

The reason for the change is that isUser is a MonadReader action and modUsers is a MonadState action. The new type signature is saying that code using this function must be an instance of both MonadReader and MonadState. Fortunately, the update method has both of these, so we can get away with using the two different monads.

There's still one problem with this definition of addUser. There is no way for the caller to find out whether a new user was created or not. The simplest way to communicate this information is to just return exists at the end of the computation. So our finished addUser function in Session.hs looks like this:

addUser name u = do
                 exists <- isUser name
                 unless exists $ modUsers $ M.insert name u
                 return exists

And our checkAndAdd function becomes:

checkAndAdd user pass = do
  existed <- update $ AddUser user $ User user pass
  if existed
    then ok $ toResponse $ "User already exists."
    else ok $ toResponse $ "User created."

Now we have a single transaction and arguably cleaner code. What have we learned from this? Your update and query functions need to be carefully designed to provide a framework with the appropriate transactional guarantees needed in your system.

Tuesday, February 26, 2008

Finished HAppS Application

Update: Due to popular demand, I put the plain haskell code for this app on hpaste.org. Here are links for Session.hs, Main.hs, and login.html.

The past three posts have laid the groundwork for a full HAppS web app. We have built the infrastructure for an authentication system that can store usernames and passwords as well as the session IDs for active sessions. Here we will tie everything together to make a functional web app. It should all compile with HAppS 0.9.2. Let's get the imports out of the way.
> {-# OPTIONS -fglasgow-exts #-}
> {-# LANGUAGE TemplateHaskell , FlexibleInstances,
>              UndecidableInstances, OverlappingInstances,
>              MultiParamTypeClasses, GeneralizedNewtypeDeriving #-}
>
> module Main where
>
> import Control.Concurrent
> import Control.Monad
> import HAppS.Server
> import HAppS.State
>
> import Session --The session and state code already developed
In the first post, we created code to serve a login/registration form and handle the form submission. Some of the details have changed since the first post, but the main ideas are the same. In order to verify that our state is being held properly, we will need two additional pages. One will show which user is "logged in" (associated with the active session). We'll make that page available at the /view URL. The second page will show a list of all registered users. We'll put that page at /list. Here is the new ServerPartT list implementing this:
> impl = [ dir "login" [methodSP GET $ (fileServe ["login.html"] ".")
>                      ,methodSP POST $ withData loginPage ]
>        , dir "newuser" [methodSP POST $ withData newUserPage]
>        , dir "view" [withDataFn (liftM Just (readCookieValue "sid") `mplus` return Nothing) viewPage]
>        , dir "list" userListPage
>        , anyRequest $ ok $ toResponse "Sorry, couldn't find a matching handler" ]
The code for /view uses readCookieValue to get the session ID from the cookies in the HTTP request. It encapsulates the result in the Maybe monad for convenient error handling and uses withDataFn to pass the results to viewPage. Before we jump into the new code, we need more of the code developed in the first post for getting data from the form.
> data UserAuthInfo = UserAuthInfo String String
> data NewUserInfo = NewUserInfo String String String
>
> instance FromData UserAuthInfo where
>     fromData = liftM2 UserAuthInfo (look "username")
>                (look "password" `mplus` return "nopassword")
>
> instance FromData NewUserInfo where
>     fromData = liftM3 NewUserInfo (look "username")
>                (look "password" `mplus` return "nopassword")
>                (look "password2" `mplus` return "nopassword2")
Since user registration is the first step in the process, we'll start there. newUserPage will take a NewUserInfo as a parameter, verify that the two passwords match, and add the user to our state. If the passwords did not match, it will return a simple error. This is very simple using Haskell's guards.
> newUserPage (NewUserInfo user pass1 pass2)
>   | pass1 == pass2 = [anyRequest $ do (checkAndAdd user pass1)]
>   | otherwise = [anyRequest $ ok $ toResponse $ "Passwords did not match"]
The checkAndAdd function needs to check to make sure the user doesn't already exist, and then add the user or return an error. To do this, we'll call upon the isUser and addUser functions created in the last post. But we can't call them directly. We have to use the query and update functions as well as the IsUser and AddUser types created for us by mkMethods. The update function is used for operations that modify the state. query is used for read-only operations.
> checkAndAdd user pass = do
>   exists <- query $ IsUser user
>   if exists
>     then ok $ toResponse $ "User already exists"
>     else do
>       update $ AddUser user $ User user pass
>       ok $ toResponse $ "User created."
The login form now needs to verify the user's password and create a new session to log the user in. Again, the functions we've already built give us exactly this capability. The loginPage function performs the authentication, and then calls the performLogin function to create the new session. performLogin uses mkCookie to create a cookie with the appropriate key/value pair. This cookie is passed to addCookie, which also takes an argument indicating the lifetime of the cookie. A value of -1 makes the cookie last forever.
> loginPage (UserAuthInfo user pass) = [anyRequest $ do
>   allowed <- query $ AuthUser user pass
>   if allowed
>     then performLogin user
>     else ok $ toResponse $ "Incorrect password"
>   ]
>
> performLogin user = do
>   key <- update $ NewSession (SessionData user)
>   addCookie (-1) (mkCookie "sid" (show key))
>   ok $ toResponse $ "UserAuthInfo: " ++ show (user)
After seeing state queries and updates used for registration and login, the viewPage and userListPage functions will be fairly straightforward. Haskell's piecewise function definition combined with our use of the Maybe monad makes it easy to separate the "error case" where the user is not logged in.
> viewPage (Just sid) = [anyRequest $ do
>   ses <- query $ (GetSession $ sid)
>   ok $ toResponse $ "Cookie value: " ++ (maybe "not logged in" show (ses :: Maybe SessionData))]
>
> viewPage Nothing =
>   [anyRequest $ ok $ toResponse $ "Not logged in"]
>
> userListPage = [anyRequest $ do u <- query ListUsers; ok $ toResponse $ "Users: " ++ (show u)]
The main function needs a little more infrastructure when state is being used. A call to startSystemState is required for initialization. startSystemState needs to be passed an entry point Proxy that tells what data is being stored in state. It's also probably a good idea to terminate more gracefully, so I've forked a new thread for the HTTP server and and waited for termination before killing it and cleaning up.
> entryPoint :: Proxy State
> entryPoint = Proxy
>
> main = do
>   control <- startSystemState entryPoint
>   tid <- forkIO $ simpleHTTP nullConf impl
>   waitForTermination
>   putStrLn "Shutting down..."
>   killThread tid
>   shutdownSystem control
>   putStrLn "Shutdown complete"
So, in less than 200 lines of Haskell code, we have implemented a fully functional, stateful web app in HAppS. What has this gotten us that we wouldn't have in most other web frameworks? First, we have all our state stored in memory and persisted nicely on disk. I'm not going to get into a detailed comparison with the standard relational database approach, but I will highlight some of the benefits that we get from this approach.
  • Our state is defined with the Haskell type system giving us all the safety of its type checking and the flexibility of its type inference.
  • The HAppS state system gives us ACID guarantees on our state changes with very little of the complexity visible to the user.
  • With the upcoming development of multi-master replication and sharding, HAppS will give us excellent scalability with very little impact to our application.
Second, we have developed a simple authentication framework that can be the basis for any web app that requires a user to log in. This web app framework can be used with or without the state management mechanism. With a little restructuring, the state could be completely separated from the web front-end and any method of persistence used in its place. And third, you get to do all this in Haskell, a lazy, pure, functional programming language the merits of which have been elucidated by many others.

Monday, February 25, 2008

Using HAppS-State

Update: A demo of the finished application is now available. See this post for more information.

In the last post, I outlined the requirements for making a data type an instance of Component. This is great, but not very useful without a mechanism for accessing the state data.

HAppS persists its state by storing functions that operate on the state. This requires a way to serialize the functions. HAppS does this for you with the TemplateHaskell function mkMethods. So how does this affect you? Your functions that manipulate state must be either Update or Query functions. Update functions use the State monad, and Query functions use the Reader monad.

First we'll set up some convenience functions that will be used to construct the actual Query and Update functions.

> askUsers :: MonadReader State m => m (M.Map String User)
> askUsers = return . users =<< ask
> 
> askSessions::MonadReader State m => m (Sessions SessionData)
> askSessions = return . sessions =<< ask
> 
> modUsers :: MonadState State m =>
>             (M.Map String User -> M.Map String User) -> m ()
> modUsers f = modify (\s -> (State (sessions s) (f $ users s)))
> 
> modSessions :: MonadState State m =>
>                (Sessions SessionData -> Sessions SessionData) -> m ()
> modSessions f = modify (\s -> (State (f $ sessions s) (users s)))

Now we need functions for adding, listing, authenticating, and checking the existence of users. All the code dealing with the State and Reader monads has been hidden away in the above convenience functions, so these should be pretty easy to understand.

> isUser :: MonadReader State m => String -> m Bool
> isUser name = liftM (M.member name) askUsers
> 
> addUser :: MonadState State m => String -> User -> m ()
> addUser name u = modUsers $ M.insert name u
> 
> authUser :: MonadReader State m => String -> String -> m Bool
> authUser name pass = do
>                      u <- liftM (M.lookup name) askUsers
>                      liftM2 (==) (liftM password u) (return pass)
> 
> listUsers :: MonadReader State m => m [String]
> listUsers = liftM M.keys askUsers

Here are the functions for manipulating sessions. These are based on AllIn.hs. numSessions is a little different from the others. Here is the explanation from the AllIn comments:

Numsessions takes a proxy type as an argument so we know which session you want. You may have sessions on more than one type in state operating or sessions may be nested elsewhere. You can only have one of each type in all of state.
> setSession :: (MonadState State m) => SessionKey -> SessionData -> m ()
> setSession key u = do
>   modSessions $ Sessions . (M.insert key u) . unsession
>   return ()
> 
> newSession u = do
>   key <- getRandom
>   setSession key u
>   return key
> 
> getSession::SessionKey -> Query State (Maybe SessionData)
> getSession key = liftM ((M.lookup key) . unsession) askSessions
> 
> numSessions:: Proxy State -> Query State Int
> numSessions = proxyQuery $ liftM (M.size . unsession) askSessions

Now that we have all our state functions, we just need to call mkMethods. These functions may be accessed from any IO with something like (query $ authUser name pass) or (update $ AddUser name user)

> $(mkMethods ''State ['addUser, 'authUser, 'isUser, 'listUsers,
>             'setSession, 'getSession, 'newSession, 'numSessions])

Summary: We have seen that access to HAppS state is accomplished using the Reader and State monads. These accessor functions must be passed through mkMethods so HAppS knows how to serialize them for storage.

When you append this post to the previous one and add MonadState to the list of imports for Control.Monad.State, it should compile. This gives us everything we need to manage our authentication framework's state with HAppS.

Sunday, February 24, 2008

Intro to HAppS-State

Update: A demo of the finished application is now available. See this post for more information.

This post is written in literate haskell, so it should compile as-is in a .lhs file.

In my last article on HAppS, I gave a brief introduction to working with the HAppS web server to serve a basic user login/registration page and handle the form submission. In this article we are going to develop the framework for basic session management. The example file AllIn.hs in the HAppS source tree is very similar to this. I have made a few changes to demonstrate some different cases that one might encounter. So without further ado, we'll start with our standard import statements.

> {-# OPTIONS -fglasgow-exts #-}
> {-# LANGUAGE TemplateHaskell , FlexibleInstances,
>              UndecidableInstances, OverlappingInstances,
>              MultiParamTypeClasses, GeneralizedNewtypeDeriving #-}
> 
> module Session where
> 
> import qualified Data.Map as M
> import Control.Monad
> import Control.Monad.Reader
> import Control.Monad.State (modify,put,get,gets)
> import Data.Generics hiding ((:+:))
> import HAppS.Server
> import HAppS.State
> import HAppS.Data

The first thing to do is create some data structures to store the session in HAppS state. We need some type of session key to store in the browser cookie. For now we'll make it a simple Integer. Later it might be good to make it a larger (128 bits or more) cryptographically secure random number.

> type SessionKey = Integer

For this simple example, we'll only need to store the username on the server. We could make a simple type alias like we did for the session key, but I'll make it a data type to show how a more complex implementation might work.

> data SessionData = SessionData {
>   sesUser :: String
> } deriving (Read,Show,Eq,Typeable,Data)

Now we need a map so that we can look up the session data when we get the session key from the browser cookie. I'll make this a generic data type so any type of data can be used as the value.

> data Sessions a = Sessions {unsession::M.Map SessionKey a}
>   deriving (Read,Show,Eq,Typeable,Data)

Since we're doing an authentication framework, we have to store the user's password (in a real system it would be a salted hash) and other user information. This could have been stored in the SessionData type. But there may be times when it's not convenient to look up a user by their session key, so we're going to make another data type to hold all the individual information for each user. Right now it will just be the username and password.

> data User = User {
>   username :: String,
>   password :: String
> } deriving (Show,Read,Typeable,Data)

Next we will make a state data structure to hold all of our HAppS state. To store data in HAppS state, your top-level data type (or types) must be an instance of the Component type class. To simplify things we'll make only the top level state type an instance of Component. You could just as easily separate the session data and the user data into separate components. Components give you transactions and persistence. If you want those assurances you can encapsulate stuff with a component. So if you want transactional updates to your user data separately from the session data, you might want to make them separate instances. But I'll leave those details for another post.

> data State = State {
>   sessions :: Sessions SessionData,
>   users :: M.Map String User
> } deriving (Show,Read,Typeable,Data)

Before we make state an instance of Component, there are some other things that have to be done first. State and our other complex data types created thus far must be instances of Serialize and Version. And they must be declared in the right order. The lowest-level data first. The Version instance is trivial. HAppS provides some TemplateHaskell code to create Serialize instances for you.

> instance Version SessionData
> instance Version (Sessions a)
> 
> $(deriveSerialize ''SessionData)
> $(deriveSerialize ''Sessions)
> 
> instance Version State
> instance Version User
> 
> $(deriveSerialize ''User)
> $(deriveSerialize ''State)
> 
> instance Component State where
>   type Dependencies State = End
>   initialValue = State (Sessions M.empty) M.empty

The Component instantiation requires you to specify dependencies and some initial value for the state. Since we don't have any other components, a simple End will work for the dependencies. An empty map will be our initial value.

If you did have other dependencies, they would be specified something like this:

type Dependencies State = Sessions SessionData :+:
                          Users :+:
                          End

If you leave out the 8 lines deriving Version and Serialize, you'll get a compiler error that looks something like the following:

No instances for (Data.Generics.SYB.WithClass.Basics.Data
                    NormalizeD State,
                  Default State,
                  Data.Generics.SYB.WithClass.Basics.Data XmlD State)
  arising from the superclasses of an instance declaration
               at Session.lhs:102:2
Possible fix:
  add an instance declaration for
  (Data.Generics.SYB.WithClass.Basics.Data NormalizeD State,
   Default State,
   Data.Generics.SYB.WithClass.Basics.Data XmlD State)
In the instance declaration for `Component State'

Unfortunately this isn't a very helpful message. For now, HAppS defaults to use XML for its serialization. From what I understand, that may change in the future. For now, if you see an error like this, check that you have made everything an instance of Serialize.

This example demonstrated the first steps to getting data into HAppS state. In the next post we'll look at how to access and manipulate the state.

Monday, February 11, 2008

Intro to HAppS Part 1

Update: A demo of the finished application is now available. See this post for more information.

This post is the first in a series outlining the process I am taking to get a basic stateful web application working with the Haskell web framework HAppS. There is very little documentation available for versions of HAppS after 0.8.8. While this is not documentation, it should aid in getting an idea of how HAppS manages state. The code in these posts should work with HAppS 0.9.2.

For this application, we'll focus on the basic capabilities needed for user creation, authentication, and session management. The first thing we need is a form for logging in or creating a new user, and two URLs to handle the submission of these forms. Well use /login for the form, and we'll have the login form POST to the same place. The registration form will POST to /newuser. All other pages will return an error. Here is the basic HAppS code to do that:

impl = [ dir "login" [methodSP GET $ (fileServe ["login.html"] ".")
                     ,methodSP POST $ withDataFn fromLoginRequest processLogin]
       , dir "newuser" [methodSP POST $ withData processNewUser]
       , anyRequest $ ok $ toResponse "Sorry, couldn't find a matching handler"]

main = do simpleHTTP nullConf impl

simpleHTTP is passed a configuration parameter and a list of ServerPartT that specify the behavior for URLs.

The first dir says that a GET to the /login URL will display the login.html file that we have stored in the current directory. It also specifies that a POST to the same URL will result in a call to processLogin. The second dir says that a POST to /newuser will be handled by the processNewUser function.

withDataFn is a HAppS function that takes two functions as parameters. In this example, fromLoginRequest reads from the request data in the reader monad and produces a data structure that is then passed to processLogin. processLogin then uses this data to generate a ServerPartT that produces a response.

withData is similar to withDataFn, but it doesn't require the first function to read the request. It requires that processNewUser accept one parameter of type FromData. FromData is just a type class that defines a fromData function that builds the appropriate data type from the Reader monad.

The astute reader might notice that after the appropriate definition of the FromData type class, withData can simply be defined as:

withData = withDataFn fromData

Here is the code for the functions used by withData and withDataFn:

data NewUserInfo = NewUserInfo String String String

instance FromData NewUserInfo where
    fromData = liftM3 NewUserInfo
      (look "username")
      (look "password" `mplus` return "nopassword")
      (look "password2" `mplus` return "nopassword2")

processNewUser (NewUserInfo user pass1 pass2)
  | pass1 == pass2 =
    [anyRequest $ ok $ toResponse $ "NewUserInfo: " ++ show (user,pass1,pass2)]
  | otherwise = [anyRequest $ ok $ toResponse $ "Passwords did not match"]

fromLoginRequest = do a <- look "username" `mplus` return "nouser"
                      b <- look "password" `mplus` return "nopassword"
                      return (a,b)

processLogin (u,p) =
  [anyRequest $ ok $ toResponse $ "User logged in with: " ++ show (u,p)]

The processNewUser and processLogin functions both create [ServerPartT] that defines the appropriate responses. processNewUser also does a check to make sure the two passwords match. This could easily be done on the client side, but it still can't hurt to verify it on the server as well.

The differences between these two functions is in the type of parameters they accept. processNewUser takes a NewUserInfo which knows how to build itself from the request reader. The fromData function for NewUserInfo uses look to get the form data from the request. fromLoginRequest does the same thing to retrieve its form data, but returns a tuple that can be passed to processLogin.

Here is the final code to our basic HAppS application. The HTML in login.html is left as an exercise to the reader.

module Main where

import Control.Monad
import HAppS.Server

data NewUserInfo = NewUserInfo String String String

instance FromData NewUserInfo where
    fromData = liftM3 NewUserInfo
      (look "username")
      (look "password" `mplus` return "nopassword")
      (look "password2" `mplus` return "nopassword2")

processNewUser (NewUserInfo user pass1 pass2)
  | pass1 == pass2 =
    [anyRequest $ ok $ toResponse $ "NewUserInfo: " ++ show (user,pass1,pass2)]
  | otherwise = [anyRequest $ ok $ toResponse $ "Passwords did not match"]

fromLoginRequest = do a <- look "username" `mplus` return "nouser"
                      b <- look "password" `mplus` return "nopassword"
                      return (a,b)

processLogin (u,p) =
  [anyRequest $ ok $ toResponse $ "User logged in with: " ++ show (u,p)]
  
impl = [ dir "login" [methodSP GET $ (fileServe ["login.html"] ".")
                     ,methodSP POST $ withDataFn fromLoginRequest processLogin]
       , dir "newuser" [methodSP POST $ withData processNewUser]
       , anyRequest $ ok $ toResponse "Sorry, couldn't find a matching handler"]

main = do simpleHTTP nullConf impl