> :m + Data.List Data.Function
> contents <- readFile "user.log"
> let l = lines contents
> let t = map words l
> mapM print $ take 2 t

These lines are pretty straightforward. ":m +" is GHCi syntax that is similar to an import. readFile :: FilePath -> IO String reads the contents of a file into a string. The lines function splits the string on newlines and creates a list of strings, one for each line in the file. We map the words function over each of these lines to split them on whitespace. At this point t :: [[String]]. You can think of it as a table (hence the name 't') where each row is a line in the file and each column is a field. The "mapM print" displays the first two elements of t on separate lines.
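The same parsing step can be sketched as a standalone program. The post does not show the contents of user.log, so the sample data below, with its "date time username" layout, is an assumption made purely for illustration, and parseTable is a hypothetical name.

```haskell
-- Hypothetical sample data: the real user.log is not shown in the post,
-- so this "date time username" layout is an assumption.
sampleLog :: String
sampleLog = unlines
  [ "2009-01-15 10:02:11 alice"
  , "2009-01-15 10:05:42 bob"
  ]

-- Split the contents into rows (lines) and each row into fields (words).
parseTable :: String -> [[String]]
parseTable = map words . lines

main :: IO ()
main = mapM_ print (parseTable sampleLog)
```

The sketch uses mapM_ rather than mapM because mapM_ discards the results of print. In GHCi, mapM print also echoes the resulting list of units after the rows.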
> let noDay = map (\(d:ds) -> take 7 d : ds) t
> let months = groupBy ((==) `on` head) noDay

Now we get into the meat of the analysis. The noDay line uses a simple map and a lambda to keep only the first seven characters of the first field in every line, stripping off the last three (the day) and turning the field into a unique month identifier. "groupBy" is a handy function that groups a list into "partitions" where the elements in a partition are all equal for a user-specified definition of equality. In this case, we're grouping the rows in noDay and we want to use equality of the first field to define our groups. The 'on' function is a handy little tool defined in Data.Function that makes this easier.
on :: (b -> b -> c) -> (a -> b) -> a -> a -> c

The first argument of 'on' is a binary operator "b -> b -> c". Its second argument is a function that transforms a's into b's. It returns a new binary operator "a -> a -> c" that applies the transform function to both a's to get two b's, which it then passes to the original binary operator. In our example, the "a -> a -> c" is equivalent to "row -> row -> Bool" (straight out of the definition of groupBy). So the 'on' function helps us construct this row comparator by first transforming each row and then comparing the results. Our comparison function is (==), and our row transformation is "head", which gets us the month field.
Here's a simple example:
> let exampleList = [ (1,9), (1,7), (2,16), (2,6) ]
> groupBy ((==) `on` fst) exampleList

This groupBy call returns [ [(1,9),(1,7)], [(2,16),(2,6)] ]. It has grouped all the consecutive tuples with 1 as the first element into one list and all the 2's into a second list. These lists are in turn collected in a surrounding list. In the original example, we get a list of groups by month. The result can be conceptualized as a list of bins, one per month, where each bin is a list of all the log entries that happened in that month.
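To make the 'on' plumbing concrete, here is a small sketch that compares it with a hand-written comparator. The name sameFst is hypothetical and exists only for this comparison. Both groupBy calls should produce the same grouping.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- A hand-written equivalent of ((==) `on` fst), for comparison only.
sameFst :: Eq a => (a, b) -> (a, b) -> Bool
sameFst x y = fst x == fst y

exampleList :: [(Int, Int)]
exampleList = [(1,9), (1,7), (2,16), (2,6)]

main :: IO ()
main = do
  -- Comparator built with 'on': transform both arguments, then compare.
  print (groupBy ((==) `on` fst) exampleList)
  -- Comparator written out by hand.
  print (groupBy sameFst exampleList)
```

Writing the comparator with 'on' saves naming a one-off helper and keeps the "compare by this field" intent visible at the call site.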
> let monthUniqs = map (nubBy ((==) `on` (!!2))) months

Our next line has the form "map ... months". This means that we're doing some operation on each of the "month bins" we just created. In this case our operation is "nubBy ((==) `on` (!!2))". It's very similar to the groupBy line. 'nubBy' removes duplicates from a list, where the supplied comparison function defines what things are duplicates.
> nubBy (==) [1,1,2,2,1] == [1,2]

We again call on the trusty 'on' function to make nubBy use the third field (the username) to determine equality. This removes all duplicate usernames from each of the month bins, so the number of items in each of the bins is the number of unique registered users that came to the site in that month.
> zip (map (head . head) monthUniqs) (map length monthUniqs)

Now we want to display the length of each of the bins. The lengths are more interesting when we know which months they go with, so we use the zip function to combine two lists into one list of tuples.
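The monthly steps so far can be collected into a single function. This is only a sketch: the Row type, the function name uniqueUsersPerMonth, and the sample rows are hypothetical, and the "date time username" layout is an assumption about a log the post does not show.

```haskell
import Data.Function (on)
import Data.List (groupBy, nubBy)

type Row = [String]

-- Hypothetical rows in the assumed "date time username" layout.
sampleRows :: [Row]
sampleRows =
  [ ["2009-01-15", "10:02:11", "alice"]
  , ["2009-01-16", "09:00:00", "alice"]
  , ["2009-01-20", "14:30:05", "bob"]
  , ["2009-02-01", "08:15:00", "alice"]
  ]

-- Count distinct usernames (third field) per month (first seven
-- characters of the first field). Assumes the rows are already in date
-- order and that every row has at least three fields; an empty row
-- (for example from a blank line) would make the lambda fail.
uniqueUsersPerMonth :: [Row] -> [(String, Int)]
uniqueUsersPerMonth rows =
  zip (map (head . head) monthUniqs) (map length monthUniqs)
  where
    noDay      = map (\(d:ds) -> take 7 d : ds) rows
    months     = groupBy ((==) `on` head) noDay
    monthUniqs = map (nubBy ((==) `on` (!!2))) months

main :: IO ()
main = print (uniqueUsersPerMonth sampleRows)
```

With the sample rows, January has three entries but only two distinct usernames, which is the distinction the nubBy step is there to capture.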
> let user = groupBy ((==) `on` (!!2)) $ sortBy (compare `on` (!!2)) t

By now these patterns should be looking familiar. Here we're grouping by the username field just like we grouped by the month field before. The only difference is that we have to sort the list by the username field first because groupBy only groups equivalent elements that are adjacent. The result of this is a list of bins, one for each user.
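A short sketch shows why the sort matters. The (username, day) pairs below are made up for illustration, as are the names unsortedGroups and sortedGroups.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)

-- Hypothetical (username, day) pairs, deliberately not sorted by name.
visits :: [(String, Int)]
visits = [("bob", 1), ("alice", 1), ("bob", 2)]

-- Without sorting, the two "bob" entries are not adjacent,
-- so they end up in separate groups.
unsortedGroups :: [[(String, Int)]]
unsortedGroups = groupBy ((==) `on` fst) visits

-- Sorting by username first makes equal names adjacent.
sortedGroups :: [[(String, Int)]]
sortedGroups = groupBy ((==) `on` fst) (sortBy (compare `on` fst) visits)

main :: IO ()
main = do
  print (map (map fst) unsortedGroups)  -- three groups
  print (map (map fst) sortedGroups)    -- two groups
```

The month grouping did not need this step, presumably because log entries are already written in date order.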
> length user

The length of this list tells us the number of registered users that have logged in.
> let userDays = map (nubBy ((==) `on` head)) user

Now we're nubbing the user bins to remove duplicate days. (It's days because user was created from t, whose first field still includes the day.)

> let visitCounts = map length userDays

This tells us how many different days each user has visited the site.
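The per-user steps can be sketched the same way. Pairing each count with its username is an addition that the post does not make, and the Row type, the name daysVisited, and the sample rows are hypothetical, again assuming a "date time username" layout.

```haskell
import Data.Function (on)
import Data.List (groupBy, nubBy, sortBy)

type Row = [String]

-- Hypothetical rows in the assumed "date time username" layout.
sampleRows :: [Row]
sampleRows =
  [ ["2009-01-15", "10:02:11", "alice"]
  , ["2009-01-15", "17:45:00", "alice"]
  , ["2009-01-20", "14:30:05", "bob"]
  , ["2009-02-01", "08:15:00", "alice"]
  ]

-- For each username (third field), count the distinct dates (first
-- field) on which that user appears. Assumes every row has at least
-- three fields.
daysVisited :: [Row] -> [(String, Int)]
daysVisited rows =
  zip (map ((!!2) . head) userDays) (map length userDays)
  where
    users    = groupBy ((==) `on` (!!2)) (sortBy (compare `on` (!!2)) rows)
    userDays = map (nubBy ((==) `on` head)) users

main :: IO ()
main = print (daysVisited sampleRows)
```

In the sample, alice has three log entries but two of them fall on the same date, so she is counted as visiting on two days.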
None of what we have done here is particularly difficult. It wouldn't be hard to do the same thing with Ruby or Python. The point of the screencast is to show that it can also be done easily in Haskell, a statically typed, compiled language; and to demonstrate some useful functions in Haskell's standard library.