Efficiently Improving Test Coverage with Algebraic Data Types

Think of a time you've written tests for (de)serialization code of some kind, say for a data structure called Foo.  If you were using the lowest level of sophistication you probably defined a few values by hand, serialized them, deserialized that, and verified that you ended up with the same value you started with.  In Haskell nomenclature we'd say that you manually verified that parse . render == id.  If you were a little more sophisticated, you might have used the QuickCheck library (or any of the numerous similar packages it inspired in other languages) to verify the parse . render == id property for a bunch of randomly generated values.  The first level of sophistication is often referred to as unit testing.  The second frequently goes by the term property testing or sometimes fuzz testing.

Both unit testing and property testing have some drawbacks.  With unit testing you have to write fairly tedious boilerplate of listing by hand all the values you want to test with.  With property testing you have to write a precise specification of how to generate the different cases randomly.  There is a package called generic-arbitrary that can automatically derive these generators for you generically, but that approach often ends up testing many values that aren't giving you any increase in code coverage.  To see this, consider the example of testing this domain-specific data type:

data MyError e a = Failure e | Success a

Here is what we get using generic-arbitrary to generate test values:

λ import Test.QuickCheck
λ import Test.QuickCheck.Arbitrary.Generic
λ let g = genericArbitrary :: Gen (MyError Int Bool)
λ mapM_ print =<< generate (replicateM 20 g)
Success True
Success True
Success False
Success False
Success True
Failure 17
Failure 8
Failure (-29)
Failure (-14)
Success True
Failure 22
Failure (-2)
Success False
Success True
Success False
Failure (-15)
Success False
Success True
Failure (-5)
Failure (-5)

It is testing the exact same value multiple times.  But even if it had a mechanism for deduplicating the values, do you really need so many different test cases for the serialization of the Int in the Failure case?  I'm testing my serialization code for the MyError type, so I probably don't care about testing the serialization of Int.  I would take it as a given that the libraries I'm using got the serialization for Int correct.  If we take this assumption as a given, then for the above case of testing MyError Int Bool, I would really only care about two cases, the Failure case and the Success case.  Testing serializations for Int and Bool are out of the scope of my concern because from the perspective of my project they are primitives.

What we really want to test is something I call "full constructor coverage".  With this pattern of generation you only generate enough values to exercise each constructor your your type hierarchy once.  This gets you better code coverage while testing many fewer values.  If we wanted exhaustive testing, the number of values you'd need to test are given by this relation:

-- Product type
numCases (a, b) = numCases a * numCases b
-- Sum type
numCases (Either a b) = numCases a + numCases b

This is precisely what the "algebraic" in "algebraic data types" is referring to.  For product types the number of inhabitants is multiplied, for sum types it is added.  However, if we're going for full constructor coverage, the number of values we need to test is reduced to the following relation:

numCases (a, b) = max (numCases a) (numCases b)
numCases (Either a b) = numCases a + numCases b

For complex types, replacing the multiplication with the max greatly reduces the number of values you need to test.  Here's what full constructor coverage looks like for MyError:

λ import Fake
λ let g = unCoverage (gcover :: Coverage (MyError Int Bool))
λ mapM_ print =<< generate (sequence g)
Failure 75
Success False

At this point readers might be thinking, "Who cares?  Computers are fast."  Well, QuickCheck defaults to generating 100 test cases for each property, but complex types can easily make this insufficient.  You can increase how many test cases it generates, but that will make your test suite slower and you probably won't know what the right number is to get the level of coverage you are aiming for.  In my experience slower test suites can cause tangible business impact in production systems in terms of possibly longer downtimes when problems are encountered in production and you have to wait for tests to pass before you can deploy a fix.  At the end of the day, naive randomized test case generation very unlikely to score as well as the full constructor coverage method under the metric of lines of code covered per test case.

The good news is that Haskell's algebraic data types give us exactly what we need to get full constructor coverage in a completely automatic way.  It is a part of the fake package and you can use it today.  You can see example uses in the test suite.  In a dynamically typed language you don't have knowledge of the constructors and forms they take.  All variables can hold any data, so you don't have any knowledge of structure to guide you.  Thanks to types, Haskell knows that structure.

Comments

Derek Elkins said…
When I saw the title of this post I thought it would be about making an algebraic data type as a sort of input descriptor that would get randomly generated and then having an interpretation function turn that descriptor into actual values. The idea being that a "uniform" distribution of input descriptors would, via the interpretation function, lead to a non-uniform distribution of inputs.

Popular posts from this blog

Fake: Generating Realistic Test Data in Haskell