Efficiently Improving Test Coverage with Algebraic Data Types
Think of a time you've written tests for (de)serialization code of some kind, say for a data structure called Foo. If you were using the lowest level of sophistication you probably defined a few values by hand, serialized them, deserialized that, and verified that you ended up with the same value you started with. In Haskell nomenclature we'd say that you manually verified that parse . render == id. If you were a little more sophisticated, you might have used the QuickCheck library (or any of the numerous similar packages it inspired in other languages) to verify the parse . render == id property for a bunch of randomly generated values. The first level of sophistication is often referred to as unit testing. The second frequently goes by the term property testing or sometimes fuzz testing.

Both unit testing and property testing have some drawbacks. With unit testing you have to write fairly tedious boilerplate of listing by hand all the values you want to test with. With property testing you have to write a precise specification of how to generate the different cases randomly. There is a package called generic-arbitrary that can automatically derive these generators for you generically, but that approach often ends up testing many values that aren't giving you any increase in code coverage. To see this, consider the example of testing this domain-specific data type:
data MyError e a = Failure e | Success a
  deriving (Eq, Show, Generic)  -- Show/Generic are needed for the sessions below; Eq for round-trip checks
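For concreteness, here is a minimal sketch of both levels of sophistication for this type. The render and parse functions are toy stand-ins invented for this example rather than anything from a real serialization library, and actually running the property with quickCheck would additionally require an Arbitrary instance (or the forAll shown later):

import Text.Read (readMaybe)

-- Toy stand-in serializer, invented purely for this example.
render :: MyError Int Bool -> String
render (Failure e) = "failure " ++ show e
render (Success a) = "success " ++ show a

-- Toy stand-in parser; returns Nothing on malformed input.
parse :: String -> Maybe (MyError Int Bool)
parse s = case words s of
  ["failure", e] -> Failure <$> readMaybe e
  ["success", a] -> Success <$> readMaybe a
  _              -> Nothing

-- The round-trip check (parse . render gives back the original value).
roundTrips :: MyError Int Bool -> Bool
roundTrips v = parse (render v) == Just v

-- Level 1, unit testing: a few hand-picked values.
unitTests :: Bool
unitTests = all roundTrips [Failure 42, Success True, Success False]

-- Level 2, property testing: the same check over randomly generated values.
prop_roundTrip :: MyError Int Bool -> Bool
prop_roundTrip = roundTrips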
Here is what we get using generic-arbitrary to generate test values:
λ import Test.QuickCheck
λ import Test.QuickCheck.Arbitrary.Generic
λ let g = genericArbitrary :: Gen (MyError Int Bool)
λ mapM_ print =<< generate (replicateM 20 g)
Success True
Success True
Success False
Success False
Success True
Failure 17
Failure 8
Failure (-29)
Failure (-14)
Success True
Failure 22
Failure (-2)
Success False
Success True
Success False
Failure (-15)
Success False
Success True
Failure (-5)
Failure (-5)
It is testing the exact same value multiple times. But even if it had a mechanism for deduplicating the values, do you really need so many different test cases for the serialization of the Int in the Failure case? I'm testing my serialization code for the MyError type, so I probably don't care about testing the serialization of Int. I would take it as a given that the libraries I'm using got the serialization for Int correct. Given that assumption, for the above case of testing MyError Int Bool I really only care about two cases: the Failure case and the Success case. Testing serializations for Int and Bool is out of the scope of my concern because from the perspective of my project they are primitives.

What we really want to test is something I call "full constructor coverage". With this pattern of generation you only generate enough values to exercise each constructor in your type hierarchy once. This gets you better code coverage while testing many fewer values. If we wanted exhaustive testing, the number of values you'd need to test is given by this relation:
-- Product type
numCases (a, b) = numCases a * numCases b
-- Sum type
numCases (Either a b) = numCases a + numCases b
This is precisely what the "algebraic" in "algebraic data types" is referring to. For product types the number of inhabitants is multiplied; for sum types it is added. However, if we're going for full constructor coverage, the number of values we need to test is reduced to the following relation:
numCases (a, b) = max (numCases a) (numCases b)
numCases (Either a b) = numCases a + numCases b
For complex types, replacing the multiplication with the max greatly reduces the number of values you need to test. Here's what full constructor coverage looks like for MyError:
λ import Fake
λ let g = unCoverage (gcover :: Coverage (MyError Int Bool))
λ mapM_ print =<< generate (sequence g)
Failure 75
Success False
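To see how the max pays off on a more deeply nested type, here is a hypothetical example (not from the fake test suite) with the counting done by hand; the deriving clauses are omitted since it exists only to count constructors:

-- Hypothetical nested type, used only to illustrate the counting relations above.
data Payload  = Number Int | Text String | Flag Bool   -- 3 constructors
data Envelope = Envelope Payload Payload               -- a product of two Payloads

-- Exhaustive constructor combinations: 3 * 3 = 9 distinct Envelope shapes.
-- Full constructor coverage:           max 3 3 = 3 values are enough, e.g.
--
--   Envelope (Number 1)   (Number 2)
--   Envelope (Text "a")   (Text "b")
--   Envelope (Flag True)  (Flag False)
--
-- because both Payload fields get to exercise every constructor in parallel.

Add a third Payload field and the exhaustive count jumps to 27 while the coverage count stays at 3.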
At this point readers might be thinking, "Who cares? Computers are fast." Well, QuickCheck defaults to generating 100 test cases for each property, and complex types can easily make this insufficient. You can increase how many test cases it generates, but that will make your test suite slower, and you probably won't know what the right number is to reach the level of coverage you are aiming for. In my experience, slower test suites have tangible business impact on production systems: when a problem is encountered in production, you have to wait for the tests to pass before you can deploy a fix, which can mean longer downtime. At the end of the day, naive randomized test case generation is very unlikely to score as well as the full constructor coverage method on the metric of lines of code covered per test case.
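For reference, asking stock QuickCheck for more cases is roughly a one-liner (reusing the genericArbitrary generator from the session above and the prop_roundTrip sketch from earlier); knowing whether 1000 is enough, overkill, or both at once is exactly the guesswork described above:

import Test.QuickCheck
import Test.QuickCheck.Arbitrary.Generic (genericArbitrary)

-- Run the round-trip property with 1000 cases instead of the default 100.
checkMore :: IO ()
checkMore = quickCheckWith stdArgs { maxSuccess = 1000 }
              (forAll (genericArbitrary :: Gen (MyError Int Bool)) prop_roundTrip)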
The good news is that Haskell's algebraic data types give us exactly what we need to get full constructor coverage in a completely automatic way. This functionality is part of the fake package and you can use it today; you can see example uses in its test suite. In a dynamically typed language you don't have knowledge of the constructors and the forms they take: any variable can hold any data, so there is no structure to guide the generation. Thanks to its types, Haskell knows that structure.
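To close the loop, here is a rough sketch of what a full-constructor-coverage round-trip test could look like, using only the Fake calls from the session above plus the toy render/parse and roundTrips from earlier; a real suite would hand the generated values to whatever assertion framework it already uses:

import Fake
import Test.QuickCheck (generate)

-- Generate one value per constructor of MyError and round-trip each of them.
fullCoverageRoundTrip :: IO Bool
fullCoverageRoundTrip = do
  vals <- generate (sequence (unCoverage (gcover :: Coverage (MyError Int Bool))))
  mapM_ print vals   -- e.g. Failure 75 and Success False, as in the session above
  pure (all roundTrips vals)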