Unit Tests that Write Themselves

In its simplest form, a unit test feeds some value into a black box (which may be a function or a sequence of steps) and then checks that the result that comes out matches some expected value(s).

This is all well and good: it makes unit tests explicit, simple and reproducible. Unfortunately, there are some downsides. Most notably, you’re trusting that the individual unit tests collectively cover all the values that represent all paths. When I say a path, I’m talking about one unique sequence of steps/outcomes through a function.
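To make the idea of a missed path concrete, here is a minimal sketch. The `average` function is a hypothetical helper invented for this illustration. The two hand-written checks pass, yet a third path (the empty list) was never exercised and hides a crash.

```python
def average(values):
    """Hypothetical helper with more paths than its tests cover."""
    if len(values) == 1:
        return values[0]              # path 1: a single value
    return sum(values) / len(values)  # path 2: several values


# The hand-written tests cover the two paths we thought of.
assert average([4]) == 4
assert average([1, 2, 3]) == 2

# The path nobody wrote a test for: an empty list divides by zero.
try:
    average([])
    missed_path_error = None
except ZeroDivisionError as error:
    missed_path_error = error

print(type(missed_path_error).__name__)
```

Nothing in the passing tests hints at the problem. Finding it depends entirely on someone thinking of the empty list.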

Hypothesis is an amazing library and an incredible tool for Python, inspired by QuickCheck, a package for Haskell.

At its heart it is a value generator for tests, but it takes that notion leaps and bounds further by actually running your tests against those generated values and surfacing the ones that break them. So in a way, it can write the unit tests for you.

The rest of this article walks through the example the Hypothesis authors provide.

Suppose we’ve written a run length encoding system and we want to test it out.

We have the following code which I took straight from the Rosetta Code wiki (OK, I removed some commented out code and fixed the formatting, but there are no functional modifications):

```python
def encode(input_string):
    count = 1
    prev = ''
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                entry = (prev, count)
                lst.append(entry)
            count = 1
            prev = character
        else:
            count += 1
    else:
        entry = (character, count)
        lst.append(entry)
    return lst


def decode(lst):
    q = ''
    for character, count in lst:
        q += character * count
    return q
```
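If run length encoding is new to you, the sketch below shows the shape of the data involved. It is not the Rosetta Code implementation. It is a shorter formulation built on the standard library’s `itertools.groupby`, and the names `rle_encode` and `rle_decode` are made up for this illustration.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(character, len(list(run))) for character, run in groupby(text)]


def rle_decode(pairs):
    """Expand (character, count) pairs back into the original string."""
    return ''.join(character * count for character, count in pairs)


pairs = rle_encode('aaabcc')
print(pairs)              # [('a', 3), ('b', 1), ('c', 2)]
print(rle_decode(pairs))  # aaabcc
```

The list of pairs is the encoded form. Decoding simply repeats each character by its count.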

We want to write a test for this that will check some invariant of these functions.

The invariant you tend to try when you’ve got this sort of encoding/decoding pair is that if you encode something and then decode it, you get the same value back.

Let’s see how you’d do that with Hypothesis:

```python
from hypothesis import given
from hypothesis.strategies import text


@given(text())
def test_decode_inverts_encode(s):
    assert decode(encode(s)) == s
```

For this example we’ll just let pytest discover and run the test. We’ll cover other ways you could have run it later.

The `text` function returns what Hypothesis calls a search strategy: an object with methods that describe how to generate and simplify certain kinds of values. The `@given` decorator then takes our test function and turns it into a parametrized one which, when called, will run the test function over a wide range of matching data from that strategy.
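To get a feel for what such a decorator does, here is a toy version written with only the standard library. It is emphatically not how Hypothesis is implemented. The names `toy_given` and `random_text` are invented for this sketch, and it uses plain random generation with no simplification at all.

```python
import random
import string


def random_text(rng, max_size=10):
    """Toy stand-in for a strategy: a random lowercase string of 0..max_size characters."""
    size = rng.randint(0, max_size)
    return ''.join(rng.choice(string.ascii_lowercase) for _ in range(size))


def toy_given(generator, runs=100, seed=0):
    """Turn a one-argument test into a zero-argument one run over generated data."""
    def decorator(test):
        def wrapper():
            rng = random.Random(seed)
            for _ in range(runs):
                value = generator(rng)
                try:
                    test(value)
                except Exception as error:
                    # Report which generated value broke the test.
                    raise AssertionError(f'Falsifying example: {value!r}') from error
        return wrapper
    return decorator


@toy_given(random_text)
def check_reverse_twice_is_identity(s):
    assert s[::-1][::-1] == s


@toy_given(random_text)
def check_text_is_never_long(s):
    assert len(s) < 5  # deliberately false property


check_reverse_twice_is_identity()  # passes silently

try:
    check_text_is_never_long()
    found_counterexample = False
except AssertionError as error:
    found_counterexample = True
    print(error)
```

The real library does far more than this: it generates trickier values, remembers failures, and simplifies what it finds. The basic shape of generating a value, running the test and reporting what broke is the same.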

Anyway, this test immediately finds a bug in the code:

```
Falsifying example: test_decode_inverts_encode(s='')

UnboundLocalError: local variable 'character' referenced before assignment
```

Hypothesis correctly points out that this code is simply wrong if called on an empty string.

If we fix that by just adding the following code to the beginning of the function, then Hypothesis tells us the code is correct (by doing nothing, as you’d expect a passing test to).

```python
if not input_string:
    return []
```
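Put together, the fixed code might look like the sketch below. It is the same listing as before with the guard placed as the first statement of `encode`. The handful of strings checked at the end are arbitrary picks for illustration and do not come from Hypothesis.

```python
def encode(input_string):
    # Guard from the fix above: an empty input has no runs to encode.
    if not input_string:
        return []
    count = 1
    prev = ''
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                entry = (prev, count)
                lst.append(entry)
            count = 1
            prev = character
        else:
            count += 1
    else:
        entry = (character, count)
        lst.append(entry)
    return lst


def decode(lst):
    q = ''
    for character, count in lst:
        q += character * count
    return q


# Spot-check the round trip, including the empty string that used to crash.
for s in ['', 'a', 'aab', '001', 'hello']:
    assert decode(encode(s)) == s
```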

If we wanted to make sure this example was always checked we could add it in explicitly:

```python
from hypothesis import given, example
from hypothesis.strategies import text


@given(text())
@example('')
def test_decode_inverts_encode(s):
    assert decode(encode(s)) == s
```

You don’t have to do this, but it can be useful both for clarity and for reliably hitting hard-to-find examples. Also, in local development Hypothesis will just remember and reuse the examples anyway, but there’s not currently a very good workflow for sharing those with your CI.

It’s also worth noting that both `example` and `given` support keyword arguments as well as positional ones. The following would have worked just as well:

```python
@given(s=text())
@example(s='')
def test_decode_inverts_encode(s):
    assert decode(encode(s)) == s
```

Suppose we had a more interesting bug and forgot to reset the count each time. Say we missed a line in our encode method:

```python
def encode(input_string):
    # Empty-string guard from the earlier fix
    if not input_string:
        return []
    count = 1
    prev = ''
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                entry = (prev, count)
                lst.append(entry)
            # count = 1  # Missing reset operation
            prev = character
        else:
            count += 1
    else:
        entry = (character, count)
        lst.append(entry)
    return lst
```

Hypothesis quickly informs us of the following example:

`Falsifying example: test_decode_inverts_encode(s='001')`

Note that the example provided is really quite simple. Hypothesis doesn’t just find any counter-example to your tests; it knows how to simplify the examples it finds to produce small, easy-to-understand ones. In this case, two identical values are enough to set the count to a number different from one, followed by another distinct value which should have reset the count but in this case didn’t.
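The idea behind simplification can be sketched in a few lines. This is a toy with invented names (`fails`, `shrink`) that only ever deletes single characters. Hypothesis’s real simplification is much more capable, which is how it also reduces the characters themselves to something like `'001'`.

```python
def encode(input_string):
    """The buggy encoder from above: the count is never reset."""
    if not input_string:
        return []
    count = 1
    prev = ''
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                lst.append((prev, count))
            # count = 1  # Missing reset operation
            prev = character
        else:
            count += 1
    else:
        lst.append((character, count))
    return lst


def decode(lst):
    return ''.join(character * count for character, count in lst)


def fails(s):
    """True when the round-trip invariant is broken for s."""
    return decode(encode(s)) != s


def shrink(s, still_fails):
    """Greedily delete single characters for as long as the input keeps failing."""
    improved = True
    while improved:
        improved = False
        for i in range(len(s)):
            candidate = s[:i] + s[i + 1:]
            if still_fails(candidate):
                s = candidate
                improved = True
                break  # restart the scan on the shorter string
    return s


minimal = shrink('hello world', fails)
print(repr(minimal))
```

Starting from `'hello world'`, the loop keeps throwing characters away until only three remain: a repeated character followed by a different one, the same shape as the example Hypothesis reported.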

The examples Hypothesis provides are valid Python code you can run. Any arguments that you explicitly provide when calling the function are not generated by Hypothesis, and if you explicitly provide all the arguments, Hypothesis will just call the underlying function once rather than running it multiple times.