Part 7: Regression testing

(A series of blog posts about the tech behind lagen.nu. Earlier parts are here: first, second, third, fourth, fifth and sixth)

Like most developers that have been Test infected, I
try to create regression tests whenever I can. A project like
lagen.nu, which has no GUI, no event handling, no databases, just
operations on text files, is really well suited for automated
regression testing. However, when I started out, I didn’t do
test-first programming since I didn’t really have any idea of what
I was doing. As things solidified, I encountered a particular
section of the code that lended itself very nicely to regression
testing.

Now, when I say regression testing, I don’t neccesarily mean unit
testing. I’m not so concerned with testing classes at the method
level as my ”API” is really document oriented; a particular text
document sent into the program should result in the return of a
particular XML document. Basically, there are only two methods
that I’m testing:

  • The lawparser.parse() method: Given a section of law text,
    returns the same section with all references marked up, as
    described in part 5.
  • Law._txt_to_xml(), which, given a entire law as plaintext,
    returns the xml version, as described in part 4

Since both these tests operate in the fashion ”Send in really big
string, compare result with the expected other even bigger
string”, I found that pyunit didn’t work that well for me, as it’s
more centered around testing lots of methods in lots of classes,
where the testdata is so small that it’s comfortable having them
inside the test code.

Instead, I created my tests in the form of a bunch of text
files. For lawparser.parse, each file is just two paragraphs, the
first being the indata, and the second being the expected outdata:

    Vid ändring av en bolagsordning eller av en beviljad koncession
    gäller 3 § eller 4 a § i tillämpliga delar.

    Vid ändring av en bolagsordning eller av en beviljad koncession
    gäller <link section="3">3 §</link> eller <link section="4a">4 a §</link> i tillämpliga delar.
  

The test runner then becomes trivial:

def runtest(filename,verbose=False,quiet=False):
    (test,answer) = open(filename).read().split("\n\n", 1)
    p = LawParser(test,verbose)
    res = p.parse()
    if res.strip() == answer.strip():
        print "Pass: %s" % filename
        return True
    else:
        print "FAIL: %s" % filename
        if not quiet:
            print "----------------------------------------"
            print "EXPECTED:"
            print answer
            print "GOT:"
            print res
            print "----------------------------------------"
            return False
  

Similarly, the code to test Law._txt_to_xml() is also
pretty trivial. There are two differences: Since the indata is
larger and already split up in paragraphs, the indata and expected
result for a particular test is stored in separate files. This
also lets me edit the expected results file using nXML mode in
Emacs.

Comparing two XML documents is also a little trickier, in that
they can be equivalent, but still not match byte-for-byte (since
there can be semantically insignificant whitespace and similar
stuff). To avoid getting false alarms, I put both the expected
result file, as well as the actual result, trough tidy. This
ensures that their whitespacing will be equivalent, as well as
easy to read. Also, a good example of piping things to and from a
command in python:

def tidy_xml_string(xmlstring):
    """Neatifies a XML string and returns it"""
    (stdin,stdout) = os.popen2("tidy -q -n -xml --indent auto --char-encoding latin1")
    stdin.write(xmlstring)
    stdin.close()
    return stdout.read()
  

If the two documents still don’t match, it can be difficult to
pinpoint the exact place where they match. I could dump the
results to file and run command-line diff on them, but since there
exists a perfectly good diff implementation in the python standard
libraries I used that one instead:

    from difflib import Differ
    differ = Differ()
    diff = list(differ.compare(res.splitlines(), answer.splitlines()))
    print "\n".join(diff)+"\n"
  

The result is even easier to read than standard diff output, since
it points out the position on the line as well (maybe there’s a
command line flag for diff that does this?):

[...]
      suscipit non, venenatis ac, dictum ut, nulla. Praesent
      mattis.</p>
    </section>
-   <section id="1" element="2">
?                   ^^^

+   <section id="1" moment="2">
?                   ^^

      <p>Sed semper, ante non vehicula lobortis, leo urna sodales
      justo, sit amet mattis felis augue sit amet felis. Ut quis
[...]
  

So, that’s basically my entire test setup for now. I need to build
more infrastructure for testing the XSLT transform and the HTML
parsing code, but these two areas are the trickiest.

Since I can run these test methods without having a expected
return value, they are very useful as the main way of developing
new functionality: I specify the indata, and let the test function
just print the outdata. I can then work on new functionality
without having to manually specifying exactly how I want the
outdata to look (because this is actually somewhat difficult for
large documents), I just hack away until it sort of looks like I
want, and then just cut’n paste the outdata to the ”expected
result” file.