Part 7: Regression testing

(A series of blog posts about the tech behind this project. Earlier parts are here: first, second, third, fourth, fifth and sixth.)

Like most developers that have been Test infected, I try to create regression tests whenever I can. A project like this, which has no GUI, no event handling, no databases, just operations on text files, is really well suited for automated regression testing. However, when I started out, I didn't do test-first programming, since I didn't really have any idea of what I was doing. As things solidified, I encountered a particular section of the code that lent itself very nicely to regression testing.

Now, when I say regression testing, I don't necessarily mean unit testing. I'm not so concerned with testing classes at the method level, as my ”API” is really document oriented: a particular text document sent into the program should result in the return of a particular XML document. Basically, there are only two methods that I'm testing:

  • The lawparser.parse() method: Given a section of law text,
    returns the same section with all references marked up, as
    described in part 5.
  • Law._txt_to_xml(), which, given an entire law as plaintext,
    returns the XML version, as described in part 4.

Since both these tests operate in the fashion ”send in a really big string, compare the result with an expected, even bigger string”, I found that pyunit didn't work that well for me, as it's more centered around testing lots of methods in lots of classes, where the test data is small enough that it's comfortable to keep it inside the test code.

Instead, I created my tests in the form of a bunch of text files. For lawparser.parse, each file is just two paragraphs, the first being the input and the second being the expected output:

    Vid ändring av en bolagsordning eller av en beviljad koncession
    gäller 3 § eller 4 a § i tillämpliga delar.

    Vid ändring av en bolagsordning eller av en beviljad koncession
    gäller <link section="3">3 §</link> eller <link section="4a">4 a §</link> i tillämpliga delar.

The test runner then becomes trivial:

def runtest(filename, verbose=False, quiet=False):
    (test, answer) = open(filename).read().split("\n\n", 1)
    p = LawParser(test, verbose)
    res = p.parse()
    if res.strip() == answer.strip():
        print "Pass: %s" % filename
        return True
    else:
        print "FAIL: %s" % filename
        if not quiet:
            print "----------------------------------------"
            print "EXPECTED:"
            print answer
            print "GOT:"
            print res
            print "----------------------------------------"
        return False
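
Since the tests are just files in a directory, a driver that runs all of them is equally trivial. The sketch below is roughly what such a driver could look like; the directory name and the summary format are made up for the example, not taken from the actual code:

import glob

def runtests(testdir="test/parser", verbose=False, quiet=False):
    """Runs runtest() on every .txt file in a test directory and prints
       a pass/fail summary (directory name and extension are examples)."""
    results = [runtest(f, verbose, quiet)
               for f in sorted(glob.glob("%s/*.txt" % testdir))]
    print "%d of %d tests passed" % (results.count(True), len(results))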

Similarly, the code to test Law._txt_to_xml() is also pretty trivial. There are two differences. First, since the input is larger and already split up into paragraphs, the input and expected result for a particular test are stored in separate files. This also lets me edit the expected results file using nXML mode in Emacs.
Second, comparing two XML documents is a little trickier, in that they can be equivalent but still not match byte-for-byte (since there can be semantically insignificant whitespace and similar stuff). To avoid false alarms, I put both the expected result file and the actual result through tidy. This ensures that their whitespacing will be equivalent, as well as easy to read. It's also a good example of piping things to and from a command in Python:

def tidy_xml_string(xmlstring):
    """Neatifies an XML string and returns it"""
    (stdin, stdout) = os.popen2("tidy -q -n -xml --indent auto --char-encoding latin1")
    stdin.write(xmlstring)   # send the document to tidy's stdin...
    stdin.close()
    return stdout.read()     # ...and read the neatified result back

If the two documents still don't match, it can be difficult to pinpoint the exact place where they differ. I could dump the results to file and run command-line diff on them, but since there is a perfectly good diff implementation in the Python standard library, I used that one instead:

    from difflib import Differ
    differ = Differ()
    diff = list(differ.compare(res.splitlines(), answer.splitlines()))
    print "\n".join(diff)+"\n"

The result is even easier to read than standard diff output, since
it points out the position on the line as well (maybe there’s a
command line flag for diff that does this?):

      suscipit non, venenatis ac, dictum ut, nulla. Praesent
-   <section id="1" element="2">
?                   ^^^

+   <section id="1" moment="2">
?                   ^^

      <p>Sed semper, ante non vehicula lobortis, leo urna sodales
      justo, sit amet mattis felis augue sit amet felis. Ut quis
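
Put together, the comparison step for the XML tests ends up looking roughly like the sketch below. The function name and parameters are mine, not the actual code, but it shows how the tidy and difflib pieces fit:

def compare_xml(generated_xml, expectedfile, quiet=False):
    """Sketch: compare generated XML against an expected-result file,
       running both through tidy first so that insignificant whitespace
       differences don't trigger false alarms."""
    res = tidy_xml_string(generated_xml)
    answer = tidy_xml_string(open(expectedfile).read())
    if res == answer:
        return True
    if not quiet:
        from difflib import Differ
        print "\n".join(Differ().compare(res.splitlines(), answer.splitlines()))
    return False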

So, that’s basically my entire test setup for now. I need to build
more infrastructure for testing the XSLT transform and the HTML
parsing code, but these two areas are the trickiest.

Since I can run these test methods without having an expected return value, they are very useful as the main way of developing new functionality: I specify the input and let the test function just print the output. I can then work on new functionality without having to manually specify exactly how I want the output to look (which is actually somewhat difficult for large documents); I just hack away until it sort of looks like I want, and then cut'n'paste the output into the ”expected result” file.
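
In practice, this just means treating a test file with no expected-result paragraph as ”print whatever you got”. A rough sketch of such an exploration mode (the exact behaviour here is my own guess, not the runtest() shown above):

def exploretest(filename, verbose=False):
    """Sketch of an 'exploration mode' for the parser tests: if the test
       file has no expected-result paragraph yet, just print whatever the
       parser produced, so that it can be inspected and pasted back into
       the file as the expected result."""
    parts = open(filename).read().split("\n\n", 1)
    res = LawParser(parts[0], verbose).parse()
    if len(parts) == 1 or not parts[1].strip():
        print res           # no expected result yet, just show the output
        return None
    return res.strip() == parts[1].strip()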

Quickies of the day

  • Anil John writes about developing ASP.NET applications that run under Partial Trust. The whole Code Access Security framework in .Net is a complex beast, and I fear that most developers will never learn enough to actually use it properly, leaving them with applications that appear to be secured against malicious in-process code, but can still be vulnerable to ”luring attacks”. And if you let a single malicious assembly run with FullTrust, it's game over for your entire host process, as explained by Keith Brown in Beware of Fully Trusted Code. As Anil says, chapters 6-9 in Improving Web Application Security: Threats and Countermeasures are recommended reading. As a side note, are there any MVPs that specialize in Code Access Security?
  • Tim Bray writes about the higher-level web services specifications, and how the law of leaky abstractions works against them. ”[…]; applications that try to abstract away the fact that they're exchanging XML messages will suffer for it”
  • Anil Dash warns against yet another scenario where Word's ”Track Changes” feature can come back and bite you in the ass. I once received a press release in .doc format that had Track Changes enabled in such a way that the changes didn't show up on screen, but did when you printed it. Oops indeed.
  • Jon Udell observes that developers still have a lot to learn when it comes to internationalizing applications, and compares us to 13th-century French artisans. I don't think I have linked to Joel Spolsky's excellent Unicode primer yet, and even if I have, it's such recommended reading that I should do it again. I did a small project involving UTF-8 to Windows-1256 (Arabic) conversion at a low level a while ago, and it was most illuminating.
  • My column on the Smalltalk heritage on IDG has spawned a small debate about ”industry languages” such as Java and C# compared to more dynamic, ”cutting edge” languages like Smalltalk and Python. My take on the debate is that if you want to get stuff done together with other developers that may not be on the same level as you, C# and Java will get you there with the lowest amount of risk. For single-developer projects, or for small projects where everyone involved is really bright, Python and similarly dynamic languages (including Smalltalk, Lisp/Scheme, and even Perl) can get you there faster, while allowing you to have more fun along the way.
  • Ted Neward (by the way, it's cool that an MVP's RSS feed URL ends in .jsp :-) is involved in a debate over a set of security guidelines (subscription required) published in Java Developers Journal. Ted observes that for many of the threats that the guidelines seek to guard against to even be theoretically exploitable, the attacker must already have greater access than he stands to gain by exploiting the vulnerability. This observation is similar to Peter Torr's point that VBA and Outlook's object model do not really increase the attack surface, since, for an attacker to make use of them, he must already have full access to the machine: ”The problem isn't that you have knives or saucepans or shoes in your house; it's that the burglar keeps getting inside!”
  • Cedric Beust puts his money where his mouth is; disappointed by JUnit, he writes his own testing framework, TestNG.
  • Brad Adams gets DDJ to allow republishing Steven Clarke's article on Measuring API Usability.

Quickies of the day

  • Jiri has an interesting comparison between the state of infrastructure security and that of application security.
  • Michael Howard has the slides up from what appears to be an excellent presentation about secure coding issues (by way of Sergey Simakov).
  • The widely-talked-about paper from Paul Watson on the TCP reset vulnerability that threatened to destroy the internet last week is now online.
  • Charles Miller discusses where bugs come from, and why unit testing will only catch a part of them.
  • Mr Ed from Hacknot asks all developers to spare a thought for the next guy that will change your code — it could be you.

Also, with all the recent book reviews all over the .Net blogosphere, I broke down and went crazy on Amazon. The following books should soon be here: