Lagen.nu behind the scenes

Now that lagen.nu has been out for
some time, it might be a good
idea to write down what I’ve learned from it so far, in blog
form. Much of the discussion will be centered around python, a
language I’m far from proficient in, but it’s possible that
someone will learn at least something from it.

First, take a look at this
post
that explains what lagen.nu is, from a user
perspective.

This post is about how the site is produced. When I started out, I
had no clear idea of what I wanted to do, other than to download
the text of all swedish laws and convert it to some sort of nice
HTML. I knew I wanted to do as much as possible with static HTML
files, and I had a hunch that XML would be involved in some way.

So, essentially, the code only needs to run off-line, with no
GUI required.

I thought about doing this in C#, since it would be a good
experience building project in a language for which expertise is
highly sought after. But since I’m no longer
programming for food
(actually I am, for another four days,
but still), I took the opportunity to do it in python, a language which I’ve
always liked but never become friends with.

From a high level, the code does the following:

  • Finds out what laws are available
  • Downloads the law text HTML documents
  • Converts the text to XML
  • Transforms the XML to HTML

There are some extra steps involved in creating the front page,
RSS feeds, and handling the verdicts database, but these are the
main steps.

The result of the program is a tree with static HTML files, ready
for deployment.

I started out by looking
for a good Python IDE
. I did not find it, and settled for Emacs
with python-mode.

Once set up with a recent version of python-mode, properly
configured, I had a nice light-weight development
environment. Here’s my minimal configuration (this goes into your
.emacs file):

(autoload 'python-mode "python-mode" "Python Mode." t)
(add-to-list 'auto-mode-alist '("\.py'" . python-mode))
(add-to-list 'interpreter-mode-alist '("python" . python-mode))
(setq py-python-command "C:\Python23\python.exe")
  

My code lives in classes, and to test things out, I have code at
the end of the main code file that looks sort of like the
following:

if __name__ == "__main__":
    vc = VerdictCollection()
    vc.get(2004,refreshidx=True)
  

(That is, if I want to test the get method of the
VerdictCollection class). To test the code, I just press
C-c C-c in the python editor window. The entire python buffer gets
sent to the python shell, and the last part (after if __name__
== "__main__":
) executes.

Things that are good about this environment:

  • Free, in both senses of the word
  • The intendation support really works, which is quite important with python
  • Reasonably fast edit-run cycle
  • The interactive python shell

Things that are bad:

  • I can’t debug stuff. It seems like it should be
    possible
    , but I have no pdb.exe, which seems to be a
    requirement. In particular, it would be nice to be able to
    automatically start debugging when an unhandled exception is
    raised.
  • Copy and paste from the *Python* buffer has character set
    problems. For example, if my code outputs a § sign, and I cut’n
    paste it into another file, emacs will complain:

    These default  coding systems were tried: 
    iso-latin-1-dos
    However, none of them safely encodes the target text.

    This is bogus, since the § sign is perfectly legal in latin-1.

I use the standard python.org distribution of Python 2.3 (I
haven’t gotten around to upgrading to 2.4 yet), not the ActiveState
one
. I tried it, and like the fact that the win32com module is
bundled, but the python.org version is a leaner download and has a
more usable HTML help application (particularly the good index).

To get a grip of how to do things with python, I’ve used the
online version of Mark Pilgrim’s Dive Into
Python
, as well as the Python
cookbook
. This, together with the reference manual, (the eff-bot
guide to) The Standard Python Library
and Text Processing in Python has
been all I need so far.