Lagen.nu behind the scenes
Now that lagen.nu has been out for some time, it might be a good idea to write down what I’ve learned from it so far, in blog form. Much of the discussion will center on Python, a language I’m far from proficient in, but hopefully someone will learn at least something from it.
First, take a look at this post that explains what lagen.nu is, from a user perspective.
This post is about how the site is produced. When I started out, I had no clear idea of what I wanted to do, other than to download the text of all Swedish laws and convert it to some sort of nice HTML. I knew I wanted to do as much as possible with static HTML files, and I had a hunch that XML would be involved in some way.
So, essentially, the code only needs to run off-line, with no GUI required.
I thought about doing this in C#, since it would be a good experience-building project in a language for which expertise is highly sought after. But since I’m no longer programming for food (actually I am, for another four days, but still), I took the opportunity to do it in Python, a language which I’ve always liked but never become friends with.
From a high level, the code does the following:
- Finds out what laws are available
- Downloads the law text HTML documents
- Converts the text to XML
- Transforms the XML to HTML
There are some extra steps involved in creating the front page, RSS feeds, and handling the verdicts database, but these are the main steps.
The result of the program is a tree with static HTML files, ready for deployment.
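The four steps above can be sketched roughly as follows. This is a minimal illustration in modern Python 3, not the actual lagen.nu code: the function names (`find_laws`, `to_xml`, `to_html`, `build_site`) are hypothetical, the download step is passed in as a `fetch` function so the sketch runs without network access, and the downloaded document is treated as plain text rather than real HTML to keep the example short.

```python
import re
import xml.etree.ElementTree as ET
from html import escape
from pathlib import Path


def find_laws(index_text):
    """Step 1: pull law ids of the form 'year:number' out of an index listing."""
    return re.findall(r"\b\d{4}:\d+\b", index_text)


def to_xml(law_id, raw_text):
    """Step 3: wrap each non-empty line of the downloaded text in a <p> element."""
    root = ET.Element("law", id=law_id)
    for line in raw_text.splitlines():
        if line.strip():
            ET.SubElement(root, "p").text = line.strip()
    return root


def to_html(law):
    """Step 4: render the intermediate XML as a static HTML page."""
    paragraphs = "".join("<p>%s</p>" % escape(p.text) for p in law.iter("p"))
    title = escape(law.get("id"))
    return "<html><body><h1>%s</h1>%s</body></html>" % (title, paragraphs)


def build_site(index_text, fetch, out_dir):
    """Run all four steps and write one HTML file per law into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for law_id in find_laws(index_text):
        raw = fetch(law_id)  # Step 2: download (injected, so no network here)
        page = to_html(to_xml(law_id, raw))
        target = out_dir / (law_id.replace(":", "-") + ".html")
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

Calling `build_site(index_text, fetch, "out")` leaves a directory of static HTML files, one per law, which is the shape of output described above.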
I started out by looking for a good Python IDE. I did not find one, and settled for Emacs with python-mode.
Once set up with a recent version of python-mode, properly configured, I had a nice light-weight development environment. Here’s my minimal configuration (this goes into your .emacs file):
(autoload 'python-mode "python-mode" "Python Mode." t)
(add-to-list 'auto-mode-alist '("\\.py\\'" . python-mode))
(add-to-list 'interpreter-mode-alist '("python" . python-mode))
(setq py-python-command "C:\\Python23\\python.exe")
My code lives in classes, and to test things out, I have code at the end of the main code file that looks sort of like the following:
if __name__ == "__main__":
    vc = VerdictCollection()
    vc.get(2004, refreshidx=True)
(That is, if I want to test the get method of the VerdictCollection class.) To test the code, I just press C-c C-c in the Python editor window. The entire Python buffer gets sent to the Python shell, and the last part (after if __name__ == "__main__":) executes.
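For readers who want to try this pattern, here is a self-contained sketch in modern Python 3. The `VerdictCollection` class below is a hypothetical stand-in that only records the call; the real class and what its `get` method actually does are not shown in this post.

```python
class VerdictCollection:
    """Hypothetical stand-in for the real class, which is not shown here."""

    def __init__(self):
        self.fetched = []

    def get(self, year, refreshidx=False):
        # The stub only records what it was asked to fetch and returns it.
        self.fetched.append((year, refreshidx))
        return self.fetched[-1]


if __name__ == "__main__":
    # Runs when the file is executed directly, but is skipped when another
    # module imports this file, so the test code never gets in the way.
    vc = VerdictCollection()
    print(vc.get(2004, refreshidx=True))
```

The guard is what makes it safe to keep throwaway test calls at the bottom of a file that other modules also import.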
Things that are good about this environment:
- Free, in both senses of the word
- The indentation support really works, which is quite important with Python
- Reasonably fast edit-run cycle
- The interactive python shell
Things that are bad:
- I can’t debug stuff. It seems like it should be possible, but I have no pdb.exe, which seems to be a requirement. In particular, it would be nice to be able to automatically start debugging when an unhandled exception is raised.
- Copy and paste from the *Python* buffer has character set problems. For example, if my code outputs a § sign and I cut and paste it into another file, Emacs will complain:
These default coding systems were tried: iso-latin-1-dos However, none of them safely encodes the target text.
This is bogus, since the § sign is perfectly legal in latin-1.
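On the debugging wish in the list above: pdb ships with Python as a standard library module (`import pdb`), so one possible approach is to install a `sys.excepthook` that starts a post-mortem session whenever an exception goes unhandled. The sketch below is written for modern Python 3, `make_excepthook` and `install` are hypothetical helper names, and whether the pdb prompt behaves well inside an Emacs shell buffer on Windows is an open question that this example does not settle.

```python
import pdb
import sys
import traceback


def make_excepthook(post_mortem=pdb.post_mortem):
    """Return an excepthook that prints the traceback, then calls post_mortem.

    post_mortem is injectable so the hook can be exercised without a live
    debugger; by default it is pdb.post_mortem from the standard library.
    """
    def hook(exc_type, exc_value, tb):
        # Show the usual traceback first, so nothing is lost.
        traceback.print_exception(exc_type, exc_value, tb)
        # Then open the debugger at the frame where the exception was raised.
        post_mortem(tb)
    return hook


def install():
    """Make every unhandled exception in this process start a pdb session."""
    sys.excepthook = make_excepthook()
```

Calling `install()` once near the top of the main script is enough; from then on, an unhandled exception prints its traceback and leaves you at a pdb prompt in the failing frame.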
I use the standard python.org distribution of Python 2.3 (I haven’t gotten around to upgrading to 2.4 yet), not the ActiveState one. I tried the ActiveState distribution, and I like the fact that the win32com module is bundled, but the python.org version is a leaner download and has a more usable HTML help application (particularly the good index).
To get a grip on how to do things with Python, I’ve used the online version of Mark Pilgrim’s Dive Into Python, as well as the Python Cookbook. These, together with the reference manual, (the eff-bot guide to) The Standard Python Library and Text Processing in Python, have been all I’ve needed so far.
Tags: lagen.nu, programmering, python