The legality of screenscraping

Niklas Lundblad directs me to a couple of interesting proposals for pending laws
regarding computer crime (one of them actually uses the phrase ”crimes
in cyberspace”, which feels very 1995-ish and retro). Both Ds 2005:5 and Ds 2005:6 are
intended to be the first steps in implementing recent EC legislation
(particularly the ”Convention on Cybercrime”, ETS no. 185) in Swedish law.

Unfortunately, both documents are very thin when it comes to
defining what should be regarded as ”illegal access to information
systems”. Swedish law has not been well defined in this area
before, and I was hoping that maybe the lawmakers would take the
opportunity to clarify this.

The issue I’m mostly concerned with is screen scraping. I like screen
scraping. I think screen scraping is
cool. I wrote my first screen scraping program the same week I got my
first job, almost ten years ago (it was a simple script to
automatically download the latest Dilbert cartoon and email it as an
attachment to myself).

Now, it’s easy to see why a content provider would be opposed to
screen scraping. When I got my Dilbert strip in my inbox, there was no
advertising attached to it, thus I was depriving Unitedmedia of ad
revenue.

Years later, I was involved in the XMLTV project as I was
playing around with a homebrew HTPC. For Swedish TV listings, there
was a simple program that would fetch TV listings from dagenstv.com and
re-format them into the XMLTV format. One day, their service started
to serve up a very hostilely worded text file when I ran this
program. Basically, dagenstv.com had changed their web server
configuration so that requests from a certain User-agent (our
screen scraper program) would get a stern warning that we were doing
illegal things and that our IP address had been logged, or something
like that.
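
I never saw their server configuration, but the mechanism is simple enough to sketch. The following is only an illustration of User-agent discrimination in general: the agent name `example-tv-grabber`, the warning text and the 403 status code are all my own assumptions, not what dagenstv.com actually used.

```python
# Hypothetical User-agent substrings that the site owner wants to turn away.
BLOCKED_AGENTS = ("example-tv-grabber",)

# Hypothetical warning text; the real message was worded differently.
WARNING = "Automated access to this site is not permitted.\n"


def choose_response(user_agent):
    """Return (status, body) for a request, based only on its User-agent header."""
    agent = user_agent.lower()
    if any(blocked in agent for blocked in BLOCKED_AGENTS):
        # The scraper identified itself honestly, so it gets the warning.
        return 403, WARNING
    # Everyone else gets the ordinary page.
    return 200, "<html>...the normal TV listings page...</html>"
```

Note that the check trusts whatever the client says about itself, which is why it only works against programs that identify themselves honestly.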

Now, were we doing something illegal? Keep in mind that each user
would run this program on his or her own computer; we never
redistributed the content. Whether or not we could have done that
legally in Sweden is another question, one that maybe could be
answered by pondering URL 49 § and related materials. It’s an
interesting question in its own
right, just not the subject of today’s blog post.

Was the mere act of accessing the site with a different tool than
the site owner intended, thus gaining access to digital data in a
non-approved way, illegal? For me as a programmer, this feels like an
absurd question. I’m only sending humble GET requests. If the site
owner doesn’t want me to have the
information, then don’t send it! But with my law-student glasses on,
this could be considered computer infringement, as per the wording
in BrB 4:9 c: ”Den som
[…] olovligen bereder sig tillgång till upptagning för automatisk
databehandling […] döms för dataintrång till böter eller fängelse i
högst två år.” (A rough translation would be ”Someone who, without
permission, gets hold of a recording for automatic data processing
is to be sentenced for computer infringement to a fine or imprisonment
for no more than two years”.)
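
For readers who have never looked at one, this is roughly all that a ”humble GET request” from a well-behaved scraper amounts to. It is a minimal sketch using Python’s standard library; the URL and the agent name `example-tv-grabber/1.0` are placeholders I made up, and nothing is actually sent over the network here.

```python
from urllib.request import Request


def build_request(url, agent):
    """Build (but do not send) a GET request that states honestly who is asking."""
    # A Request without a body defaults to the GET method.
    return Request(url, headers={"User-Agent": agent})


# Hypothetical URL and agent name, for illustration only.
request = build_request("http://example.org/listings", "example-tv-grabber/1.0")
print(request.get_method(), request.full_url)
print("User-agent:", request.get_header("User-agent"))
```

The request asks for one document and says which program is asking. What the server does with it is entirely up to the server.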

Dagenstv.com could be said to have given users with normal web
browsers implicit permission to access the data, but probably not to
us with our screen scraper. If we had asked the site owners, they
would very likely have said ”no”, and therefore, they could well argue
that we were getting hold of a ”recording” without permission.

(As an aside: the use of ”recording” (”upptagning” in Swedish) in
the quoted law text is interesting in its own right: the
legislation was originally written with telephone wiretapping, opening
of letters, and similar things in mind, then ”adapted” (used in the
loosest of senses) into the digital age.)

I would prefer that questions like these were solved by technical,
not legal, means. Dagenstv.com used one such means (the User-agent
discrimination) to block our screen scraper. We could have changed our
program to masquerade as a normal Internet Explorer browser, but that
would only escalate into a pointless arms race. Someone wrote a
different script that fetched the data from another site instead, and
that was the end of it. Furthermore, if we had bypassed dagenstv.com’s
User-agent check, we would have essentially said ”Even though we’ve
been told in no uncertain terms that what we’re doing is not permitted
by the site owners, we’re choosing to ignore that and circumvent the
access control” — if we had done that, dagenstv.com would really be
right in saying we were getting hold of data without permission.

But there’s a lot to be said for screen scraping. lagen.nu could not
exist without screen
scraping. A lot of really cool web services over the years have been
made possible by screen scraping. It enabled loose coupling
years before anyone had talked about web services. It’s
the basis for a lot of interesting research and data mining. And
sometimes it just enables plain cool stuff.

Furthermore, it would be wrong to assume that all content providers
are opposed to screen scraping. For example, what is the one thing
that distinguishes forward-thinking web companies? They provide APIs
to their services (Amazon, Livejournal, Yahoo, Google, Flickr), enabling
anyone to build cool applications on top of their data, just like we
wanted to build a cool HTPC application using data from
dagenstv.com. By providing APIs, smart web sites remove the need for
actual screen scraping (which, in all fairness, is a messy and seldom
very interesting technological challenge, and furthermore only a means
to an end), but enable and encourage the same kinds of applications.
These APIs (and the XML-RPC/SOAP-based underpinnings) did not emerge
from a vacuum. People have been screen-scraping Amazon.com for their
own little needs since it was launched. Smart service providers
realise that it’s better to work with all this creativity than against
it.
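
To show why I call screen scraping messy, here is a minimal sketch of a scraper for a TV listings page. The markup in `SAMPLE_PAGE` is entirely made up (I no longer have the dagenstv.com pages), and the class names `time` and `title` are assumptions. The point is that the code is tied to presentation details that the site owner can change at any moment.

```python
from html.parser import HTMLParser

# Hypothetical markup; a real listings page would look different and
# could change without notice, which is what makes scraping brittle.
SAMPLE_PAGE = """
<table>
  <tr><td class="time">19:30</td><td class="title">Rapport</td></tr>
  <tr><td class="time">20:00</td><td class="title">Film</td></tr>
</table>
"""


class ListingScraper(HTMLParser):
    """Collect (time, title) pairs from <td class="time"> and <td class="title"> cells."""

    def __init__(self):
        super().__init__()
        self.programmes = []  # finished (time, title) tuples
        self._field = None    # class of the <td> we are currently inside, if any
        self._time = None     # most recent time cell seen

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._field = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # whitespace between tags
        if self._field == "time":
            self._time = text
        elif self._field == "title":
            self.programmes.append((self._time, text))


def scrape(page):
    """Return the list of (time, title) tuples found in the page."""
    scraper = ListingScraper()
    scraper.feed(page)
    return scraper.programmes
```

If the site renames a class or moves the listings out of the table, `scrape` silently returns an empty list. An API gives the same data without that fragility.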

If web site providers choose to do what dagenstv.com did, then
fine. They’ve stated their intent, it’s their service, their rules,
they’re entitled to take their ball and go home. But before a site
owner puts such a block in place (which could also be done through a
robots.txt file), screen
scraping should in no way be considered unlawful computer
infringement.
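
A scraper can check for such a stated intent before fetching anything. This is a minimal sketch using Python’s standard `urllib.robotparser`; the robots.txt content, the agent name `example-tv-grabber` and the URL are hypothetical, chosen only to mirror the dagenstv.com situation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named scraper is shut out, everyone else is welcome.
ROBOTS_TXT = """\
User-agent: example-tv-grabber
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the file above, `may_fetch(ROBOTS_TXT, "example-tv-grabber/1.0", "http://example.org/listings")` is false, while a browser-like agent such as `"Mozilla/4.0"` is allowed. In a real scraper the file would of course be fetched from the site itself.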

It turns out that the EC convention
that this new legislation is to implement provides for
these kinds of distinctions, under article 2 (my emphasis):

Article 2 – Illegal access

Each Party shall adopt such legislative and other measures as may be
necessary to establish as criminal offences under its domestic law,
when committed intentionally, the access to the whole or any part of a
computer system without right. A Party may require that the offence be
committed by infringing security measures, with the intent of
obtaining computer data or other dishonest intent, or in relation to a
computer system that is connected to another computer system.

Knowingly circumventing an access control system by, for example,
changing the User-agent string, might be considered infringing
security measures (weak as they are), but an unassuming GET request
could, with this definition, never be considered illegal access. I
hope that Sweden takes this opportunity to better define what should
be considered illegal access.

Another aside: Since a lot of my current activities, and thus my blog
writing, revolve around Swedish law, it’s sometimes difficult to write
in English, as there are a lot of precise Swedish legal terms that I’m
not comfortable translating. For anyone versed in Swedish law, posts
about it in English are probably way harder to read. Furthermore, most
of these posts are probably of limited interest to non-Swedes.

Therefore, I’m considering switching the language of this blog to
Swedish. If you don’t understand Swedish, but would like to continue
reading this blog, please say so in the comments. Thank you.

Terry Fisher: ”Promises to keep”

Lessig comments on and recommends Terry Fisher’s book ”Promises
to keep”, with the subtitle ”Technology, Law and the Future of
Entertainment”. Coming from a technology background and moving into
law, I always find these discussions interesting. The introduction and final
chapter are available online, and from what I’ve read so far it’s a very readable
discussion of the historic reasons for intellectual property rights
(including patents, trade secrets and copyrights), how digital
distribution changes the prerequisites for the existing laws, and how
new ways of looking at intellectual property can bring legislation
that encourages the creation of cultural content and innovations,
without restricting users’ rights.

It seems that Fisher’s advocating a sort of content flatrate
solution, which I’ve previously written off as
unrealistic, but through a brief discussion around ”public goods”, he
argues his case very strongly. I will probably have more to say once
I’ve read the entire chapter.

Two more things on copyright and related laws

With regard to the previously mentioned controversy around The
Pirate Bay: The signature ‘Judas’, acting as legal counsel for TPB, argues that
providing a link (in this case a .torrent file) to copyrighted
material is not copyright infringement under Swedish law. He cites the
verdict NJA 1996 s. 79 (which isn’t available on the web, so I haven’t
read it) in his support.

However, from the context he gives, it seems
to my non-legally-trained eyes that the verdict NJA 2000 s. 292
concerns a case which is much more similar to the TPB
case. The verdict is about a case where a person provided links to MP3
files that were not hosted on his site (i.e. potential infringement of
the second degree), and it ended in partial victory for the
plaintiff. Any lawyers well versed in Swedish copyright law reading
this, please feel free to weigh in 🙂

On a different, but sort-of-related note, I was browsing around Nicklas Lundblad’s blog some
more, and found this interesting paper
on the legality of search engines. Two points
in particular interested me:

First, the quote [my
translation]: ”It’s not impossible – with regard to the things
previously said – that [search engines] would never have come into
existence if somebody had first consulted a legal expert”, which is food
for thought.

Secondly, the (untried) argument that
government agencies should not try to restrict search engine access
to their websites (through robots.txt or similar means),
since it would violate the Swedish principle
of free access to public records. It’s particularly interesting
since Rixlex, the main
repository of Swedish law texts, does just that.

The GNU GPL, modifications and Swedish copyright law

Yesterday, I was involved in a discussion that started with the Affero General Public
License, a modification of the GNU GPL, intended to close the ASP
loophole in GNU GPL v2. The ”ASP loophole” refers to the fact
that an application service provider (ASP) can make the functionality
of a GPL’ed program available, without distributing the actual
program, through a Web UI or something similar.

The Affero GPL (and the upcoming GPL v3) intends to close this by
demanding (in 2 d) that, if the
software has functionality to allow users to download the source code
of that program, you may not remove this functionality.

Now, the GPL (both Affero and GNU v2) is not a binding license. It
even says so itself in section 5: ”You are not required to accept this
License, since you have not signed it”. Its power derives from
the fact that, unless you accept it, normal copyright law applies,
which forbids you to redistribute software for which you do not hold
the copyright.

However, to exploit the Affero/GNU v3 GPL, you don’t have to
redistribute the software, just remove the feature that allows users
to download the source. And if you don’t accept the license, who’s
going to stop you? Section 5 of the Affero GPL goes on to say:
”However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License.” (emphasis mine)

So, the whole power of the Affero GPL hinges on the fact that,
under normal law, you do not have the right to modify a program for
which you do not hold the copyright. But is that really correct? I’ve
read and re-read the Swedish copyright law, and I cannot
find where it says that personal modifications are not allowed. In
fact, it explicitly allows the modification of programs and copying
(but not redistributing) the result in 26 g §, in certain
cases. The Swedish copyright law is all about making copies and
the redistribution of copyrighted material, not about making modifications.

So, under Swedish law, are you allowed to modify software for which
you do not have the copyright? If so (and I think that’s the case),
the raison d’être for Affero GPL is null and void in
Sweden. Presumably, this is different under US copyright law.

If you know (or think that you know) that I’m wrong in the above
assumption, I’d love to hear about it, preferably with a reference to
a law and section, or reference to a precedential case. Or, you know,
just with a logical reasoning about why I’m wrong. Still no talkback
system, but I check my referrer logs, so if you blog about it, I’ll
read it. Or just email me at staffan@tomtebo.org.

The one thing I can think of that would make modifications of a
software program illegal, according to Swedish law, is 12 §, which states that
one may not create copies of a computer program, not even for personal
use (as opposed to most other works such as books or music, where
personal copying is allowed). Maybe it can be argued that modifying a
program is, in effect, making a copy of it?