The Cookie directive and HTML5

In 2002, the European Community introduced Directive 2002/58/EC, commonly known as the Directive on privacy and electronic communications. Among other provisions, it contains article 5(3), which has earned it the nickname “the cookie directive”: the subarticle states that information may be stored in or retrieved from end user computers only if the user is made aware of this and is given the opportunity to refuse the storage or retrieval. In 2009 the directive was amended (by 2009/136/EC) so that storage or retrieval is only permitted if the user has given his or her consent.

The full text of the amended subarticle is as follows:

3. Member States shall ensure that the storing of information, or the gaining of access to information already stored, in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned has given his or her consent, having been provided with clear and comprehensive information, in accordance with Directive 95/46/EC, inter alia, about the purposes of the processing. This shall not prevent any technical storage or access for the sole purpose of carrying out the transmission of a communication over an electronic communications network, or as strictly necessary in order for the provider of an information society service explicitly requested by the subscriber or user to provide the service.

This regulation is generally understood to apply to the HTTP State Management Mechanism (RFC 6265, earlier RFC 2965 and RFC 2109), most commonly known as “cookies”. In fact, preamble 25 of 2002/58/EC and preamble 66 of 2009/136/EC explicitly mention cookies as one example of such mechanisms. National regulations, and guidelines in particular, have focused on this particular mechanism for storing and accessing information on end user computers over a network.

But the directive text can clearly apply to other mechanisms apart from HTTP cookies. Among the mechanisms that permit similar storage and retrieval of information are the Local Shared Object mechanism found in Flash, the userData functionality in Internet Explorer and, more recently, a variety of mechanisms being defined and implemented under the HTML5 umbrella.

Two questions are therefore interesting:

  1. Are there mechanisms in HTML5 that allow user tracking (including by third parties) in a way that is not subject to the consent requirement?
  2. Are there mechanisms in HTML5 that raise no privacy concerns, yet are subject to the consent requirement?

The first question is the most sensitive, and the hardest to answer. But consider a JavaScript file that is served by a third-party ad network and included by a number of unrelated content sites. Suppose that such a script:

  1. Generates a local GUID on the client (i.e. an identifier that the ad network did not choose).
  2. Stores this GUID in local storage.
  3. Sends this GUID to the ad network using a background XMLHttpRequest (presumably along with some other information, such as the URL of the page embedding the script).

(Steps 1 and 2 are skipped if the GUID is already present in local storage.)
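
The three steps above can be sketched roughly as follows. This is a minimal illustration, not code from any actual ad network: the storage key `tracker-guid`, the endpoint `https://adnetwork.example/collect` and all function names are made up. It also assumes that the script runs in a frame served from the ad network's own origin, since local storage is scoped per origin and a script included directly in a page would otherwise see a separate store on each content site.

```javascript
// Hypothetical sketch of the three steps; every name here is an assumption.

const STORAGE_KEY = "tracker-guid"; // made-up key name

// Step 1: generate an identifier on the client. The ad network never chooses it.
function generateGuid() {
  // Random version-4-style GUID. Math.random() is not cryptographically
  // strong, which is acceptable for an illustration.
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// `storage` is anything with getItem/setItem (window.localStorage in a browser).
// `send` is a callback that delivers the payload to the ad network.
// `pageUrl` is the URL of the embedding page, however the script learns it.
function trackVisit(storage, send, pageUrl) {
  let guid = storage.getItem(STORAGE_KEY);
  if (guid === null) {
    guid = generateGuid();              // step 1
    storage.setItem(STORAGE_KEY, guid); // step 2
  }
  send({ guid: guid, page: pageUrl });  // step 3
  return guid;
}

// Browser-only delivery via a background XMLHttpRequest (assumed endpoint).
function sendWithXhr(payload) {
  const xhr = new XMLHttpRequest();
  xhr.open("POST", "https://adnetwork.example/collect");
  xhr.setRequestHeader("Content-Type", "application/json");
  xhr.send(JSON.stringify(payload));
}

// In a browser frame this might be wired up as:
//   trackVisit(window.localStorage, sendWithXhr, document.referrer);
```

Note that the identifier is produced entirely by `generateGuid` on the client; the ad network first learns its value when step 3 delivers it.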

Such a script has the same ability as a third-party cookie to track a user’s movement across sites, and to assign a user (or rather his or her computer) a permanent identifier. But does it require consent according to article 5(3)? One way to argue that it does not is to take note of preamble 66: “Third parties may wish to store information on the equipment of a user, or gain access to information already stored, for a number of purposes”. It may be argued that steps 1 and 2 do not mean that it is the third party (the ad network) that stores the information (indeed, the ad network does not know what information is stored). If the third party hasn’t stored the information, then the gaining of information in step 3 might not be subject to the rule either, since the wording seems to require that information gained by a third party must have been previously stored by the same party. (If there is no requirement that the information gained must have been stored by the same party, one must note that every third party whose resources are included by a web page automatically gains access to a lot of information, such as the User-Agent string, and ask whether that gaining of information is subject to the directive as well.)

I will concede that this argument is not strong, as it rests on the assumption that steps 1 and 2 do not constitute information storage by the third party, even though the third party is responsible for sending the JavaScript code that ultimately results in information being stored. The technique seems functionally equivalent to traditional HTTP cookie-based storage of information. But the difference is that with this method, the third party does not specify what information should be stored. Could this not be significant?

The second question seems easier to answer. Consider offline web applications. These are web pages that contain a reference to all resources (HTML, JavaScript, CSS) they require in order to work. A browser supporting offline applications will download all these resources so that the application works even if there’s no internet connection. Note that if the browser does not support offline apps, they still work; they just require you to be online. A simple example containing a version of the Halma game is described by Mark Pilgrim.

This mechanism causes information to be stored on the end user’s computer. This storage is not strictly necessary in order to provide the service (remember, the app works without the mechanism if the user is online; offline support is just a nice-to-have). No information is ever accessed by the provider of the game, but access is not a requirement of the directive: storing of information is enough. Thus, consent is needed. And yet there are no privacy concerns (no personally identifiable information is ever retrieved).

The aim of article 5(3) was to regulate certain usages of cookies perceived to be illegitimate. But it was written to be technology-neutral, as new techniques similar to HTTP cookies were sure to be created after the directive. (The diabolical evercookie uses 12 additional mechanisms, including a brilliantly twisted way of storing information in the user’s browser history of visited URLs.) The problem is that such mechanisms are only similar, not identical. This makes writing technology-neutral legislation really difficult.

Thesis: Appendices and backmatter

The rest of the thesis consists of two appendices (the first describing the system prototype in detail, including how to run it yourself, the second describing the “gold standard” tests we’ve evaluated the system against) and the bibliography.

That is all for this time! If you’ve read all chapters so far, I’d really appreciate your comments and suggestions for improvement.

Download appendices and backmatter here.

Thesis chapter 6: Conclusion and future work

Having described relevance and information retrieval in general and in a legal context, reviewed previous work in the field, and designed and evaluated a better relevance ranking method, are we done? No, we’ve only just started! Here are some pointers on how this approach might be improved.

Download chapter 6 here.

Thesis chapter 5: A prototype of a legal relevance function

Finally, this is the heart and soul of this thesis (even if it’s only a few pages). A system designed for better legal relevance ranking is described and evaluated. Although primitive, being based only on simple, known link analysis algorithms, it seems to perform really well compared to traditional ranking methods.

Download chapter 5 here.

Thesis chapter 3: Information retrieval

Relevance can be interpreted in many ways, from subjective to objective. Which interpretations are built into traditional information retrieval systems, and what properties do these manifestations of relevance have? The use of IR for legal information has a long history. How does legal information retrieval correspond to the legal method, and can we improve on this correspondence, e.g. by creating a relevance ranking function more in line with what is considered legally relevant?

Download chapter 3 here.

Thesis chapter 2: The concept of relevance

In order to define a better relevance ranking method, we need to delve deep into what relevance really is, and what aspects of it we can measure in an information retrieval system. We also examine what relevance means in a legal context, how it is connected to other concepts such as authority, and what clues to relevance we can find in legal information.

Download chapter 2 here.

Thesis chapter 1: Introduction

The first chapter sets the scene by describing the basis for information retrieval systems, legal information and how it is used, as well as the motivation for improving the former so that we can use the latter better. It also contains a description of the method used in the thesis and of its general structure.

Download chapter 1 here.

Towards a theory of jurisprudential relevance ranking

My graduate thesis, somewhat loftily titled “Towards a theory of jurisprudential relevance ranking – Using link analysis on EU case law”, has been submitted to and approved by my supervisor. It has taken far too long since I first started working on it, but I’m very satisfied that it is finally finished. Except that it’s not really finished, since I hope to re-work and extend it with the aim of publishing it in some other form. Which is why I’m soliciting feedback on it.

Over the coming week, I’ll be publishing a chapter at a time. Each chapter will be available in PDF form and also inline in the form of images, since that was the best conversion to a web-friendly format I could manage… (Also note that the pagination differs slightly between the PDF and the web version.)

If you are at all interested in legal informatics, information retrieval, jurisprudence or just what we really mean when we say that something is relevant, I hope you will find the time to read the chapters and maybe also give me your feedback below.

We’ll be kicking off with the front matter of the thesis. It does not contain anything substantial in itself, but it has a very neat Gephi-drawn cover and some interesting quotes. The table of contents should give you an idea of what it is about.

Download the front matter here.

Soon time to recycle paper books

This week Amazon released version 2 of its e-book reader, the Kindle. The big news seems mainly to be the improved design, which was perhaps the right focus given that version one was ugly as sin. Version 2 is not available to us Swedes either, and the reason for that is rather incomprehensible. Sure, Whispernet is not available here, but surely there are other ways to get books into the device? My analog bookshelf has managed quite well without wireless network access.

And perhaps this is a step on the way towards an iPodification of book reading (remember that not even the iPod was a success from day one). Reactionaries keep harping on about how there is still something special about holding a physical book in your hands, feeling the pages under your fingers and being able to jot down words of wisdom in the margin, but to me it sounds like that fondness for large LP sleeves and the put-the-needle-on-the-record ritual that you heard just before the CD knocked out vinyl.

It will be the same with books: today’s hugging of physical copies will die out as the tools for reading electronically get better. The practical advantages of being able to carry your entire library with you, search through the books, do away with the dreary type of furniture known as the “bookshelf”, and share your books electronically are simply too great.

And from that last point you will gather that there will be yet another copyright debate when the general public starts file-sharing PDFs. That will have to be a topic for a later post; for now I want to focus on why you are not already reading books on screen, and when you will start doing so.

I often use my tablet as an e-book reader. Since the display can be rotated to a portrait, nearly A4 format and the resolution is close to 150 DPI, it is quite readable. Combined with PDF Annotator in full-screen mode, I can underline and make notes in the margins as much as I like (the annotations are readable in any other PDF reader). However, a two-kilo computer, which also gets rather warm, is not so good ergonomically, and an actively lit screen does not, after all, give the same feeling as a book. Add to that the distractions that wireless internet access brings, and we have an explanation for why I do not think it is better than analog books in every respect.

Still, it is often good enough. A while ago I bought “Learning Drupal 6 Module development” from the publisher and got a PDF version right away, plus a physical book in the mail a week or so later. I use the electronic version more often, since it is always with me and it is actually ergonomically more convenient to have a book that is an Alt-Tab away rather than one that is an arm’s length away.

But I still think I would like a reader with e-ink, i.e. a reflective display type. Since Amazon does not want to sell to me, I have looked at what other alternatives exist. Many speak warmly of the Hanlin V3, which has roughly the same form factor as the Kindle, minus the keyboard. Personally, though, I am most tempted by the larger iRex DR1000S, which has a ten-inch screen and note-taking capabilities via the included Wacom pen. Do have a look at MobileRead’s excellent wiki, which includes a comparison overview of existing models. The point of the larger screen is that you can actually read most PDF files on it a whole page at a time. Since PDF is the clearly dominant format for everything I would want to read on a gadget like this, that format is hopelessly tied to the original page size, and that page size is usually A4 or one of the odd American standard formats, the larger form factor may well be worth it. However, the DR1000S is hideously expensive. I have an unfailing ability to time purchases of new hardware about half a year before prices drop sharply (so far it has happened with a modem, a home cinema system, a TFT monitor, an MP3 player, a DVD burner and the previously mentioned tablet PC), so perhaps I should hold off a while longer.

On the other hand, someone has to be the early adopter…