Mac file systems and Unicode

I am, if not a warm advocate of the Swedish language, then at least a militant champion of our Swedish characters. Seeing a file name like raksmorgas.txt, or a URL along the lines of http://foo.com/overgangsbestammelser, pains me. So when I started the tag experiment, I naturally wanted to preserve Swedish characters both externally and inside the system.

Right now it actually doesn’t look that good externally, since the browser’s address bar shows http://lagen.nu/etikett/s%C3%A4rskiljningsf%C3%B6rm%C3%A5ga when it ought to show http://lagen.nu/etikett/särskiljningsförmåga (a side effect of exciting UTF-8 problems in the Win32 version of Apache, which I may say more about some other time), but internally it is quite nice.
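The escaped form in the address bar is simply the UTF-8 bytes of each Swedish character written out as %XX. A minimal Python sketch of that round trip (Python and the variable names are chosen here for illustration, they are not taken from the site’s code):

```python
from urllib.parse import quote, unquote

# The tag name, written with escapes so the source file's own
# normalization cannot change which code points are used.
tag = "s\u00e4rskiljningsf\u00f6rm\u00e5ga"   # särskiljningsförmåga

# quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
escaped = quote(tag)
print(escaped)             # s%C3%A4rskiljningsf%C3%B6rm%C3%A5ga

# unquote() decodes the %XX bytes as UTF-8 again.
restored = unquote(escaped)
print(restored == tag)     # True
```

Both forms name the same resource; the difference is only in what the browser chooses to display.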

Since everything on lagen.nu that can be a static file is one, the web page above is naturally saved as index/tags/särskiljningsförmåga.html in the file system. The file system is, in this case, an NTFS file system. I do the entire generation of all static pages on a Windows machine, you see, since it is the only speed demon I have at home (and it still takes several hours). Then, when everything is done, I rsync it all over to the Mac OS X machine that runs lagen.nu itself.

And this is roughly where the problems started. Transferring ”särskiljningsförmåga.html” from Windows to Mac OS X over SMB (the ”regular” file sharing) worked fine, but with rsync I got the incomprehensible error message ”invalid argument”.

It turned out that the Mac OS X file system internally stores file names in ”Normalization Form D”, a Unicode way of representing characters which, in the case of the letter ‘ä’, first gives ‘a’ followed by U+0308, aka COMBINING DIAERESIS, i.e. two free-standing dots intended to be combined with the preceding character. That is not necessarily so crazy: I enter ‘ü’ on this computer by first pressing the ”two free-standing dots” key and then ‘u’, a similar thought process. But it seems a bit backwards to store things that way in the file system. It seems really backwards not even to allow the API calls for creating files to use all valid Unicode characters. And it seems utterly, crazily backwards to ship Mac OS X with an rsync that does not compensate for this. I don’t know whether the rsync protocol supports specifying a character set for file names, but failing that, one should surely assume UTF-8 or at least Latin-1. I mean, it works just fine with the Samba that ships with the system, which handles the regular file sharing.
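To make the two representations concrete, here is a small Python sketch using the standard unicodedata module. It only illustrates standard Unicode normalization; whether the file system follows the standard form in every detail is not something this sketch checks.

```python
import unicodedata

composed = "\u00e4"   # 'ä' as a single precomposed code point
decomposed = unicodedata.normalize("NFD", composed)   # 'a' + combining mark

# List the code points that make up the decomposed form.
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]
mark_name = unicodedata.name(decomposed[1])

print(code_points)   # ['U+0061', 'U+0308']
print(mark_name)     # COMBINING DIAERESIS

# The two strings look identical on screen but compare as different,
# which is why a byte-for-byte file name comparison can fail.
print(composed == decomposed)                                  # False
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
```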

But that is how it is, so what do you do? You write a small ugly hack that transforms särskiljningsförmåga.html into särskiljningsförmaÌŠga.html (the UTF-8-encoded representation of the file name converted to Normalization Form D, seen through Latin-1 glasses), rsync, problem solved!
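A hedged sketch of what such a hack could look like in Python. The function name to_nfd_bytes_as_latin1 is made up for this example, and the real script may well have worked differently; the sketch only follows the description in the parenthesis above: normalize to form D, encode as UTF-8, and reinterpret each byte as one Latin-1 character.

```python
import unicodedata

def to_nfd_bytes_as_latin1(filename: str) -> str:
    """Hypothetical helper: NFD-normalize, encode as UTF-8, then read the
    resulting bytes back as if each byte were a Latin-1 character."""
    nfd = unicodedata.normalize("NFD", filename)
    return nfd.encode("utf-8").decode("latin-1")

original = "f\u00f6rm\u00e5ga.html"          # förmåga.html
hacked = to_nfd_bytes_as_latin1(original)

# Each accented letter becomes its base letter plus two extra characters,
# the two UTF-8 bytes of the combining mark (0xCC 0x88 or 0xCC 0x8A).
print([hex(ord(ch)) for ch in hacked[:4]])   # ['0x66', '0x6f', '0xcc', '0x88']

# Reading the Latin-1 characters back as UTF-8 bytes recovers the NFD name.
recovered = hacked.encode("latin-1").decode("utf-8")
print(recovered == unicodedata.normalize("NFD", original))   # True
```

The byte 0x8A is a control character in strict Latin-1; it shows up as ‘Š’ when displayed as Windows-1252, which matches the file name shown above.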

Swedish characters usually work perfectly well on Mac OS X, right up until you do something that was not quite foreseen. Example: create a file räksmörgås.txt in Finder, then switch to the Terminal and try the following:

[staffan@minimac tmp]$ ls r*.txt
-rwxr--r--  1 staffan  staffan  10  4 Sep 17:28 ra??ksmo??rga??s.txt*

An excellent example of a leaky abstraction.

Tomorrow I will talk about how not to design your RSS feeds. Yep, even they have to be designed, but perhaps not in the way you first think.

Smalltalk and Seaside for web applications

So, after debating at length with Göran about pros and cons of Smalltalk as a platform for real world development, I came to the conclusion that Smalltalk was worth looking into in greater detail, and since I’ve been wanting to write my own blog/wiki hybrid, it was suggested that I check out Seaside, a Smalltalk-based framework for web development comparable to ASP.NET or Java Server Faces.

At first look, Seaside doesn’t look much different from ASP.NET. It’s all about modelling your application’s interface in terms of objects and methods (”messages” for you Smalltalkers). Pages are built up of components, inheriting from System.Web.UI.Control (ASP.NET) or WAComponent (Seaside), that can include other components. When the user does things with any component, it results in events being fired (ASP.NET) or messages being sent (Seaside). Both frameworks seem to strive to abstract away the request/response nature of the web, and to allow the programmer to use a more event-driven approach to development. In addition, Seaside uses continuations to make it possible to, for example, ask the user something (similar to how a modal dialog would do it in a normal GUI environment), and then do something with the provided answer, all within the context of a single method.
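The flavour of ”asking the user something in the middle of a method” can be roughly imitated with a Python generator. This is only an analogy under loud assumptions: generators are not full continuations, this is not Seaside’s API, and the names confirm_delete and run_flow are invented for the sketch.

```python
def confirm_delete(item):
    """One 'page flow' written as a single method. Each yield hands a
    question to the user and suspends until an answer is sent back."""
    answer = yield f"Really delete {item}? (yes/no)"
    if answer == "yes":
        reason = yield "Why?"
        return f"deleted {item}: {reason}"
    return f"kept {item}"

def run_flow(flow, answers):
    """Drive the flow with canned answers, standing in for the
    successive HTTP requests a real framework would receive."""
    prompts = []
    try:
        prompts.append(next(flow))            # run to the first question
        for answer in answers:
            prompts.append(flow.send(answer)) # resume where it stopped
    except StopIteration as done:
        return prompts, done.value            # the method ran to its end
    return prompts, None                      # still waiting for an answer

prompts, result = run_flow(confirm_delete("post 42"), ["yes", "spam"])
print(prompts)   # ['Really delete post 42? (yes/no)', 'Why?']
print(result)    # deleted post 42: spam
```

The point of the analogy is that the control flow reads top to bottom, even though each question and answer would be a separate request/response pair on the web.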

The main difference is that programming in the Seaside framework results in a lot less housekeeping code. The object is really an ordinary object, except that it’s executed through the web. The difference didn’t really dawn on me before I tried to recreate WACounter as an ASP.NET server component; it involved a whole lot of code to handle events, manage viewstate and so on.

From a security perspective, Seaside has problems with the fact that the session id is present in the URL, and it seems harder to make applications RESTful, but apart from those issues, Seaside is definitely a framework worth looking closer at. From my own perspective, I hope that the ASP.NET developers do 🙂

Unfortunately, things elsewhere have been chaotic (and not in a good way), so I have not had any further time for experimentation with Seaside. As always, further progress will be reported here.

Update: This is a much better explanation of what continuation-based, or synchronous, web programming is.

Quickies of the day

  • Anil John writes about developing ASP.NET applications that run under Partial Trust. The whole Code Access Security framework in .NET is a complex beast, and I fear that most developers will never learn enough to actually use it properly, leaving them with applications that appear to be secured against malicious in-process code, but can still be vulnerable to ”luring attacks”. And if you let a single malicious assembly run with FullTrust, it’s game over for your entire host process, as explained by Keith Brown in Beware of Fully Trusted Code. As Anil says, chapters 6-9 in Improving Web Application Security: Threats and Countermeasures are recommended reading. As a sidenote, are there any MVPs that specialize in Code Access Security?
  • Tim Bray writes about the higher level web services specifications, and how the law of leaky abstractions works against them. ”[…]; applications that try to abstract away the fact that they’re exchanging XML messages will suffer for it”
  • Anil Dash warns against yet another scenario where Word’s ”Track Changes” feature can come back and bite you in the ass. I once received a press release in .doc format that had Track Changes enabled in such a way that the changes didn’t show up on screen, but did when you printed it. Oops indeed.
  • Jon Udell observes that developers still have a lot to learn when it comes to internationalizing applications, and compares us with 13th-century French artisans. I don’t think I have linked to Joel Spolsky’s excellent Unicode primer yet, and even if I have, it’s such recommended reading that I should do it again. I did a small project involving UTF-8 to Windows-1256 (Arabic) conversion on a low level a while ago, and it was most illuminating.
  • My column on the Smalltalk heritage on IDG has spawned a small debate about ”industry languages” such as Java and C# compared to more dynamic, ”cutting edge” languages like Smalltalk and Python. My take on the debate is that if you want to get stuff done together with other developers that may not be on the same level as you, C# and Java will get you there with the lowest amount of risk. For single-developer projects, or for small projects where everyone involved is really bright, Python and similarly dynamic languages (including Smalltalk, Lisp/Scheme, and even Perl) can get you there faster, while allowing you to have more fun along the way.
  • Ted Neward (by the way, it’s cool that an MVP’s RSS feed URL ends in .jsp :-) is involved in a debate over a set of security guidelines (subscription required) published in Java Developers Journal. Ted observes that for many of the threats that the guidelines seek to guard against to even be theoretically exploitable, the attacker must already have greater access than he stands to gain by exploiting the vulnerability. This observation is similar to Peter Torr’s that VBA and Outlook’s object model do not really increase the attack surface, since, for an attacker to make use of them, he must already have full access to the machine: ”The problem isn’t that you have knives or saucepans or shoes in your house; it’s that the burglar keeps getting inside!”
  • Cedric Beust puts his money where his mouth is; disappointed by JUnit, he writes his own testing framework, TestNG.
  • Brad Abrams gets DDJ to allow republishing Steven Clarke’s article on Measuring API Usability.