Smalltalk, risk and success stories

In the comments section to an earlier post  Göran Krampe brought up some excellent points, backed by experience, on my remark that more dynamic languages increases risk in projects, especially if all developers in the project are not on the same level. Here are some of my comments on those, but please do read Göran’s entire comment for more context. Since I find the discussion to be very interesting, I thought I’d bring it up as a new article.

Just read your article at IDG regarding Alan Kay/Smalltalk – and it was good. For once not a single glaring misconception regarding Smalltalk 🙂 – I am a bit jaded by all people throwing around ”truths” about Smalltalk without having actual experience themselves. So kudos for that. 🙂

Thank you! However, I must confess that my Smalltalk experience is limited to playing around with Squeak in my spare time, so I have never used it in a large project. Bear this in mind when you read my response…

Above though it sounds like you are saying that Java would mean a lower risk than Smalltalk for larger projects or projects in which the developers are of ”average” or ”mixed” level. That I definitely do NO agree with on the other hand. 😉

[…]

So my experience is the exact opposite. I claim that less experienced developers can more easily be made productive in Smalltalk than in Java. I have repeatedly taught OO *and* the basics of Smalltalk in a single day including practice and then had these pupils find bugs and complement a working small system the next day.

I suspected that a smalltalker might not agree with that 🙂 Please not that I’m only talking about risk, not programmer effiency. Risk isn’t inherently bad, since it usually goes hand in hand with opportunity. Your points about new programmers learning Smalltalk quicker than Java is obviously backed up by experience, but a little beside the point.

The reason that I feel projects done in more dynamic environments (I’m going to lump Smalltalk, Python and Lisp together here, which might not be entirely fair) to have a higher level of risk is that the dynamic properties of the language will allow your system to take on aspects of a domain specific language, especially if some members of your team understands the Zen of Smalltalk/Lisp/Python. For example, in both Lisp and Smalltalk you can essentially redefine parts of the syntax, like the if/then/else statements, and if it makes sense in your design, you might very well want to do that. But if another programmer, who has yet to be enlightened (sticking with the Zen theme) on the expects the syntax to work like default, this might trip him up (Operator overloading in C++ and .Net gives us essentially the same problem). This risk increases if the programmer has been taught more traditional imperative programming in the style of Java/C# and have not been given a good introduction to Smalltalk.

Things like tail recursion, higher order functions, metaclasses and even polymorphism cam be difficult to wrap your head around, especially since you can get a lot of stuff done without them, and in not-so-dynamic environments, you usually do.

Or to put it more soundbite-friendy way: The more dynamic and open-ended a language is, the more choices you have at any given time. Every choice introduces opportunity — and risk.

However, since I haven’t done any large projects in either Smalltalk or Lisp (I did a elevator control system in Scheme in school, though :-)), maybe my opionions should be taken with a grain of salt. I’d be happy to hear your opinions on why the things I’ve mentioned above aren’t such big risk factors after all.

Also, the idea that there is something inherent in Smalltalk making it unsuitable for large scale development is simply not true either. And there is ample evidence for that.

There are very large systems having been built in Smalltalk with great success over a long period of time. And I mean *large*. 14000 classes, 65 developers/staff, 500 users etc, see for example the ”Kapital” system at JP Morgan Bank, https://secure.cwheroes.org/briefingroom_2004/pdf_frame/index.asp?id=4909)

I don’t think I’ve implied anything like that. It is interesting to hear about these big successful projects, but I keep wondering why we don’t hear more about them. Paul Graham published an essay about how good Lisp was for building ViaWeb, and most people that are familiar with XP has at least heard about the C3 project, but there must be more stories, right?

The point of my column that started the debate was that while many of the inventions and discoveries that come from the Smalltalk community are now used everywhere (MVC, JIT, design patterns…, XP), the language itself isn’t. I don’t believe it’s as simple as Sun and Microsoft having a larger marketing budget, but I’m very interested in a discussion of what the real reason could be, as I still think there are more that the general programmer community could learn from Smalltalk.

Making music on a computer, from a programmers perspective

I spent most of yesterday’s evening setting up Cubase SE after my harddrive crash. I went through all free CD’s that I got with the issues of swedish computer music magazine Studio and discovered a bunch of interesting virtual effects and instruments.

A little background in how modern versions of Cubase (or indeed most any sequencer program works): The program mimics a analog multi-track recorder, to which you can record analouge sound sources (in my cases an electric guitar and an electric bass — soon, vocals as well). There’s nothing magical about the way the audio is recorded — basically they just end up as high-resolution WAV files on your harddrive.

In addition, you can record or program MIDI as well. A midi file is essentially a bunch of messages arranged on a time scale, the most important being Note on (that has some parameters, like pitch and velocity) and Note off. MIDI is about 20 years old as a standard now, but it’s surprisingly versatile in a computer recording environment.

In the old way of doing things (like 5 or 10 years ago), these MIDI messages were sent to an external unit of some kind — a synthesized/sampler/workstation/sound module or some other box outside of the computer that made some kind of sound, that was then recorded more or less like an analouge source (like a electric guitar). Nowadays, it’s feasible to simulate all kinds of sound-generating boxes inside of the sequencer program in form of plugins. In Cubase lingo, such a plugin is called a Virtual instrument, or VSTi (Virtual Studio Technology instrument), and it’s essentially a multithreaded win32 dynamically loaded library, compatible with C++ calling conventions. When writing a VSTi, you inherit from a base C++ class and override a bunch of methods. Basically, one method that the host application calls to notify you that there is a incoming MIDI message, and another that the host application calls to find out how the instrument sounds. Your plugin renders the sound, based on incoming MIDI data, in the form of an array of doubles that represent the sound (typically 44100 doubles each second) — essentially PCM samples.

Now, MIDI messages can be more than just Note on and note off. There’s a whole class of messages that are used for parametrizing various aspects of a electronic instruments sound generation — things like changing the cutoff frequency or the resonance. If you’ve ever twiddled on a Roland TR303 or it’s virtual cousin Rebirth you know the kind of things that can be tweaked, and what kind of sounds can come as a result. Each VSTi can choose if and how to map these messages to some form of internal state.

There are some additional methods that are used for the host application to query and set the state of the instrument (for things like changing the sound preset, or manipulating parameters that can’t be changed by MIDI messages), but essentially that’s the entire interface — it’s less than 10 methods, most with very very simple signatures. But that’s enough for building an amazing amount of virtual instruments — every known technique for generating sound (playing samples, frequency modulation, simulating analog circuits, and so on) has been encapsulated in the form of VSTi’s. I used to play with hardware synthesizers and ”music workstations” back in the day, and it’s so much more flexible and downright fun to work with virtual instruments.

There is a similar (and older) API for writing effect plugins, like reverb and delay — override some methods, recieve a bunch of floats and tweak them to produce the sound you want — enabling you to virtualize a bunch of effect boxes, and even simulate complex things like the sound of a Marshall tube amplifier playing through a 4×12 speaker.

My favorites so far:

  • sfz – a soundfont player. A soundfont is basically a bunch of audio samples with some metadata that tells which samples that corresponds to what notes and velocity ranges. This is a common format for free sample collections.
  • crystal – a polyphonic synthesizer with really good presets
  • creakbox – sounds like a Roland TR303, including all the knobs to tweak.
  • Lallapallooza lite – great for more noisy sounds
  • Little green amp II – a guitar amplifier simulator

Quickies of the day

  • Anil John writes about developing ASP.NET applications that run under Partial Trust. The whole Code Access Security framework in .Net is a complex beast, and I fear that most developers never will learn enough to actually use it properly, leaving them with applications that appear to be secured against malicious in-process code, but still can be vulnerable to ”luring attacks”. And if you let a single malicious assembly run with FullTrust, it’s Game over for your entire host process, as explained by Keith Brown in Beware of Fully Trusted Code. As Anil says, chapter 6-9 in Improving Web Application Security: Threats and Countermeasures is recommended reading. As a sidenote, are there any MVP’s that specialize in Code Access Security?
  • Tim Bray writes about the higher level web services specifications, and how the law of leaky abstractions work against them. ”[…]; applications that try to abstract away the fact that they’re exchanging XML messages will suffer for it”
  • Anil Dash warns against yet another scenario where Word’s ”Track Changes” feature can come back and bite you in the ass. I once recieved a press release in .doc format that had Track Changes enabled in such a way that they didn’t show up on screen, but did when you printed it. Oops indeed.
  • Jon Udell observes that developers still have a lot to learn when it comes to internationalizing applications, and compares us with 13-th century French Artisans. I don’t think I have linked to Joel Spolsky’s excellent Unicode primer yet, and even if I have, its such a recommended reading that I should do it again. I did a small project involving UTF-8 to Windows-1256 (Arabic) conversion on a low level a while ago, and it was most illuminating.
  • My column on the Smalltalk heritage on IDG has spawned a small debate about ”industry languages” such as Java and C# compared to more dynamic, ”cutting edge” languages like Smalltalk and Python. My take on the debate is that if you want to get stuff done togheter with other developers that may not be on the same level as you, C# and Java will get you there with the lowest amount of risk. For single-developer projects, or for small projects that everyone involved are really bright, Python and similarly dynamic languages (including Smalltalk, Lisp/Scheme, and even Perl) can get you there faster, while allowing you to have more fun along the way.
  • Ted Neward (By the way, it’s cool that a MVP’s RSS feed URL ends in .jsp :-)is involved in a debate over a set of security guidelines (subscription required) published in Java Developers Journal. Ted observes that for many of threats that the guidelines seek to guard against to even be theoretically exploitable, the attacker already must have greater access than he stands to gain by exploiting the vulnerability. This observation is similar to Peter Torr’s that VBA and Outlook’s object model does not really increase the attack surface, since, for an attacker to make use of them, he must already have full access to the machine: ”The problem isn’t that you have knives or saucepans or shoes in your house; it’s that the burglar keeps getting inside!”
  • Cedric Beust puts his money where his mouth is; disappointed by JUnit, he writes his own testing framework, TestNG.
  • Brad Adams gets DDJ to allow republising Steven Clarke’s article on Measuring API Usability.

This month’s IDG.se column: Giving props to Smalltalk

Alan Kay recently won the Turing award for his work on Smalltalk and subsequent projects. I spent some time with the free smalltalk implementation Squeak this weekend, and was particularly impressed with the very dynamic nature of things — being able to modify the actual root Object is not something that’s common in more mainstream environments. Felt very much like a Lisp environment like Emacs, but object oriented. Anyway, it all led to me writing a column on Smalltalk and it’s impact for IDG.se. As usual, in Swedish.

Update: Oh, look, it made the frontpage

Quickies of the day

  • Jiri has an interesting comparison between the state of infrastructure security as opposed to application security.
  • Michael Howard has the slides from what appear to be an excellent presentation about Secure coding issues up (by way of Sergey Simakov
  • The widely-talked-about paper from Paul Watson on the TCP reset vulnerability that threatened to destroy the internet last week is now online.
  • Charles Miller discusses where bugs come from, and why unit testing only will catch a part of them.
  • Mr Ed from Hacknot asks all developers to spare a thought for the next guy that will change your code — it could be you.

Also, with all the recent book reviews all over the .Net blogosphere, I broke down and went crazy on Amazon. The following books should soon be here:

Microsoft PAG on ”Improving Web Application Security”

Anil John points me to the Microsoft Pattern and Practices site. I’ve stumbled over the ”Application blocks” examples that they have up once or twice, but I never went to their front page to see what it’s all about. I took a glance at the ”Improving Web Application Security: Threats and Countermeasures” guide, and… godDAMN this is a comprehensive guide (900+ printed pages, not much filler) to just about everything you need to know about secure web development on the Microsoft platform, including how to harden the base services (like IIS and MS SQL Server) your application uses. Much of the stuff (like the chapters on Code Access Security and Data access) is useful in non-web development as well. So far I’ve only skimmed through it, but it looks to be a must-read.

Fiddler – a HTTP debug proxy

I saw this on Scott Water’s blog: A HTTP debugging proxy called Fiddler, that sets itself up as a default system proxy and enables you to see every little detail about every request sent and response recieved to your computer, as well as intercept requests/responses and ”fiddle” with them. Great for troubleshooting and a bunch of other stuff as well. I’ve been holding on to a copy of Steve Genusa’s TelHTTP to be able to inspect headers and similar stuff from random webservers, but this does all that TelHTTP ever did, and so much more.

Works with IE, and Mozilla too, if you configure it to use localhost:8888 as it’s proxy.

Premature cliches are the root of all evil (and considered harmful)

Did you know that you can use wild cards in Google searches? Searching for ”premature * is the root of all evil” yields a lot of people quoting Knuth in an almost fanboy like fashion (though it seems that Tony Hoare is the actual source of the quote). I’d like to nominate ”Premature optimization is the root of all evil”, together with ”Goto considered harmful”, as the most tired and overrated quotes in computer science ever.

Don’t get me wrong, these quotes made a lot of sense thirty, or even ten years ago, but now that the ideas that they illustraded has been absorbed into the programming community, they do more harm than good. For example, Dijkstras paper titled ”Goto considered harmful” argued against the practices found in ALGOL and similar environments 36 years ago. Most common languages at the time had weaker support for structured programming, and the general consciousness about structuring code in the programming community was low, leading to ”spaghetti code” with goto’s all over, making it hard to figure out how the program actually worked. At that time, the condemnation of goto’s made sense.

Well, a lot have happened since that time, both language-wise and with programmer’s attitude towards the systems they’re building. ”Software ICs”, object-orientation, components, SOA… all leading to environments and programs that can grow quite large without collapsing under the weight of it’s own complexity (some say that we’ve exchanged the problems of ”spaghetti code” for that of ”ravioli code”, but that’s another discussion). When a programmer nowadays considers using ”goto”, it’s usually for situations where it actually makes sense (breaking out of nested loops, for example). Having the mantra ”Goto considered harmful” alive in the collective programming community consciousness actually makes things worse, as it discourages people for using the construct where it actually would make the program easier to read.

The quote about ”Premature optimization is the root of all evil” is more of a ”timeless” truth, but of course it’s basically a truism — premature anything, is by definition, never a good thing. The quote is rooted in a time where optimizing code would make it harder to read and maintain, and really is aimed against those micro-optimizations that make code harder to read (explicit loop unrolling, contrived program flow, unneccesary use of bitshifting and bitfields) without providing a huge performance benefit. However, this quote has been repeated so often, often in an authorative voice, that new programmers think that there’s something wrong with thinking about performance as you design and write the initial version of your program — fearing that it will lead to a design that is worse and a program that’s harder to maintain.

These fears do not match my experiences. I think it makes sense to think about performance through the design phase, particularly considering which pieces of code will be run often (I seem to recall statistics saying that 98% of a program’s running time is spent in 2% of it’s code), and what is needed to get those parts to run really fast. Of course, I don’t consider these optimizations to be premature.

I’ve seen cases where architectural descisions were made early on, without measuring performance, leading to a system that was not only slow, but basically impossible to optimize without rearchitecting. As these systems were built without much thought about performance, when the performance problems surfaced, it was basically too late. These scenarios are, in my experience, much more common than systems where optimization has made the code hard to read and maintain.

However, there are other stuff that often is done prematurely, and in my experience are a much larger source of evil — abstraction and/or generalization, things that are generally considered to be good things. New (or even experienced) programmers that are ambitious will often design overly generalized systems without being familiar enough with the problem domain to know which generalizations makes sense. Similarly, they will create abstractions through elaborate class hierarchies and interfaces with the intention that it will make the individual pieces of the system easier to understand, without realising that the abstractions get in the way of understanding the system as a whole.

Code that has been obfuscated by premature optimization attempts usually can be made more readable piece by piece, but an overengineered system with sophisticated class hiearchies that turned out not to match the problem domain is much harder to refactor into something useable. I wholeheartedly agree with this page on the original wiki that states:

It’s probably important not to create an abstract base class or an interface until you can think of at least two descendants or implementations — that you are actually going to use right now

When reading and understanding other peoples code, I find that it much more common to find it over-abstracted than under-abstracted. Maybe we need new quotes/cliches/memes to warn us of where the real risks actually are now, as opposed to where they were thirty years ago.

Next time, I’ll take apart the Fred Brooks quote ”Adding manpower to a late software project makes it later” 🙂