Is the 27 club a myth? Maybe not.

A recent study published in the British Medical Journal and widely linked (NYT, Jezebel, Andrew Sullivan) claims that there is no statistical support for the assumption that rock stars die more frequently at the age of 27 (aka the 27 club meme).

However, the study used a sampling scheme that excluded four of the most well-known members of the 27 club (Robert Johnson, Jimi Hendrix, Janis Joplin and Jim Morrison). Furthermore, it included only 71 musicians who had actually died. This seems to me to be a very small sample. Can we procure a better one?

Using DBpedia (the structured database containing facts gathered from Wikipedia), I selected a list of dead musicians born in 1900 or later using a fairly simple SPARQL query:

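PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?person ?birth ?death WHERE {
     ?person dbo:birthDate ?birth .
     ?person dbo:deathDate ?death .
     ?person rdf:type ?musician .
     # Keep only persons with a type whose URI contains "Musicians"
     FILTER (regex(str(?musician), "Musicians"))
     FILTER (?birth >= "1900-01-01"^^xsd:date)
}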
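
For anyone who would rather script this step than paste the query into a web form, here is a minimal Python sketch of how it could be sent to a SPARQL endpoint. It is not what I did for this post. The endpoint URL (DBpedia's public endpoint), the helper names build_request, parse_rows and fetch_rows, and the choice of the standard SPARQL JSON results format are all assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: DBpedia's public SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?person ?birth ?death WHERE {
     ?person dbo:birthDate ?birth .
     ?person dbo:deathDate ?death .
     ?person rdf:type ?musician .
     FILTER (regex(str(?musician), "Musicians"))
     FILTER (?birth >= "1900-01-01"^^xsd:date)
}
"""


def build_request(query, endpoint=DBPEDIA_ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results (hypothetical helper)."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )


def parse_rows(payload):
    """Turn a SPARQL JSON result document into (person, birth, death) string tuples."""
    return [
        (b["person"]["value"], b["birth"]["value"], b["death"]["value"])
        for b in payload["results"]["bindings"]
    ]


def fetch_rows(query):
    """Run the query over the network and return the parsed rows."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return parse_rows(json.load(response))
```

Calling fetch_rows(QUERY) would return one tuple per result row. Public endpoints may cap the number of rows they return, so a larger result set might need paging with LIMIT and OFFSET.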

This yields a list of almost 2500 people. Plotting a histogram of these musicians' ages at death gives the following view:

It seems to me that there is a small but significant spike at 27 (a 154% increased chance of death compared to the year before), with a secondary spike at 32 (a 78% increased chance). This shows the value of selecting a good sample for statistical analysis.
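
To make the spike figures above reproducible outside Excel, here is a minimal Python sketch of the arithmetic. It assumes that "increased chance compared to the year before" means the percentage change in the number of deaths at a given age relative to the previous age. The function names are hypothetical, and the input is taken to be a list of (birth, death) date pairs.

```python
from collections import Counter
from datetime import date


def age_at_death(birth, death):
    """Completed years of life between two dates."""
    # Subtract one year if death fell before that year's birthday.
    before_birthday = (death.month, death.day) < (birth.month, birth.day)
    return death.year - birth.year - int(before_birthday)


def age_histogram(lifespans):
    """Count deaths per age for an iterable of (birth, death) date pairs."""
    return Counter(age_at_death(birth, death) for birth, death in lifespans)


def relative_increase(histogram, age):
    """Percentage change in deaths at `age` compared with `age - 1`."""
    previous = histogram[age - 1]
    if previous == 0:
        raise ValueError("no deaths recorded at age %d" % (age - 1))
    return 100.0 * (histogram[age] - previous) / previous
```

With a histogram built this way, relative_increase(histogram, 27) gives the size of the spike at 27 under the assumption stated above.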

Furthermore, it shows the value of structured data. I put together the above chart with half an hour of fiddling with SPARQL queries, followed by about an hour of fiddling with Excel. I believe my criterion (“Dead musician famous enough to have a Wikipedia article”) is better than the one used in the study (“Artists with a number one UK album”). But if you disagree, you can easily modify the SPARQL query to work with a different study sample. Let me know your results!