I have an "on again/off again" relationship with treemaps. Lately, I've been taking another look, partly inspired by Björn Engdahl's MSc thesis Ordered and Unordered Treemap Algorithms and Their Applications on Handheld Devices. He describes a simple treemap algorithm which he calls Split Layout. It has two nice properties: it gives good aspect ratios (most cells in the treemap are approximately square), and it keeps the cells in roughly their original order. This latter property is important, as one thing I find distracting with tree diagrams is when the order of the objects in the tree keeps changing.
I also have an "on again/off again" relationship with the Catalogue of Life, which is potentially very useful, but seems determined to undermine this with some poor design decisions. But, I finally bit the bullet and extracted a complete classification from the 2008 edition of the Catalogue of Life. I downloaded an ISO image, burnt a CD, installed it on a Windows box (gack), grabbed the MySQL database files, and put those on my MacBook Air. Using some tools I developed for working with the NCBI taxonomy, I wanted to extract the tree from the taxa table, only to discover that this table isn't a tree. Not all the taxa in the table are flagged is_accepted_name, and if you remove the ones that aren't, the remaining taxa don't form a tree. It's clear that some taxa were orphaned when the table was created. For example, Enteromorpha flexuosa is not an accepted name, and is flagged as such in the taxa table, yet it has four child taxa that are accepted (Enteromorpha flexuosa subsp. linziformis, Enteromorpha flexuosa subsp. biflagellata, Enteromorpha flexuosa subsp. pilifera, and Enteromorpha flexuosa forma submarina). These taxa are orphaned in the tree. Eventually I gave up trying to extract the tree using SQL, and had to traverse the entire structure starting at the root node. This extracts a tree, at the cost of the orphans. It appears that Catalogue of Life haven't checked whether their classification is, in fact, a tree (OK, technically it is a forest, as it is a set of disjoint trees comprising the eight kingdoms CoL recognises, but I make it a tree by rooting it on a node called "life").
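The root-down traversal described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the post: the row layout `(taxon_id, parent_id, is_accepted)` and the ids in the example are invented, and the real taxa table will have different column names apart from is_accepted_name.

```python
from collections import defaultdict, deque


def extract_tree(rows, root_id):
    """Walk down from root_id through accepted taxa only.

    rows: iterable of (taxon_id, parent_id, is_accepted) tuples.
          This layout is an assumption made for the sketch.
    Returns (reachable, orphans): the set of ids in the extracted tree,
    and a sorted list of accepted taxa that the walk never reached.
    """
    accepted = set()
    children = defaultdict(list)
    for taxon_id, parent_id, is_accepted in rows:
        if is_accepted:
            accepted.add(taxon_id)
            children[parent_id].append(taxon_id)

    reachable = set()
    queue = deque([root_id])
    while queue:
        node = queue.popleft()
        if node in reachable:
            continue  # guard against cycles in dirty data
        reachable.add(node)
        queue.extend(children.get(node, []))

    orphans = sorted(accepted - reachable)
    return reachable, orphans


# Invented example: taxon 3 is not accepted, so its accepted child 4
# is cut off from the root, like the Enteromorpha subspecies above.
rows = [
    (1, None, True),   # "life"
    (2, 1, True),      # a kingdom
    (3, 2, False),     # a name that is not accepted
    (4, 3, True),      # accepted child of the unaccepted name
    (5, 2, True),      # an ordinary accepted taxon
]
tree, orphans = extract_tree(rows, root_id=1)
print(sorted(tree), orphans)
```

Listing the orphans rather than silently dropping them makes it easier to see how much of the classification is lost by the traversal.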
After much anguish, I have a tree. I then coded up Engdahl's algorithm, based on the pseudocode he provides on p. 31 of his thesis (I think there's a bug in his code, as he doesn't deal with the case when the cell being partitioned is taller than it is wide, but this was easy to fix). One thing I was keen to do was to use just HTML, no SVG or Flash. Here's an example of the treemap, showing the eight kingdoms. Each taxon is drawn with an area proportional to log₁₀(n + 1), where n is the number of terminal taxa (i.e., species or below) in that taxon (the number of terminals is shown in each cell). The log scale was chosen to avoid mega-diverse groups crowding out the smaller taxa.
Animalia 892,966
Archaea 281
Bacteria 9,588
Chromista 6,855
Fungi 33,017
Plantae 206,843
Protozoa 6,435
Viruses 1,906
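For readers who want to experiment, here is a minimal Python sketch of a split-style layout using the log₁₀(n + 1) weights and the kingdom counts listed above. It is not a transcription of Engdahl's pseudocode nor the PHP behind the live version. It assumes the general idea is to cut the ordered list where the two halves have the most nearly equal weight, divide the rectangle along its longer side in proportion to those weights, and recurse. The function names and the 400 × 300 canvas are made up for the example.

```python
import math

# Terminal-taxon counts for the eight kingdoms, as listed above.
KINGDOMS = [
    ("Animalia", 892966), ("Archaea", 281), ("Bacteria", 9588),
    ("Chromista", 6855), ("Fungi", 33017), ("Plantae", 206843),
    ("Protozoa", 6435), ("Viruses", 1906),
]


def log_weight(n):
    """Cell weight: log10(n + 1), so huge groups do not swamp small ones."""
    return math.log10(n + 1)


def split_layout(items, x, y, w, h):
    """Lay out (label, weight) pairs, in order, inside the given rectangle.

    Returns a list of (label, x, y, w, h) tuples in the input order.
    """
    if not items:
        return []
    if len(items) == 1:
        return [(items[0][0], x, y, w, h)]

    total = sum(weight for _, weight in items)
    # Choose the cut that makes the two halves' weights as equal as
    # possible, always leaving at least one item on each side.
    best_k, best_diff, running = 1, float("inf"), 0.0
    for k in range(1, len(items)):
        running += items[k - 1][1]
        diff = abs(total - 2 * running)
        if diff < best_diff:
            best_k, best_diff = k, diff

    first, second = items[:best_k], items[best_k:]
    frac = sum(wt for _, wt in first) / total if total > 0 else 0.5

    if w >= h:
        # Wide cell: cut with a vertical line.
        w1 = w * frac
        return (split_layout(first, x, y, w1, h)
                + split_layout(second, x + w1, y, w - w1, h))
    # Tall cell: cut with a horizontal line.
    h1 = h * frac
    return (split_layout(first, x, y, w, h1)
            + split_layout(second, x, y + h1, w, h - h1))


items = [(name, log_weight(n)) for name, n in KINGDOMS]
for name, cx, cy, cw, ch in split_layout(items, 0, 0, 400, 300):
    print(f"{name:10s} x={cx:6.1f} y={cy:6.1f} w={cw:6.1f} h={ch:6.1f}")
```

Each rectangle could then be written out as an absolutely positioned HTML div, which keeps the whole thing free of SVG and Flash.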
The live version is here. It's a bit crude (to go back up the tree just use your browser's back button), but it's simple, and it's HTML. The underlying code is PHP, but it would be quite easy to convert this to JavaScript to make a simple drop-in widget. In addition to Björn Engdahl's algorithm, and the Catalogue of Life data, I should acknowledge Samson's code for generating colour gradients.
There are all sorts of things that could be done to improve this. One approach would be to include exemplar pictures of the taxa in each cell, to help people navigate among unfamiliar taxa. Denise Green and Rebecca Shapley's Teaching with a visual tree of life report has some examples of this idea (see their p. 86), and Marcos Weskamp (author of the very cool newsmap) has done a mockup for EOL using Flash.
As to the treemap idea itself, there are some fun things which could be done with it. I'm not convinced that it is great for navigation. However, it is probably very useful for showing changes over time. For example, imagine making the State of Observed Species report dynamic. Take the uBio RSS feed for new names, classify the new names, then colour the treemap cells by the number of new names (in a sense, this is a taxonomic version of newsmap).
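Colouring cells by the number of new names could be as simple as interpolating between two colours. The Python sketch below is an illustration only: it is not Samson's gradient code, the function name and default colours are arbitrary, and the counts in the example are invented rather than taken from the uBio feed.

```python
def gradient_colour(value, max_value, low=(255, 255, 255), high=(200, 0, 0)):
    """Linearly interpolate between two RGB colours.

    Returns an HTML hex colour such as '#ffffff'. Values outside the
    range 0..max_value are clamped to the nearest end of the gradient.
    """
    if max_value <= 0:
        t = 0.0
    else:
        t = min(max(value / max_value, 0.0), 1.0)
    rgb = [round(lo + (hi - lo) * t) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Invented counts of new names per kingdom, purely for illustration.
new_names = {"Animalia": 120, "Fungi": 15, "Archaea": 0}
busiest = max(new_names.values())
for taxon, count in new_names.items():
    colour = gradient_colour(count, busiest)
    print(f'<div style="background-color:{colour}">{taxon} {count}</div>')
```

As with the cell sizes, a log scale on the counts may work better than a linear one if a few taxa receive most of the new names.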