Tuesday, June 10, 2008

Catalogue of Life as a treemap

I have an "on again/off again" relationship with treemaps. Lately, I've been taking another look, partly inspired by Björn Engdahl's MSc thesis Ordered and Unordered Treemap Algorithms and Their Applications on Handheld Devices. He describes a simple treemap algorithm which he calls Split Layout. It has the nice properties of having a good aspect ratio (most cells in the treemap are approximately square) and of keeping the cells in roughly the original order. This latter property is important, as one thing I find distracting with tree diagrams is when the order of the objects in the tree keeps changing.

I also have an "on again/off again" relationship with the Catalogue of Life, which is potentially very useful, but seems determined to undermine this with some poor design decisions. But, I finally bit the bullet and extracted a complete classification from the 2008 edition of the Catalogue of Life. I downloaded an ISO image, burnt a CD, installed it on a Windows box (gack), grabbed the MySQL database files, and put those on my MacBook Air. Using some tools I developed for working with the NCBI taxonomy, I wanted to extract the tree from the taxa table, only to discover that this table isn't a tree. Not all the taxa in the table are flagged is_accepted_name, and if you remove those that aren't, then the remaining taxa don't form a tree. It's clear that some taxa were orphaned when the table was created. For example, Enteromorpha flexuosa is not an accepted name, and is flagged as such in the taxa table, yet it has four child taxa that are accepted (Enteromorpha flexuosa subsp. linziformis, Enteromorpha flexuosa subsp. biflagellata, Enteromorpha flexuosa subsp. pilifera, and Enteromorpha flexuosa forma submarina). These taxa are orphaned in the tree. Eventually I gave up trying to extract the tree using SQL, and had to traverse the entire structure starting at the root node. This extracts a tree, at the cost of the orphans. It appears that Catalogue of Life haven't checked whether their classification is, in fact, a tree (OK, technically it is a forest as it is a set of disjoint trees comprising the eight kingdoms CoL recognises, but I make it a tree by rooting it on a node called "life").
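To make the orphan problem concrete, here is a minimal Python sketch of the "start at the root and walk down" approach. It is not the code I actually used. The row layout (id, parent id, name, accepted flag) and the ids are assumptions for illustration; only is_accepted_name and the Enteromorpha names come from the database. In the toy data the genus hangs directly off "life", which is a simplification.

```python
from collections import defaultdict

def extract_tree(rows, root_id):
    """Walk down from root_id, following accepted taxa only.

    rows: iterable of (taxon_id, parent_id, name, is_accepted_name) tuples
          (an assumed layout, not the real CoL schema).
    Returns (kept, orphans): ids reached from the root, and accepted ids
    that were never reached because an ancestor is not an accepted name.
    """
    children = defaultdict(list)
    accepted = {}
    for taxon_id, parent_id, _name, is_accepted in rows:
        children[parent_id].append(taxon_id)
        accepted[taxon_id] = bool(is_accepted)

    kept = []
    stack = [root_id]
    while stack:
        node = stack.pop()
        kept.append(node)
        for child in children.get(node, []):
            if accepted[child]:
                stack.append(child)

    reached = set(kept)
    orphans = [t for t, ok in accepted.items() if ok and t not in reached]
    return kept, orphans

# Toy data: ids are invented, names are from the example above.
rows = [
    (0, None, "life", 1),
    (1, 0, "Enteromorpha", 1),
    (2, 1, "Enteromorpha flexuosa", 0),  # not an accepted name
    (3, 2, "Enteromorpha flexuosa subsp. linziformis", 1),
    (4, 2, "Enteromorpha flexuosa forma submarina", 1),
]
kept, orphans = extract_tree(rows, root_id=0)
print("kept:", kept)
print("orphans:", orphans)
```

The two infraspecific taxa are accepted, but the traversal never reaches them because their parent is filtered out. That is exactly the price paid for getting a clean tree.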

After much anguish, I have a tree. I then coded up Engdahl's algorithm, based on the pseudocode he provides on p. 31 of his thesis (I think there's a bug in his code as he doesn't deal with the case when the cell being partitioned is taller than it is wide, but this was easy to fix). One thing I was keen to do is just use HTML, no SVG or Flash. Here's an example of the treemap, showing the eight kingdoms. Each taxon is sized in proportion to log10(n + 1), where n is the number of terminal taxa (i.e., species or below) in that taxon (the number of terminals is shown in each cell). The log scale was chosen to avoid mega-diverse groups crowding out the smaller taxa.


Animalia 892,966
Archaea 281
Bacteria 9,588
Chromista 6,855
Fungi 33,017
Plantae 206,843
Protozoa 6,435
Viruses 1,906

The live version is here. It's a bit crude (to go back up the tree just use your browser back button), but it's simple, and it's HTML. The underlying code is PHP, but it would be quite easy to convert this to JavaScript to make a simple drop-in widget. In addition to Björn Engdahl's algorithm, and the Catalogue of Life data, I should acknowledge Samson's code for generating colour gradients.
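For anyone wanting to experiment, here is a rough Python sketch of a split-style layout as I understand the idea: cut the ordered list where the two halves have the most similar total weight, divide the rectangle along its longer side in the same proportion, and recurse. It is not a transcription of Engdahl's pseudocode, nor the PHP behind the live version. The function names, the 600 x 400 canvas and the inline styles are all assumptions made for the example. The weights are log10(n + 1) of the kingdom counts shown above.

```python
import html
import math

def split_layout(items, x, y, w, h):
    """items: list of (label, weight) in display order.
    Returns a list of (label, x, y, w, h) cells, in the same order."""
    if not items:
        return []
    if len(items) == 1:
        return [(items[0][0], x, y, w, h)]

    total = sum(weight for _, weight in items)
    # Find the cut that puts the first part's weight closest to half.
    running, best_i, best_diff, best_first = 0.0, 1, float("inf"), 0.0
    for i in range(1, len(items)):
        running += items[i - 1][1]
        diff = abs(total / 2 - running)
        if diff < best_diff:
            best_i, best_diff, best_first = i, diff, running
    frac = best_first / total if total > 0 else 0.5
    first, second = items[:best_i], items[best_i:]

    if w >= h:  # wide cell: cut with a vertical line
        w1 = w * frac
        return (split_layout(first, x, y, w1, h)
                + split_layout(second, x + w1, y, w - w1, h))
    h1 = h * frac  # tall cell: cut with a horizontal line
    return (split_layout(first, x, y, w, h1)
            + split_layout(second, x, y + h1, w, h - h1))

def to_html(cells, width, height):
    """Render cells as absolutely positioned divs (plain HTML, no SVG)."""
    divs = "".join(
        f'<div style="position:absolute;left:{x:.0f}px;top:{y:.0f}px;'
        f'width:{w:.0f}px;height:{h:.0f}px;overflow:hidden;'
        f'border:1px solid #fff;box-sizing:border-box">{html.escape(label)}</div>'
        for label, x, y, w, h in cells
    )
    return (f'<div style="position:relative;width:{width}px;'
            f'height:{height}px">{divs}</div>')

kingdoms = [
    ("Animalia", 892966), ("Archaea", 281), ("Bacteria", 9588),
    ("Chromista", 6855), ("Fungi", 33017), ("Plantae", 206843),
    ("Protozoa", 6435), ("Viruses", 1906),
]
weights = [(name, math.log10(n + 1)) for name, n in kingdoms]
cells = split_layout(weights, 0, 0, 600, 400)
page = to_html(cells, 600, 400)
for label, x, y, w, h in cells:
    print(f"{label:10s} x={x:6.1f} y={y:6.1f} w={w:6.1f} h={h:6.1f}")
```

Because every cut divides the area in proportion to the weights on each side, each cell ends up with an area proportional to its own weight, and the cells stay in list order.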

There are all sorts of things that could be done to improve this. One approach would be to include exemplar pictures of the taxa in each cell, to help navigate in unfamiliar taxa. Denise Green and Rebecca Shapley's Teaching with a visual tree of life report has some examples of this idea (see their p. 86), and Marcos Weskamp (author of the very cool newsmap) has done a mockup for EOL using Flash.

As to the treemap idea itself, there are some fun things which could be done with it. I'm not convinced that it is great for navigation. However, it is probably very useful for showing changes over time. For example, imagine making the State of Observed Species report dynamic. Take the uBio RSS feed for new names, classify the new names, then colour the treemap cells by the number of new names (in a sense, this is a taxonomic version of newsmap).
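The colouring step could be as simple as a linear gradient keyed on the count of new names per taxon. Here is a small Python sketch under that assumption. It is not Samson's gradient code, the function names are made up, and the counts are placeholders rather than real uBio data.

```python
def lerp_colour(start, end, t):
    """Blend two (r, g, b) colours; t=0 gives start, t=1 gives end."""
    t = max(0.0, min(1.0, t))
    r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
    return f"#{r:02x}{g:02x}{b:02x}"

def colour_by_count(counts, cold=(255, 255, 255), hot=(204, 0, 0)):
    """Map each taxon's count of new names to a hex colour.

    Counts are scaled by the largest count, so the busiest taxon gets
    the 'hot' colour and a taxon with no new names gets the 'cold' one.
    """
    top = max(counts.values(), default=0)
    return {
        name: lerp_colour(cold, hot, n / top if top else 0.0)
        for name, n in counts.items()
    }

# Placeholder counts, purely illustrative.
new_names = {"taxon A": 0, "taxon B": 5, "taxon C": 10}
colours = colour_by_count(new_names)
for name, colour in colours.items():
    print(name, colour)
```

Each colour would then go into the background style of the corresponding treemap cell. A log scale on the counts might work better if a few taxa dominate the feed.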

4 comments:

Javier de la Torre said...

Hi,

I have also recently been playing with taxonomic trees and visualization. See the following post for an example of using the iPhone UI ideas together with images from Google.

http://biodivertido.blogspot.com/2008/06/taxonomic-browser-in-flex.html

There is a nice component for creating treemaps with Flex. I also tried it and agree with you that, apart from being cool, I don't see it as that useful.

Great blog by the way!

David Cannatella said...

I had become interested in treemaps a couple of years ago, and hoped someone would find a realistically usable way of implementing a phylogeny using this metaphor.

Rod's example is clearly a step in the right direction...

Андрей said...

Hello to all!
I was also playing with the Catalogue of Life. I'd like to build a hierarchical tree down to genus level. The author writes: "After much anguish, I have a tree." Maybe someone could help me to get the data from the CoL or to build the tree. I crashed trying to get a table from the SQL of CoL. Thanks