Friday, August 03, 2007

Visualising very big trees, Part II

OK, time to put my money where my mouth is. Here's a first stab at displaying big trees in a browser. Not terribly sophisticated, but reasonably fast. Take a look at Big Trees.

Approach
Given a tree, I draw it in a predetermined area (in these examples 400 x 600 pixels). If there are more leaves than can be drawn without overlapping, I cull the leaf labels. If there are internal node labels, I draw vertical lines showing the span of the corresponding subtree, which is the range between the left-most and right-most descendants of that node. If internal node labels are nested (e.g., "Mammalia" and "Primates"), I draw only the most recent internal node label, the rationale being that I want a single set of vertical bars. This has the visual effect of partitioning the leaves into non-overlapping sets, and gives us a diagram like this:

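As a rough sketch of the layout logic just described (not the code behind Big Trees), the Python below spaces the leaves evenly down a fixed-height area, culls leaf labels when they would overlap, and computes one vertical bar per labelled internal node. The `Node` class, the `layout` function, the 10-pixel font height, and the "keep every k-th label" culling rule are all assumptions made for illustration. "Most recent" is read here as "the labelled node with no labelled internal node beneath it".

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps nodes hashable by identity
class Node:
    label: Optional[str] = None
    children: list["Node"] = field(default_factory=list)


def layout(root: Node, height: float = 600.0, font_px: float = 10.0):
    """Return (leaf_y, shown_labels, bars) for a tree drawn in `height` pixels."""
    # Iterative pre-order walk, so very deep trees do not hit the recursion limit.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(node.children))

    leaves = [n for n in order if not n.children]
    index = {leaf: i for i, leaf in enumerate(leaves)}
    spacing = height / len(leaves)
    leaf_y = [(i + 0.5) * spacing for i in range(len(leaves))]

    # Cull labels (assumed rule): keep every `step`-th label so none overlap.
    step = max(1, math.ceil(font_px / spacing))
    shown_labels = [leaf.label for i, leaf in enumerate(leaves) if i % step == 0]

    span, labelled_below, bars = {}, {}, []
    for node in reversed(order):  # reversed pre-order: children before parents
        if not node.children:
            span[node] = (index[node], index[node])
            labelled_below[node] = False
            continue
        # Span runs from the left-most to the right-most descendant leaf.
        first = span[node.children[0]][0]
        last = span[node.children[-1]][1]
        span[node] = (first, last)
        labelled_below[node] = any(
            labelled_below[c] or (bool(c.children) and c.label is not None)
            for c in node.children
        )
        # Nested labels: draw a bar only for the innermost labelled node.
        if node.label is not None and not labelled_below[node]:
            bars.append((node.label, leaf_y[first], leaf_y[last]))
    bars.sort(key=lambda bar: bar[1])
    return leaf_y, shown_labels, bars
```

For a tree in which "Mammalia" contains "Primates" plus one other leaf, this yields a single bar for "Primates", covering only the two primate leaves.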

OK, but what about all the nodes we can't see? What I do here is make the tree "clickable" in the following way. If there are internal node labels, I make the corresponding subtree clickable. I also traverse the tree looking for well-defined clusters (basically subtrees that are isolated by a long branch from their nearest neighbours) and make these clickable. This approach is partly a hangover from earlier experiments on automatically folding a tree (partly inspired by doi:10.1111/1467-8659.00235). The key point is that I'm trying to avoid testing for mouse clicks on individual nodes and edges: many of these will be occluded by other nodes and edges, and hit testing on every node and edge in a big tree would also be expensive.
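The exact test for a "well-defined cluster" isn't spelled out above, so the following Python is only one plausible heuristic, not the one used in Big Trees. It flags an internal node when the branch leading to it is at least `factor` times the mean branch length inside its subtree. The `Node` class, the `isolated_clusters` function, and the default `factor` of 2.0 are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps nodes hashable by identity
class Node:
    name: str
    length: float = 0.0  # length of the branch leading to this node
    children: list["Node"] = field(default_factory=list)


def isolated_clusters(root: Node, factor: float = 2.0) -> list[Node]:
    """Return internal nodes whose stem branch is long relative to their subtree."""
    # Pre-order walk; reversing it later visits children before parents.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(node.children)

    total, count, clusters = {}, {}, []
    for node in reversed(order):
        # Sum and number of branch lengths strictly inside this subtree.
        total[node] = sum(total[c] + c.length for c in node.children)
        count[node] = sum(count[c] + 1 for c in node.children)
        if node is root or not node.children:
            continue  # the root has no stem, and leaves are not clusters
        mean_inside = total[node] / count[node]
        if mean_inside > 0 and node.length >= factor * mean_inside:
            clusters.append(node)
    return clusters
```

Each returned node could then be given a single clickable rectangle covering its span of leaves, which avoids per-node and per-edge hit testing.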

If you click on one of these, the script extracts the subtree and reloads the display showing just that part of the tree, using exactly the same approach as above. Behind the scenes the code is doing a least common ancestor (LCA) query, hence it defines subtrees rather like the Phylocode does (oh the irony).
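To make the LCA step concrete, here is a minimal Python sketch. It assumes the clicked region is identified by two of its leaves (say the left-most and right-most), and it finds their least common ancestor by walking up parent pointers. The `Node` class and `lca` function are hypothetical names; how the actual script performs the query isn't described here.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps nodes hashable by identity
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def lca(root: Node, a: Node, b: Node) -> Node:
    """Return the least common ancestor of nodes `a` and `b` in the tree at `root`."""
    # Record each node's parent with an iterative walk from the root.
    parent = {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            parent[child] = node
            stack.append(child)

    # Collect `a` and all of its ancestors up to the root.
    ancestors = set()
    node = a
    while node is not None:
        ancestors.add(node)
        node = parent[node]

    # Climb from `b` until we reach a node that is also an ancestor of `a`.
    node = b
    while node not in ancestors:
        node = parent[node]
    return node
```

The node returned becomes the root of the subtree to redraw, so the same layout code can be applied to it unchanged.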

Pros
  • Reasonably fast (everything you see is done live "on the fly").
  • Works in any modern browser, with no dependence on plugins or on technology that has limited support.
  • Image is clear, text is small but legible.
  • Entirely automated layout.


Cons
  • Reloading a new page is costly in terms of time, and potentially disorienting (you lose your sense of the larger tree).
  • It is not obvious where to click on the tree (needs to be highlighted).
  • Text is not clickable. This would be really useful for internal node labels.