Wednesday, September 23, 2015

Visualising big phylogenies (yet again)

Inspired in part by the release of the draft tree of life (doi:10.1073/pnas.1423041112) by the Open Tree of Life project, I've been revisiting (yet again) ways to visualise very big phylogenies (see Very large phylogeny viewer for my last attempt).

My latest experiment uses Google Maps to render a large tree. Google Maps uses "tiles" to create a zoomable interface, so we need to create tiles of the phylogeny for each zoom level. To create this visualisation I did the following:

  1. The phylogeny is laid out in a 256 x 256 grid.
  2. The position of each line in the drawing is stored in a MySQL database as a spatial element (in this case a LINESTRING).
  3. When the Google Maps interface needs to display a tile at a certain zoom level and location, a spatial SQL query retrieves the lines that intersect the bounds of the tile, then draws those using SVG.
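The steps above can be sketched in a few lines of code. This is a minimal illustration, not the actual implementation: the table name `tree_lines` and geometry column `geom` are assumptions, and it uses MySQL's minimum-bounding-rectangle test `MBRIntersects` for the spatial query.

```python
# Sketch of the on-the-fly tile renderer described above.
# Assumed schema: table "tree_lines" with a LINESTRING column "geom"
# in the 256 x 256 coordinate space the tree was drawn into.

def tile_bounds(x, y, zoom, world_size=256):
    """Bounding box of tile (x, y) at a given zoom level.
    At zoom z the 256 x 256 drawing is covered by 2^z x 2^z tiles."""
    span = world_size / 2 ** zoom
    return (x * span, y * span, (x + 1) * span, (y + 1) * span)

def tile_query(x, y, zoom):
    """Build the spatial SQL that retrieves every line intersecting
    the tile's bounds (MBRIntersects compares bounding rectangles)."""
    minx, miny, maxx, maxy = tile_bounds(x, y, zoom)
    polygon = (f"POLYGON(({minx} {miny}, {maxx} {miny}, "
               f"{maxx} {maxy}, {minx} {maxy}, {minx} {miny}))")
    return ("SELECT AsText(geom) FROM tree_lines "
            f"WHERE MBRIntersects(geom, GeomFromText('{polygon}'))")
```

The lines returned by the query are then drawn into an SVG tile, scaled so that the tile's bounds map onto its 256-pixel extent.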
Hence, the tiles are drawn on the fly rather than stored as images on disk. This is a crude proof of concept so far; there are a few issues to tackle to make it usable:
  1. At the moment there are no labels. I would need a way to compute which labels to show at each zoom level ("level of detail"). In other words, at low zoom levels we want just a few higher taxa to be picked out, whereas as we zoom in we want more and more taxa to be labelled, until at the highest zoom levels the tips are individually labelled.
  2. Ideally each tile would require roughly the same amount of effort to draw. However, at the moment the code is very crude and simply draws every line that intersects a tile. For example, at zoom level 0 the entire tree fits on a single tile, so I draw the entire tree. This is not going to scale to very large trees, so I need to be able to "cull" most of the lines and draw only those that will actually be visible.
In the past I've steered away from Google Maps-style interfaces because the image zooms along both the x and y axes, which is not necessarily ideal for a tree. But the tradeoff is that I don't have to do any work handling user interactions; I just need to focus on efficiently rendering the image tiles.

All very crude, but I think this approach has potential, especially if the "level of detail" issue can be tackled.