Wednesday, May 13, 2026

A new way to view the Tree of Life

One of the grand challenges of comparative biology is to assemble the [“tree of life”](https://en.wikipedia.org/wiki/Tree_of_life_(biology)), a diagram that connects all species in a single structure (let’s leave aside for now the question of whether a tree is actually the best representation). My goal here is to outline a way of navigating the tree of life, specifically the Open Tree of Life.

Given a tree with some 2 million species, the obvious question is how we can visualise it. There are several projects that can accommodate trees of this size, such as Vienne’s LifeMap, Rosindell’s OneZoom, and Taxonium. Each of these viewers is impressive in its own way, but in my opinion each has problems. LifeMap treats the tree as a static structure in 2D space and uses tiles to enable the user to zoom in and out in the same way we navigate a digital map. Because trees are mostly empty space it is easy to get lost. OneZoom uses an almost hypnotic fractal tree layout, coupled with zooming in and out: a similar approach to LifeMap but with a different way to render the tree. It is fun, but the fractal pattern distorts aspects of the tree. Taxonium takes a different approach: the complete tree is rendered in 2D and is uniformly zoomed on the y-axis, stretching it out.

None of these projects has felt satisfactory to me. They often don’t use the screen area efficiently, labels can be hard to read, and they treat tree visualisation as simply scaling or stretching a fixed layout. Open Tree of Life itself has a viewer that tries a different approach to showing the tree, collapsing various nodes, but it feels clunky in comparison to the other viewers. This is a pity, because the Open Tree of Life is a fascinating project, a supertree that is regularly(ish) updated with new phylogenies, and which links to evidence for each node in that tree.

For a while I’ve been exploring a method called summary trees (based on work by Karloff and Shirley) to display large trees, such as taxonomic classifications. The key feature of a summary tree is that you collapse a tree to a specified number of nodes (or leaves), which means you can ensure that the tree fits into your display space, and hence that all labels are legible. The trick is to figure out which nodes to collapse. I’ve used the approach of Libin et al. that partitions a tree based on a score given to each node.
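To make the idea concrete, here is a minimal sketch of collapsing a tree to a leaf budget. It is not the viewer’s code, and it does not reproduce the scoring of Libin et al. or the maximum entropy method of Karloff and Shirley. The `Node` class, the `score` field, and the `summarise` function are all assumptions made for illustration: nodes are expanded in descending score order from a priority queue, and a node stays collapsed if expanding it would exceed the budget.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    score: float = 0.0  # hypothetical score; higher means "expand sooner"
    children: list["Node"] = field(default_factory=list)


def summarise(root: Node, max_leaves: int) -> dict[str, list[str]]:
    """Return the displayed tree as {expanded node: [child names]}.

    Any child name that is not itself a key is drawn as a (collapsed) leaf.
    """
    shown: dict[str, list[str]] = {}
    leaves = 1    # the root alone counts as one collapsed leaf
    counter = 0   # tie-breaker so heapq never has to compare Node objects
    queue = [(-root.score, counter, root)]
    while queue:
        _, _, node = heapq.heappop(queue)
        k = len(node.children)
        if k == 0:
            continue  # a true leaf cannot be expanded
        # Expanding swaps one displayed leaf for k displayed leaves.
        if leaves - 1 + k > max_leaves:
            continue  # would overflow the window, so leave it collapsed
        leaves += k - 1
        shown[node.name] = [child.name for child in node.children]
        for child in node.children:
            counter += 1
            heapq.heappush(queue, (-child.score, counter, child))
    return shown
```

Because a node is only queued once its parent has been expanded, the result is always a connected tree rooted at `root`, and the number of displayed leaves never exceeds `max_leaves`.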

This is a nice idea, but if you fit the tree of life into a browser window, say 30 lines high, then how do you see the rest of the tree? One approach would be to treat growing the tree as a form of zooming, so that one level of zoom would grow the tree to twice the size, and so on, and you would then have to pan to see the whole tree. I think this has potential for individual phylogenies, but for really big trees you just end up getting lost.

Instead, what if you clicked on a node in the tree and that node became the root of a new tree that you could explore, and that tree would be guaranteed to fit in your window? So you browse through the tree, making different parts fan out or collapse as needed.

This seemed appealing, but animating the transition between trees felt rather beyond my programming skills… so I asked ChatGPT and Claude for help. Part of the challenge of problem solving is understanding what the actual problem is. ChatGPT introduced me to the idea of a “transition scene” where you have the before and after trees, and you compute how one transforms into the other. Claude Code made this a reality, and now I could smoothly navigate around the tree. Obviously, starting at the root of the whole tree every time would get tedious, so I added a simple search tool to find a node in the tree to start from.
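One way to read the “transition scene” idea is sketched below. This is an interpretation, not the code Claude Code produced: the `Move` class and the `transition_scene` and `frame` functions are hypothetical names, and each layout is assumed to be a simple mapping from node name to an (x, y) position. Nodes present in both trees slide to their new position, new nodes grow out of where the clicked node was, and nodes that disappear shrink into where the clicked node ends up.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Move:
    node: str
    start: Point
    end: Point
    kind: str  # "move", "enter" or "exit"


def transition_scene(before: dict[str, Point],
                     after: dict[str, Point],
                     focus: str) -> list[Move]:
    """Pair up node positions in the old and new layouts."""
    origin = before.get(focus, (0.0, 0.0))  # where entering nodes start
    sink = after.get(focus, (0.0, 0.0))     # where exiting nodes finish
    moves = []
    for node in sorted(before.keys() | after.keys()):
        if node in before and node in after:
            moves.append(Move(node, before[node], after[node], "move"))
        elif node in after:
            moves.append(Move(node, origin, after[node], "enter"))
        else:
            moves.append(Move(node, before[node], sink, "exit"))
    return moves


def frame(moves: list[Move], t: float) -> dict[str, Point]:
    """Node positions at time t in [0, 1], by linear interpolation."""
    return {
        m.node: (m.start[0] + (m.end[0] - m.start[0]) * t,
                 m.start[1] + (m.end[1] - m.start[1]) * t)
        for m in moves
    }
```

Calling `frame` with `t` stepping from 0 to 1 gives the positions for each animation frame. A real implementation would probably also ease the motion and fade entering and exiting nodes.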

So we have the notion of collapsing a tree to a given size (summary trees), a way to decide which nodes to collapse (a combination of a scoring scheme and a priority queue), and we use transition scenes to move between trees. You can see the result of all this here: https://iphylo.org/ott-viewer.

Having got a browseable tree working, the next issue is how do you go “back”, and what does “going back” even mean? We can wire up the browser’s back button to take you back to the previous tree, but I wanted something more. I’d come across a paper that described “Hoptrees”, which show your navigation history not as a simple linear list of where you have been, but arranged as a tree. This felt like a natural fit for navigating the tree of life, and hence above the tree you will see your navigation history as a simplified version of the larger tree.
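As a rough illustration of arranging history as a tree, the sketch below links each visited node to its nearest visited ancestor. It is a simplification of the Hoptree idea rather than the design in the paper or the viewer’s code; the `history_tree` function and the assumption that each visited node comes with its lineage (the path of names from the root down to the node) are mine.

```python
from typing import Optional


def history_tree(lineages: dict[str, list[str]]) -> dict[str, Optional[str]]:
    """Map each visited node to its nearest visited ancestor.

    `lineages` maps a visited node to its path root -> ... -> node.
    A node with no visited ancestor maps to None (a root of the history).
    """
    visited = set(lineages)
    parent: dict[str, Optional[str]] = {}
    for node, lineage in lineages.items():
        # Walk back up the lineage, skipping the node itself.
        ancestors = reversed(lineage[:-1])
        parent[node] = next((a for a in ancestors if a in visited), None)
    return parent
```

With visits to “life”, “Animals”, “Mammalia”, and “Aves”, the two classes hang off “Animals”, which hangs off “life”, so the history mirrors the shape of the larger tree while omitting everything not yet visited.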

As always there is more that could be done, but this feels like a natural stopping point. The tree browser works, and when I use it I spend less time thinking about the interface and more about the relationships in the tree, and that feels as it should be.

References

Brooks, M., West, J. D., Aragon, C. R., & Bergstrom, C. T. (2013). Hoptrees: Branching History Navigation for Hierarchies. In P. Kotzé, G. Marsden, G. Lindgaard, J. Wesson, & M. Winckler (Eds), Human-Computer Interaction – INTERACT 2013 (pp. 316–333). Springer. https://doi.org/10.1007/978-3-642-40477-1_20

Karloff, H., & Shirley, K. E. (2013). Maximum Entropy Summary Trees. Computer Graphics Forum, 32(3pt1), 71–80. https://doi.org/10.1111/cgf.12094

Libin, P., Vanden Eynden, E., Incardona, F., Nowé, A., Bezenchek, A., EucoHIV Study Group, Sönnerborg, A., Vandamme, A.-M., Theys, K., & Baele, G. (2017). PhyloGeoTool: Interactively exploring large phylogenies in an epidemiological context. Bioinformatics, 33(24), 3993–3995. https://doi.org/10.1093/bioinformatics/btx535

Page, R. D. (2012). Space, time, form: Viewing the Tree of Life. Trends in Ecology & Evolution, 27(2), 113–120.

Sanderson, T. (2022). Taxonium, a web-based tool for exploring large phylogenetic trees. eLife, 11, e82392. https://doi.org/10.7554/eLife.82392

De Vienne, D. M. (2016). Lifemap: Exploring the Entire Tree of Life. PLOS Biology, 14(12), e2001624. https://doi.org/10.1371/journal.pbio.2001624

Wong, Y., & Rosindell, J. (2022). Dynamic visualisation of million‐tip trees: The OneZoom project. Methods in Ecology and Evolution, 13(2), 303–313. https://doi.org/10.1111/2041-210X.13766

Written with StackEdit.

Monday, May 04, 2026

Alpha shapes and DNA barcoding

How to cite: Page, R. (2026). Alpha shapes and DNA barcoding. https://doi.org/10.59350/qx8j9-vam77

DNA barcoding generates a lot of specimen data with geographical coordinates (see for example Guest post: response to “Putting GenBank Data on the Map”). The question naturally arises: “how accurate are those coordinates?”

Browsing the BOLD database using BOLD View I often come across sequences whose coordinates are labelled “Coordinates from country centroid”, so these may bear little relation to where the specimen was actually collected. But how can we assess the accuracy of other coordinates?

Inspired by a 2008 Flickr blog post The Shape of Alpha I decided to create plots of the distribution of geotagged specimens in the BOLD database, grouped by geographic level. For example, we could aggregate all points labelled as being from the country “India”, then subset those into points labelled as being from various regions within India, and so on down the geographic hierarchy implied by country, province, etc. Rather than plot all the points, I decided to summarise them using the same approach Flickr used: we enclose the points in an alpha shape. Below are examples for India.

The two maps differ in how closely the curve fits the points, which is determined by the value of alpha (α) used to compute the shape. The smaller the value the tighter the fit. The first map used α=0.3 and is fairly coarse. With α=0.1 we see the alpha shape skirts around Bangladesh, and is hence a better representation of the boundary of India.
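For readers who want to see why a smaller α hugs the points more closely, here is a small self-contained sketch of the classic construction: take the Delaunay triangulation, discard triangles whose circumcircle is larger than α, and keep the edges that belong to exactly one surviving triangle. It is not the Postgres code behind the maps, and the function used there may define α on a different scale. Here α is assumed to be a maximum circumradius in coordinate units, and the brute-force triangulation is only practical for a handful of points in general position (no four on one circle).

```python
from itertools import combinations
from math import hypot

Point = tuple[float, float]


def circumcircle(a: Point, b: Point, c: Point):
    """Return (centre_x, centre_y, radius), or None if the points are collinear."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return ux, uy, hypot(a[0] - ux, a[1] - uy)


def alpha_shape_edges(points: list[Point], alpha: float) -> set[tuple[int, int]]:
    """Boundary edges of the alpha shape, as pairs of indices into `points`."""
    edge_count: dict[tuple[int, int], int] = {}
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r = circle
        if r > alpha:
            continue  # triangle is too large: carve it out of the shape
        # Delaunay test: no other point lies strictly inside the circumcircle.
        if any(hypot(p[0] - ux, p[1] - uy) < r - 1e-9
               for n, p in enumerate(points) if n not in (i, j, k)):
            continue
        for edge in ((i, j), (j, k), (i, k)):
            edge_count[edge] = edge_count.get(edge, 0) + 1
    # An edge shared by two kept triangles is interior; one used once is boundary.
    return {edge for edge, n in edge_count.items() if n == 1}
```

With a large α every Delaunay triangle survives and the boundary is the convex hull; as α shrinks, the long thin triangles that bridge gaps between clusters are dropped first, which is how the outline comes to skirt around regions with no points.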

The original Flickr blog post was showing how well geotagged photographs on Flickr were tracing out geographical areas. From my perspective, one reason to make these maps is to spot problematic records. For example, the map for Tasmania looks a bit strange. There are records on the Australian mainland, and Lord Howe and Macquarie Islands that clearly aren’t from “Tasmania”. Maybe the coordinates are wrong, maybe the placename is wrong? Either way, we now have some records to investigate.
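Spotting these records by eye works well on a map, but a crude automated check is also possible. The sketch below is a different and much simpler heuristic than the alpha shapes: it flags records that lie more than a chosen distance from the median coordinate of everything sharing a place name. The function names, the 500 km threshold, and the sample coordinates (approximate town locations, not BOLD records) are all assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import median


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def flag_outliers(records: list[tuple[str, float, float]], max_km: float) -> list[str]:
    """Return labels of records further than max_km from the median point.

    `records` holds (label, latitude, longitude) tuples that share a place name.
    Taking the median of longitudes is only sensible away from the antimeridian.
    """
    centre_lat = median(lat for _, lat, _ in records)
    centre_lon = median(lon for _, _, lon in records)
    return [label for label, lat, lon in records
            if haversine_km(centre_lat, centre_lon, lat, lon) > max_km]


# Illustrative points labelled "Tasmania" (approximate coordinates).
tasmania = [
    ("Hobart", -42.88, 147.33),
    ("Launceston", -41.43, 147.14),
    ("Devonport", -41.18, 146.35),
    ("Strahan", -42.15, 145.33),
    ("Lord Howe Island", -31.55, 159.08),
]
suspects = flag_outliers(tasmania, max_km=500)
```

A check like this only narrows down which records to look at; it cannot say whether the coordinates or the placename is at fault.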

This project is live on the BOLD View web site. It was mostly written using Claude Code, making use of the GIS features in Postgres. It is an example of how easy AI tools make it to do some quick exploration of an idea (in this case, something inspired by a blog post that is nearly twenty years old).
