Friday, June 07, 2013

BioNames - Phylogenies? Yes, phylogenies

One of the things that didn't make last week's deadline for launching BioNames was the inclusion of phylogenies. This was disappointing as one of the reasons I built BioNames was to help span what I see as the gulf between classical biodiversity informatics and its emphasis on taxonomic names and classification, and modern phylogenetics where the tree is the primary focus, not some arbitrary way to partition it up.

So, where to get lots of phylogenies? I use the wonderful PhyLoTA database built by Mike Sanderson and colleagues:
Sanderson, M., Boss, D., Chen, D., Cranston, K., & Wehe, A. (2008). The PhyLoTA Browser: Processing GenBank for Molecular Phylogenetics Research. Systematic Biology, 57(3), 335-346. doi:10.1080/10635150802158688

I grabbed a dump of the trees, matched them to sequences in GenBank (more accurately, the European version, EMBL), did some post-processing of those sequences, threw them into CouchDB, built an SVG viewer, and voilà.
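To give a flavour of the "match trees to sequences" step, here is a minimal sketch in Python. It is not the actual BioNames code: it assumes each tree arrives as a Newick string whose tip labels are sequence identifiers, and the function names and the document layout (`type`, `newick`, `leaves`) are invented for illustration. The only CouchDB convention relied on is that a document is JSON with an `_id` field.

```python
from __future__ import annotations

import json
import re


def leaf_labels(newick: str) -> list[str]:
    """Return the tip labels of a Newick string, left to right.

    Assumption: labels are unquoted and the tree has at least one
    internal node. A tip label directly follows "(" or ",", whereas
    internal node labels follow ")" and are therefore skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def tree_to_doc(tree_id: str, newick: str) -> dict:
    """Wrap a tree in a JSON-ready dict (hypothetical document layout)."""
    return {
        "_id": f"tree/{tree_id}",       # CouchDB document identifier
        "type": "tree",                 # hypothetical field
        "newick": newick,               # original tree description
        "leaves": leaf_labels(newick),  # identifiers to look up in GenBank/EMBL
    }


if __name__ == "__main__":
    doc = tree_to_doc("example", "((gi1:0.1,gi2:0.2):0.05,gi3:0.3);")
    print(json.dumps(doc, indent=2))
```

Storing the tip labels alongside the tree description means each sequence can be looked up separately, and the resulting metadata attached to the tree without re-parsing it.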

Here is a tree for the fig wasp family Agaonidae, showing the interactive zoomable tree viewer, and thumbnails for other trees for this taxon:


Here is a phylogeny for a genus of deep-sea mussels (Bathymodiolus), showing a map based on those sequences that are georeferenced in GenBank:
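As a rough sketch of how georeferenced sequences could be turned into points for a map like this one: the sequence records carry a `lat_lon` qualifier written as decimal degrees plus hemisphere letters (for example `37.78 N 122.42 W`). The Python below parses that form and emits a GeoJSON point. This is an illustration under those assumptions rather than the code behind BioNames; the function names and the `accession` property are hypothetical.

```python
from __future__ import annotations

import re
from typing import Optional

# Decimal degrees followed by a hemisphere letter, latitude first.
_LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def parse_lat_lon(value: str) -> Optional[tuple[float, float]]:
    """Return (latitude, longitude) in signed decimal degrees, or None."""
    match = _LAT_LON.match(value)
    if match is None:
        return None  # unrecognised format: skip rather than guess
    lat = float(match.group(1)) * (1 if match.group(2) == "N" else -1)
    lon = float(match.group(3)) * (1 if match.group(4) == "E" else -1)
    return lat, lon


def to_geojson_point(accession: str, value: str) -> Optional[dict]:
    """Build a GeoJSON Feature for one sequence (hypothetical properties)."""
    coords = parse_lat_lon(value)
    if coords is None:
        return None
    lat, lon = coords
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude]
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"accession": accession},
    }
```

Sequences without a parseable `lat_lon` value simply return `None` here, which matches the caption above: only those sequences that are georeferenced end up on the map.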


Lastly, here is the page for the bat family Vespertilionidae in the NCBI classification. Click on the "Data" tab to see this view.


There's still lots to do on this, but the key parts are in place. Personally, I can happily while away the day just browsing through the trees, looking for cases where taxa lack scientific names, obvious cases of synonymy (take a look at this tree for fiddler and ghost crabs, for example), and evidence that "species" have considerable internal phylogenetic structure.