Tuesday, May 10, 2016

Notes on next steps for the million DNA barcodes map

Some notes to self about future directions for the "million DNA barcodes map" http://iphylo.org/~rpage/bold-map/.


At the moment we have an interactive map that we can pan and zoom, and click on a marker to get a list of one or more barcodes at the location. We can also filter by major taxonomic group. Here are some ideas on what could be next.

Search

At the moment search is simply browsing the map. It would be handy to be able to enter a taxon or a barcode identifier and go to the corresponding markers on the map.

What is this?

If we have a single DNA barcode I immediately want to know "what is this?" A picture may help, and I may look up the scientific name in BioNames, but perhaps the most obvious thing to do is get a phylogeny for that barcode and similar sequences. These could then be displayed on the map using the technique I described in Visualising Geophylogenies in Web Maps Using GeoJSON (see also http://dx.doi.org/10.1371/currents.tol.8f3c6526c49b136b98ec28e00b570a1e).

So, ideally we would:

  1. Display information about that barcode (e.g., taxonomic identification where known).
  2. Display the local phylogeny of barcodes that contains this barcode.
  3. Display that phylogeny on the map.

Hence we need to be able to generate a local phylogeny of barcodes, either on the fly (retrieve similar sequences, then build a tree) or by using a precomputed global barcode phylogeny from which we pull out the local subtree.
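As a minimal sketch of the second option (pulling a local subtree out of a precomputed global tree), assume the tree is stored as a simple child-to-parent lookup. The tree, the barcode identifiers (BC001 and so on), and the function names below are all made up for illustration; a real global barcode tree would need a more compact representation. The idea is to start at the barcode of interest and climb towards the root until the enclosing clade holds enough barcodes to be worth displaying.

```python
from collections import defaultdict

# Hypothetical toy tree stored as child -> parent; None marks the root.
PARENT = {
    "root": None,
    "n1": "root", "n2": "root",
    "n3": "n1",
    "BC001": "n3", "BC002": "n3", "BC003": "n1",
    "BC004": "n2", "BC005": "n2",
}


def children_index(parent):
    """Invert the child -> parent map into parent -> list of children."""
    kids = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            kids[par].append(child)
    return kids


def local_subtree(parent, leaf, min_leaves):
    """Climb from `leaf` until the enclosing clade has at least
    `min_leaves` leaves (or the root is reached).
    Returns (clade_root, sorted list of leaves in that clade)."""
    kids = children_index(parent)

    def leaves_below(node):
        # Iterative traversal, so deep trees do not hit the recursion limit.
        stack, found = [node], []
        while stack:
            n = stack.pop()
            if n in kids:
                stack.extend(kids[n])
            else:
                found.append(n)
        return sorted(found)

    node, clade = leaf, [leaf]
    while len(clade) < min_leaves and parent[node] is not None:
        node = parent[node]
        clade = leaves_below(node)
    return node, clade


print(local_subtree(PARENT, "BC001", 3))
```

Choosing the clade by a minimum number of leaves is just one possible rule; a cutoff on sequence divergence or on taxonomic rank might turn out to be more useful in practice.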

What is there?

A question that the map doesn't really answer is "what is the diversity of a given area?". Yes there are lots of dots, and you can click on them, but what would be nice is the ability to draw a polygon on the map (like this) and get a summary of the phylogenetic diversity of barcodes within that area.

For example, imagine drawing a polygon around Little Barrier Island in New Zealand. Can we effectively retrieve the data published by Drummond et al. (Evaluating a multigene environmental DNA approach for biodiversity assessment, DOI:10.1186/s13742-015-0086-1)?

To support "what is there?" queries we need to be able to:

  1. Draw an arbitrary spatial region on the map and retrieve the set of sequences found within that region.
  2. Retrieve the phylogeny for that set of sequences.

Once again, we either need to be able to build a phylogeny for an arbitrary set of sequences on the fly, or extract a subtree. If a global tree is available, we could compute the length of the subtree, and also compute a visual layout fairly easily (essentially with time proportional to the number of sequences).
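The two steps above can be sketched roughly as follows, assuming a global tree is already to hand. Everything here is illustrative: the barcode identifiers, coordinates, branch lengths, and the box drawn near Little Barrier Island are invented, and the function names are my own. Step one is a plain ray-casting point-in-polygon test; step two sums the branch lengths of the smallest subtree connecting the selected barcodes, as a crude measure of phylogenetic diversity.

```python
# Hypothetical global tree: child -> parent, plus the length of the
# branch leading to each node. All values are made up.
PARENT = {"root": None, "n1": "root", "n2": "root", "n3": "n1",
          "BC001": "n3", "BC002": "n3", "BC003": "n1",
          "BC004": "n2", "BC005": "n2"}
BRANCH_LENGTH = {"n1": 0.10, "n2": 0.20, "n3": 0.05,
                 "BC001": 0.01, "BC002": 0.02, "BC003": 0.03,
                 "BC004": 0.04, "BC005": 0.05}

# Hypothetical barcode localities as (longitude, latitude).
LOCALITY = {"BC001": (175.08, -36.20), "BC002": (175.10, -36.19),
            "BC003": (174.76, -36.85), "BC004": (175.05, -36.22),
            "BC005": (172.64, -43.53)}


def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how often a ray heading east from the point
    crosses the polygon boundary; an odd count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def barcodes_in_polygon(locality, polygon):
    """Step 1: identifiers of all barcodes that fall inside the polygon."""
    return sorted(b for b, (lon, lat) in locality.items()
                  if point_in_polygon(lon, lat, polygon))


def subtree_length(parent, branch_length, leaves):
    """Step 2: total branch length of the minimal subtree joining `leaves`."""
    leaves = set(leaves)
    if len(leaves) < 2:
        return 0.0
    # Count how many selected leaves sit below each branch.
    counts = {}
    for leaf in leaves:
        node = leaf
        while parent[node] is not None:
            counts[node] = counts.get(node, 0) + 1
            node = parent[node]
    # Branches shared by every leaf lie above their common ancestor: skip them.
    return sum(branch_length[n] for n, c in counts.items() if c < len(leaves))


# A rough box, standing in for a polygon drawn by the user on the map.
box = [(175.00, -36.25), (175.15, -36.25), (175.15, -36.15), (175.00, -36.15)]
found = barcodes_in_polygon(LOCALITY, box)
print(found, subtree_length(PARENT, BRANCH_LENGTH, found))
```

This naive version walks from every selected barcode to the root, so its cost is the number of barcodes times the depth of the tree rather than strictly proportional to the number of sequences. At a million barcodes the polygon query would also want a spatial index rather than a scan over every point.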

We'd also need to decide on the best way to visualise the phylogeny for the set of sequences. Perhaps something like Krona, or something more traditional.


Summary

There doesn't seem to be any way of getting away from the need for a global phylogeny of COI DNA barcodes if I want to extend the functionality of the map.