Monday, September 24, 2012

Towards a biogeographic search engine

We all have a "past" that we might not advertise widely, and my past includes flirting with panbiogeography. Indeed my PhD thesis hdl:2292/1999 is entitled "Panbiogeography: a cladistic approach." Shortly after graduating I moved on to host-parasite cospeciation and the gene tree/species tree problem ("reconciled trees", see Katz et al. for a recent example of this approach), but part of me misses the glory days of vicariance, dispersal, and panbiogeography.

One thing that strikes me is how little use large-scale historical biogeography makes of GBIF data. One of the things that made Croizat's panbiogeography so interesting was the way he exposed similar distribution patterns in unrelated groups of organisms. He did this by hand, producing map after map, some embellished with all manner of annotations ("gates", "nodes", "massings", etc.). In some ways, Croizat was an early data miner. Now that we are awash in distributional data, where are the people revisiting global-scale historical patterns? In particular, wouldn't it be cool to have a biogeographic search engine that could pull out taxa with particular distribution patterns that we could then analyse?

For example, while working on a project to map taxonomic names to literature and genomics data, I embedded a widget to display GBIF maps. Every so often I come across taxa that have the classic "Gondwana" distribution pattern. Below, for instance, is the GBIF map for stoneflies of the family Notonemouridae.

Below is a map for the Notonemouridae using an orthographic projection (see earlier post for details):

Another family of stoneflies, the Gripopterygidae, shows a similar pattern:


What I'd like is to be able to query a database like GBIF for patterns such as these Gondwanic distributions, then be able to pull out associated phylogenetic information (e.g., via sequences in GenBank) so that we could determine the antiquity of these patterns, and whether they are consistent with geological models. We could begin to do large-scale testing of biogeographic hypotheses in a (semi-)automated way. At present we generally rely on a few well-studied examples that are either broadly consistent with
Van Bocxlaer, I., Roelants, K., Biju, S. D., Nagaraju, J., & Bossuyt, F. (2006). Late Cretaceous Vicariance in Gondwanan Amphibians. PLoS ONE, 1(1), e74. doi:10.1371/journal.pone.0000074

or contradict
Cook, L. G., & Crisp, M. D. (2005). Not so ancient: the extant crown group of Nothofagus represents a post-Gondwanan radiation. Proceedings of the Royal Society B: Biological Sciences, 272(1580), 2535–2544. doi:10.1098/rspb.2005.3219

the hypothesis that the history of the biota of the southern hemisphere has been largely structured by the break-up of Gondwana.
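The data-gathering half of that wish can be sketched against the public GBIF and NCBI web APIs. This is a minimal sketch under my own assumptions, not code from the project mentioned above: it only builds request URLs for the GBIF occurrence search (`taxonKey`, `hasCoordinate`, `limit`, `offset`) and for an NCBI E-utilities `esearch` of GenBank nucleotide records, and pulls latitude/longitude pairs out of a GBIF response. The helper names are hypothetical; actually fetching the URLs (e.g., with `urllib.request.urlopen`), paging through results, and respecting rate limits are left out.

```python
import json
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"
NCBI_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gbif_occurrence_url(taxon_key: int, limit: int = 300, offset: int = 0) -> str:
    """URL for one page of georeferenced GBIF occurrences for a taxon key."""
    params = {
        "taxonKey": taxon_key,
        "hasCoordinate": "true",  # only records with coordinates are useful for maps
        "limit": limit,           # GBIF caps page size, so larger taxa need paging
        "offset": offset,
    }
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def parse_gbif_points(response_text: str) -> list[tuple[float, float]]:
    """Extract (latitude, longitude) pairs from a GBIF occurrence search JSON response."""
    data = json.loads(response_text)
    points = []
    for record in data.get("results", []):
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        if lat is not None and lon is not None:  # skip incomplete records
            points.append((float(lat), float(lon)))
    return points


def genbank_search_url(taxon_name: str, retmax: int = 100) -> str:
    """URL for an E-utilities esearch of GenBank nucleotide records for a taxon name."""
    params = {
        "db": "nucleotide",
        "term": f"{taxon_name}[Organism]",
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{NCBI_ESEARCH}?{urlencode(params)}"
```

For example, `gbif_occurrence_url(12345)` (12345 being a placeholder, not a real GBIF key) gives a URL whose JSON response can be handed to `parse_gbif_points`, while `genbank_search_url("Gripopterygidae")` gives an `esearch` URL whose JSON response lists matching GenBank record IDs.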

A first step might be to index distributions at, say, family level and above, and provide a series of polygons representing different distribution patterns. We then search for distributions that are largely concordant with those patterns, and query GenBank (or TreeBASE) for sequences (or phylogenies) for those taxa. We then ask the questions "how old are these taxa?" and "what biogeographic histories do they have?"
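A minimal sketch of that concordance search, under several assumptions of my own: each pattern is a set of named polygons (the crude latitude/longitude boxes below are hypothetical stand-ins for "southern South America", "Australia" and so on, not published area definitions); each taxon is a list of occurrence points (e.g., georeferenced GBIF records); and "largely concordant" means that most of a taxon's points fall inside the pattern and that the taxon occupies at least half of the pattern's regions. The point-in-polygon test is ordinary ray casting on a flat latitude/longitude plane, so it ignores the Earth's curvature and breaks for polygons that cross the antimeridian. The toy taxa and their coordinates are invented for illustration.

```python
from dataclasses import dataclass

Point = tuple[float, float]  # (latitude, longitude) in decimal degrees


@dataclass
class Region:
    """One named polygon in a distribution pattern (ring of (lat, lon), not closed)."""
    name: str
    vertices: list[Point]


def box(name: str, lat_min: float, lat_max: float,
        lon_min: float, lon_max: float) -> Region:
    """Rectangular region; a hypothetical convenience for crude areas."""
    return Region(name, [(lat_min, lon_min), (lat_min, lon_max),
                         (lat_max, lon_max), (lat_max, lon_min)])


def point_in_polygon(point: Point, vertices: list[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray heading east from the point."""
    lat, lon = point
    inside = False
    n = len(vertices)
    for i in range(n):
        lat1, lon1 = vertices[i]
        lat2, lon2 = vertices[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge reaches the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def concordance(points: list[Point], pattern: list[Region]) -> tuple[float, float]:
    """Return (fraction of points inside the pattern, fraction of regions occupied)."""
    if not points or not pattern:
        return 0.0, 0.0
    occupied: set[str] = set()
    inside = 0
    for p in points:
        for region in pattern:
            if point_in_polygon(p, region.vertices):
                inside += 1
                occupied.add(region.name)
                break  # count each point once, in the first region containing it
    return inside / len(points), len(occupied) / len(pattern)


def search(taxa: dict[str, list[Point]], pattern: list[Region],
           min_inside: float = 0.9,
           min_occupied: float = 0.5) -> list[tuple[str, float, float]]:
    """Taxa whose distributions are largely concordant with the pattern, best first."""
    hits = []
    for name, points in taxa.items():
        inside, occupied = concordance(points, pattern)
        if inside >= min_inside and occupied >= min_occupied:
            hits.append((name, inside, occupied))
    # Rank by the product of the two fractions, ties broken by name.
    hits.sort(key=lambda h: (-h[1] * h[2], h[0]))
    return hits


# Hypothetical, deliberately crude "Gondwanic" pattern.
GONDWANA_PATTERN = [
    box("southern South America", -56, -30, -76, -60),
    box("southern Africa", -35, -22, 16, 33),
    box("Australia", -44, -10, 113, 154),
    box("New Zealand", -47.5, -34, 166, 179),
]

# Invented occurrence points for two made-up taxa.
TOY_TAXA = {
    "Taxon A": [(-41.0, -72.5), (-42.0, 147.0), (-41.3, 174.8)],
    "Taxon B": [(48.0, 2.0), (52.0, 13.0), (-33.9, 151.0)],
}

if __name__ == "__main__":
    for name, inside, occupied in search(TOY_TAXA, GONDWANA_PATTERN):
        print(f"{name}: {inside:.2f} of points inside, {occupied:.2f} of regions occupied")
```

Run as a script, this lists only "Taxon A" (all of its points fall in the pattern, in three of the four regions); "Taxon B" is mostly European and is filtered out. Ranking by the product of the two fractions is just one choice. An index covering thousands of families would more likely rasterise distributions onto grid cells and compare cell sets, for example with a Jaccard index.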