Thursday, April 18, 2013

Thoughts on GBIC 2012 and a vision of the future of biodiversity informatics

This seems to be the season for big, arm-wavy documents about the future of biodiversity informatics (see A decadal view of biodiversity informatics: challenges and priorities). An equivalent document is being drafted based on the Global Biodiversity Informatics Conference (GBIC 2012). Writing these documents is hard work: they have to balance a set of conflicting visions, predict the future, and communicate a coherent plan to people who either could help make it happen or feel they have a stake in the outcome.

Leaving all those constraints behind, and waving arms wildly, here's one take on the future of biodiversity informatics. I see three themes.

1. Knowing what we know

We have a limited grasp of how much we actually know, and crap tools to summarise this knowledge. I want a Google Analytics for biodiversity data where I can see at a glance the current state of our knowledge (e.g., what is the rate of sequencing of environmental samples in the Mediterranean? How much of Indonesia's amphibian fauna is in protected areas?). These are fairly trivial queries. If Google can analyse web traffic from sites being hit over a million times per day (~365 million hits per year), we can do the same thing on GBIF-scale databases. There is huge scope here for cool visualisation of the growth of our knowledge, such as this:

If biologists were explorers (Mammalia)... from Andrew W Hill on Vimeo.

Imagine the GBIF classification visualised like this:

filesystem visualisation from wonderful websolutions on Vimeo.
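To back up the claim that these are fairly trivial queries, here is a minimal sketch. It assumes a hypothetical, much-simplified occurrence record (species, class, country, year, and a precomputed flag saying whether the record falls inside a protected area); none of these field or function names come from GBIF. Under those assumptions the "rate of knowledge growth" question is a group-by count, and the protected-area question is a ratio of two sets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    # Hypothetical, simplified occurrence record; NOT the GBIF schema.
    species: str
    taxon_class: str
    country: str           # ISO 3166 country code, e.g. "ID" for Indonesia
    year: int
    in_protected_area: bool  # assumed to be precomputed by a spatial join


def records_per_year(records, country, taxon_class):
    """Count records per year for one country and class (a crude growth curve)."""
    counts = defaultdict(int)
    for r in records:
        if r.country == country and r.taxon_class == taxon_class:
            counts[r.year] += 1
    return dict(sorted(counts.items()))


def fraction_in_protected_areas(records, country, taxon_class):
    """Fraction of recorded species with at least one record in a protected area."""
    species = set()
    protected = set()
    for r in records:
        if r.country == country and r.taxon_class == taxon_class:
            species.add(r.species)
            if r.in_protected_area:
                protected.add(r.species)
    return len(protected) / len(species) if species else 0.0


if __name__ == "__main__":
    # Toy data with made-up species names, for illustration only.
    sample = [
        Occurrence("species_a", "Amphibia", "ID", 2010, True),
        Occurrence("species_a", "Amphibia", "ID", 2011, False),
        Occurrence("species_b", "Amphibia", "ID", 2011, False),
        Occurrence("species_c", "Amphibia", "ID", 2012, True),
    ]
    print(records_per_year(sample, "ID", "Amphibia"))  # {2010: 1, 2011: 2, 2012: 1}
    # 2 of the 3 species have a record inside a protected area
    print(fraction_in_protected_areas(sample, "ID", "Amphibia"))
```

At GBIF scale the same logic would presumably run as indexed database or map-reduce queries rather than Python loops; the point of the sketch is only that the questions are aggregations, not research problems.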

2. Life stream

Terrible title, but this is where we monitor change, both "organic" and anthropogenic. This is where we use data mining to do a sentiment analysis of the biosphere, looking to detect changes such as outbreaks of disease, invasive species, etc. This builds on 1 but focusses on change. Imagine a "news service" for biology along the lines of tools available to financial markets (e.g., Silobreaker):

This is where we interface with decision makers. If Braulio Dias is right that "I am convinced that the lack of adequate biodiversity monitoring is at the heart of our difficulties to make convincing arguments", then this is the theme that tackles that problem.
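The simplest version of "detecting change" in a stream of reports is flagging counts that jump well above their recent history. The sketch below is a rolling z-score threshold, a deliberately crude stand-in for the data mining imagined here (it is not sentiment analysis). The function name, the weekly counts, and the `window` and `threshold` defaults are all hypothetical choices for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=8, threshold=3.0):
    """Return indices where a count exceeds the mean of the preceding
    `window` counts by more than `threshold` sample standard deviations.
    Assumes window >= 2 so that stdev() is defined."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any increase as a change worth flagging.
            if counts[i] > mu:
                flagged.append(i)
        elif (counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical weekly reports of an invasive species in one region.
    weekly_reports = [2, 3, 2, 4, 3, 2, 3, 3, 15, 4]
    print(flag_anomalies(weekly_reports))  # [8]
```

A real "news service" would need to correct for sampling effort (more observers means more records), seasonality, and reporting lags, which is where the hard work lies.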

3. Modelling the biosphere

Modelling all life on Earth is our equivalent of a moon shot (oh how I hate that analogy). Purves et al. have made the case in "Time to model all life on Earth", and this is the task that will galvanise people outside the taxonomy/biodiversity community. This is real megascience (theme 1 is data collection, theme 2 is data mining and analysis). Climate modellers and oceanographers get to do this:

Can we do the same?