Wednesday, August 12, 2009

GBIF and Linked Data

At the end of day two of the GBIF LSID-GUID Task Group I put together this crude diagram to summarise some of the possible links between biodiversity data and the larger linked data cloud, which I, among others, have argued is where biodiversity informatics should be heading. Here's my hastily put together diagram (created using the wonderful OmniGraffle):

I've put GBIF at the centre since we're at GBIF, and it's them we are trying to convince. Yellow circles are biodiversity data sources that aren't linked data providers (although some can be made so using, for example, my LSID proxy resolver); white circles are linked data sources.

The "sales pitch" is that if we join the linked data cloud we open up the possibility of some very powerful queries, especially ones that are outside the relatively narrow scope of what GBIF and TDWG concern themselves with. Imagine being able to query biodiversity data with respect to population and economic data across countries. This is the sort of thing we could realistically aim for.

On a practical level, it also means biodiversity databases could devolve a lot of their tasks to other databases (by reusing identifiers). Some taxonomists have DBPedia URIs, and more could be added to Wikipedia (and so will find their way into DBPedia). Geonames provides geographic URIs which we could reuse, and so on. Within our own community we could do a better job of reusing our own identifiers, and reusing external ones (such as taxa in Wikipedia).
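
To make the identifier-reuse argument a bit more concrete, here is a minimal Python sketch of why shared URIs matter: if occurrence records point at the same place URI that an external dataset uses, joining the two becomes a simple lookup. Everything in it is an assumption for illustration. The example.org URIs, the field names, and the numbers are invented placeholders, not real GBIF, Geonames, or DBPedia data, and a real linked data setup would use RDF and SPARQL rather than Python dictionaries.

```python
# Hypothetical occurrence records. The "country" field reuses a shared URI
# rather than a free-text name. All URIs and values are invented placeholders.
occurrences = [
    {"taxon": "http://example.org/taxon/1", "country": "http://example.org/geo/A"},
    {"taxon": "http://example.org/taxon/2", "country": "http://example.org/geo/A"},
    {"taxon": "http://example.org/taxon/1", "country": "http://example.org/geo/B"},
]

# Hypothetical external dataset (e.g. population figures) keyed by the same URIs.
country_stats = {
    "http://example.org/geo/A": {"population": 1000},
    "http://example.org/geo/B": {"population": 500},
}


def taxa_per_country(records):
    """Group taxon URIs by country URI, returning {country URI: set of taxon URIs}."""
    grouped = {}
    for rec in records:
        grouped.setdefault(rec["country"], set()).add(rec["taxon"])
    return grouped


def join_with_stats(records, stats):
    """Join per-country taxon counts to external statistics via the shared URI."""
    rows = []
    for country, taxa in sorted(taxa_per_country(records).items()):
        if country in stats:  # the join only works where the identifier is shared
            rows.append((country, len(taxa), stats[country]["population"]))
    return rows


rows = join_with_stats(occurrences, country_stats)
for country, n_taxa, population in rows:
    print(country, n_taxa, population)
```

The point of the sketch is the join key: no name matching or reconciliation is needed, because both datasets already agree on the identifier.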

It's late, this is a rushed diagram, and I don't know if it's going to end up in whatever report we manage to assemble tomorrow (our final day). But I hope it captures some of the scope of what we're looking at. I know there are some problems (as have been pointed out to me on Twitter), and I'll try to deal with these tomorrow.