Thursday, May 23, 2013

DOIs for specimens are here, but we're not quite there yet


I've been banging on about having citable, persistent identifiers for specimens, so was suitably impressed when Derek Sikes posted a comment on iPhylo that Arctos already does this. For example, here is a DOI for a specimen: http://dx.doi.org/10.7299/X7VQ32SJ.


So, we're all done, right? Not quite. DOIs by themselves don't get us where we (OK, where I think we) want to be. The DOI identifies a specimen, which is great (for why this matters, see the discussion on "iDigBio: You are putting identifiers on the wrong thing"). We can also get machine-readable metadata using the DOI (via the URL http://data.datacite.org/10.7299/X7VQ32SJ). The metadata is limited (ideally we'd want something like Darwin Core), but it is a start. What isn't clear is how we get from the DOI to Darwin Core.
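To make the "DOI to metadata" step concrete, here is a minimal Python sketch. The two base URLs are the ones used in this post; the function names are made up for illustration, and the `Accept` media type (CSL JSON) is an assumption about what the DOI's registration agency will serve via content negotiation, so treat it as something to check rather than a given. The request is only built, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "http://dx.doi.org/"          # resolver URL as used in this post
DATACITE_DATA = "http://data.datacite.org/"  # metadata URL as used in this post


def resolver_url(doi: str) -> str:
    """URL that resolves the DOI to the specimen's landing page."""
    return DOI_RESOLVER + quote(doi, safe="/")


def metadata_url(doi: str) -> str:
    """URL for machine-readable metadata, following the pattern in the post."""
    return DATACITE_DATA + quote(doi, safe="/")


def metadata_request(
    doi: str,
    media_type: str = "application/vnd.citationstyles.csl+json",  # assumed format
) -> Request:
    """Build (but do not send) a content-negotiation request for the DOI."""
    return Request(resolver_url(doi), headers={"Accept": media_type})


request = metadata_request("10.7299/X7VQ32SJ")
print(request.full_url, request.get_header("Accept"))
```

Passing `request` to `urllib.request.urlopen` would actually fetch the metadata, assuming the service is reachable and honours the requested format. Note that nothing here yields Darwin Core; that gap is exactly the problem.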

There are at least two issues that remain to be tackled. The first is that we now have a bunch of identifiers for the same thing, including the DOI and the identifiers used by Arctos and GBIF.

Most of these identifiers don't know about each other (for example, GBIF doesn't know about the DOI, nor does Arctos link to GBIF). So we have disconnected pieces of information about the same thing.
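One way to picture what "knowing about each other" would mean is a small table of "same thing" assertions. The sketch below is purely illustrative: `IdentifierLinks` is a hypothetical helper (a simple union-find), and the Arctos and GBIF identifiers are placeholders, not real record identifiers. Only the DOI comes from this post.

```python
class IdentifierLinks:
    """Groups identifiers that have been asserted to refer to the same specimen."""

    def __init__(self):
        self._parent = {}

    def _find(self, identifier):
        # Unknown identifiers start out in a group of their own.
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point everything on the path straight at the root.
        while self._parent[identifier] != root:
            next_id = self._parent[identifier]
            self._parent[identifier] = root
            identifier = next_id
        return root

    def link(self, a, b):
        """Record that identifiers a and b refer to the same specimen."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_thing(self, a, b):
        return self._find(a) == self._find(b)

    def aliases(self, identifier):
        """All known identifiers for the same specimen, sorted."""
        root = self._find(identifier)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


links = IdentifierLinks()
doi = "doi:10.7299/X7VQ32SJ"       # from this post
arctos = "arctos:EXAMPLE-RECORD"   # placeholder, not a real identifier
gbif = "gbif:EXAMPLE-OCCURRENCE"   # placeholder, not a real identifier
links.link(doi, arctos)
links.link(arctos, gbif)
print(links.aliases(gbif))
```

The point of the sketch is that two assertions (DOI to Arctos, Arctos to GBIF) are enough to get from any one identifier to the others. Somebody still has to make and publish those assertions.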

The second issue is discovery: how do we find a specimen's DOI? CrossRef supports services where you can take a bibliographic citation, e.g. "Phylogeny and biogeography of ice crawlers (Insecta: Grylloblattodea) based on six molecular loci: designating conservation status for Grylloblattodea species", and get back a DOI (in this case, http://dx.doi.org/10.1016/j.ympev.2006.04.013). This makes it possible for publishers to take the lists of literature cited in authors' manuscripts and quickly add DOIs to those citations. We don't have an equivalent service for specimens, which is going to make our task of linking specimens to sequences and the literature something of a challenge.
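For comparison, this is roughly what the literature side looks like. The sketch assumes CrossRef's public REST API (the `works` endpoint with a `query.bibliographic` parameter), which is one way to do this kind of lookup and not necessarily the service publishers use. The function names are hypothetical, and the JSON in the example is a hand-made sample in the shape of the API's response, not real output.

```python
import json
from typing import Optional
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"  # assumed endpoint


def citation_query_url(citation: str, rows: int = 1) -> str:
    """Build a lookup URL for a free-text bibliographic citation."""
    return CROSSREF_WORKS + "?" + urlencode(
        {"query.bibliographic": citation, "rows": rows}
    )


def first_doi(response_text: str) -> Optional[str]:
    """Return the DOI of the top-ranked match, or None if there are no matches."""
    items = json.loads(response_text).get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None


citation = (
    "Phylogeny and biogeography of ice crawlers (Insecta: Grylloblattodea) "
    "based on six molecular loci"
)
print(citation_query_url(citation))

# Hand-made sample response (not real API output), using the DOI from this post.
sample = json.dumps({"message": {"items": [{"DOI": "10.1016/j.ympev.2006.04.013"}]}})
print(first_doi(sample))
```

A specimen equivalent would need the same two pieces: something to query with (a museum code and catalogue number, say) and a service that answers with an identifier.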

We are making progress, but there is some way to go. Identifiers are only part of the solution; we also need services.