Wednesday, January 18, 2012

Yet another reason why we need specimen identifiers, now!

This message appeared on the TAXACOM mailing list:

It is getting more and more necessary for taxonomists to demonstrate
that they are useful and used. This does not only apply to the
individual scientists, but also to institutions with taxonomic
collections, such as museums and herbaria.

In an attempt to live up to that increasing demand for documentation,
the leadership of the Natural History Museum of Denmark has issued an
order to its curatorial staff - The staff members are requested to
document which publications from 2011, written entirely by external
scientists, that in one way or another are based on material in the
collections of the Museum.

Given that most specimens lack resolvable digital identifiers (a theme I've harped on about before, most recently in the context of DNA barcoding), answering this kind of query ends up being a case of searching publications for text strings that contain the acronym of the collection. The sender of the message, Ib Friis, is alarmed at this prospect:

In publications, material from our herbarium at "C" is normally referred
to in text strings of one of the following forms: "(C)", "(C, ", ", C,"
or " C)". But a search in for example Google Scholar or other search
engines result in overflow of thousands and thousands of hits, even
when these text strings are combined with other relevant words such as
"botany", "plants", etc.

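The ambiguity Friis describes is easy to reproduce. Here is a minimal Python sketch that takes the four text forms from his message and runs them over a handful of sentences. The sentences, the collector number, and the helper name `matches_acronym` are all invented for illustration; only the first sentence is meant to be a genuine specimen citation.

```python
import re

# The four text forms quoted in the message for the herbarium acronym "C".
PATTERNS = ["(C)", "(C, ", ", C,", " C)"]

# Escape each form so the parentheses are matched literally, then OR them together.
ACRONYM_RE = re.compile("|".join(re.escape(p) for p in PATTERNS))

# Invented example sentences; only the first one cites a herbarium specimen.
SENTENCES = [
    "Holotype: Ethiopia, Friis 1234 (C, K).",
    "Samples were incubated at 25 (C) for two hours.",
    "Groups A, B, C, and D were compared.",
    "See panels (B, C) of Figure 2.",
]


def matches_acronym(text):
    """Return True if the text contains any of the acronym forms."""
    return ACRONYM_RE.search(text) is not None


hits = [s for s in SENTENCES if matches_acronym(s)]
for sentence in hits:
    print(sentence)
```

All four sentences match, although only one of them refers to the herbarium. A one-letter acronym carries too little information for string matching to separate specimen citations from everything else.
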
In an earlier paper, "Biodiversity informatics: the challenge of linking data and the role of shared identifiers" (free preprint available here: hdl:10101/npre.2008.1760.1), I argued that having resolvable identifiers for specimens could enable measures of "citation" to be computed for specimens (and data derived from those specimens). Just as we have citation counts for articles and impact factors for journals, we could have equivalent measures for specimens and collections. These measures may keep administrators happy, but for scientists I think the real benefits will be the ability to trace the provenance of data, and the fate of data they themselves have collected or published.
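
To make the idea of specimen "citation" concrete, here is a rough Python sketch of how such counts could be computed if publications listed the identifiers of the specimens they used. Everything in it is an assumption made for illustration: the publication and specimen identifiers are invented, and the `ACRONYM:number` format is just a stand-in for whatever resolvable identifier scheme a museum might adopt.

```python
from collections import Counter

# Hypothetical data: publication identifier -> specimen identifiers it cites.
# Neither the identifiers nor the "ACRONYM:number" format are real.
CITATIONS = {
    "pub:1": ["C:0001", "C:0002", "K:0107"],
    "pub:2": ["C:0001"],
    "pub:3": ["C:0001", "K:0107", "K:0107"],  # same specimen mentioned twice
}


def specimen_citation_counts(citations):
    """Count how many publications cite each specimen."""
    counts = Counter()
    for specimens in citations.values():
        # A set ensures a publication counts once per specimen.
        counts.update(set(specimens))
    return counts


def collection_citation_counts(citations):
    """Count how many publications cite at least one specimen per collection."""
    counts = Counter()
    for specimens in citations.values():
        # The collection is assumed to be the prefix before the colon.
        counts.update({s.split(":", 1)[0] for s in specimens})
    return counts


print(specimen_citation_counts(CITATIONS))
print(collection_citation_counts(CITATIONS))
```

With identifiers in place, the question the Museum's curators have been asked becomes a lookup on the second counter rather than a trawl through search engine results.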

For things such as publications it is trivial to track their usage. For example, to find the number of times the article "Biodiversity informatics: the challenge of linking data and the role of shared identifiers" has been cited, I simply enter its DOI into Google Scholar. Imagine being able to do the same for specimens.

For this to happen, museum specimens need digital identifiers. If museums are serious about quantifying the impact of their collections, they should make assigning digital identifiers a priority.