Thursday, May 16, 2013

The impact of museum collections: one collection ≈ one Nobel Prize

Ideas on measuring the "impact" of a natural history collection have been bubbling along, as reflected in recent comments on iPhylo, and some offline discussions I've been having with David Blackburn and Alan Resetar.

My focus has been at the specimen level, with a view to motivating the adoption of persistent specimen-level identifiers so that we can track citations of specimens over time (e.g., in publications and databases such as GenBank). Not only does this provide a measure of the "impact" of a collection, it helps with provenance. If we sequence a specimen that is subsequently assigned to a different taxon and we have a way of tracking that specimen via its identifier, then we can transmit that new identification to other consumers of data based on that specimen. For example, we could automatically notify GenBank that what we thought was an x is actually a y.

So I made a simple "league table" of museum collections based on specimens cited in BioStor. There are all sorts of issues with this approach. Once you rank collections, people may use that to argue some can be axed and more resources funnelled into others. A more positive approach would be to identify collections that are underused, and try to figure out why. And in the same way that taxonomic papers may have a long citation life, specimens may sit in a museum for a long time before being cited (for example, when eventually recognised as a new species doi:10.1016/j.cub.2012.10.029). So, metrics can be a double-edged sword.

Citing specimens is a useful metric, but not all citations are equal, and not all citations are immediate. A specimen that yields DNA sequences that are published in, say, Nature, arguably has more weight than a specimen listed in a rarely cited paper. Likewise, subsequent citations of a paper that cites a specimen should confer more weight on the value of that specimen. Elsewhere (doi:10.1093/bib/bbn022, preprint here: hdl:10101/npre.2008.1760.1) I've argued for a Google PageRank-style way to measure the impact of a specimen that takes into account papers and other objects derived from a specimen (e.g., images, sequences).
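To make the PageRank idea a little more concrete, here is a minimal sketch in Python. It is a generic textbook-style power iteration, not the method from the paper cited above, and the graph, the node names (`paper_A`, `specimen_1`, and so on), and the damping value are all made up for illustration. Each object "links" to the things it cites or is derived from, so weight flows from citing papers, sequences, and images down to the specimens they rest on.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simple power-iteration PageRank.

    links maps each node to the list of nodes it cites or is derived from.
    Returns a dict of node -> score; the scores sum to 1.
    """
    # Collect every node, including ones that only appear as targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Pass this node's weight on to whatever it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Nodes that cite nothing (e.g. specimens) spread their
                # weight evenly so the total stays at 1.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical example: specimen_1 is cited by a paper that is itself
# cited twice, and it also has a sequence and an image derived from it.
# specimen_2 is cited once, by an uncited paper.
example_links = {
    "paper_A": ["specimen_1"],
    "paper_B": ["paper_A", "specimen_2"],
    "paper_C": ["paper_A"],
    "sequence_X": ["specimen_1"],
    "image_Y": ["specimen_1"],
}

scores = pagerank(example_links)
for name in ("specimen_1", "specimen_2"):
    print(name, round(scores[name], 3))
```

In this toy graph specimen_1 ends up with the higher score, because it inherits weight from a paper that is itself cited as well as from the sequence and image derived from it.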

Meanwhile, Morgan Jackson alerted me to a quicker way to get a measure of the impact of the collection.

The "short note" Morgan refers to is by Kevin Winker and Jack J. Withrow:
Winker, K., & Withrow, J. J. (2013). Natural history: Small collections make a big impact. Nature, 493(7433), 480–480. doi:10.1038/493480b

They constructed a Google Scholar profile and collected papers that cite the University of Alaska Museum's bird collection (see here for full details). The h-score of this collection of papers is 42, which Winker and Withrow note is "equivalent to an average Nobel laureate in physics". Here's the graph of citations over time:

[Chart 1: citations over time]

It's a neat trick, if a little time-consuming. But one advantage it has is that it puts collections on a similar footing to individual researchers. You could imagine asking the question "how much money would you spend supporting a researcher at this level?" How does this compare to the resources actually being spent?
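For anyone wanting to repeat the exercise for their own collection, the h-score (h-index) itself is easy to compute once you have the citation counts for the papers that cite the collection: it is the largest h such that h of those papers have at least h citations each. Here is a small sketch in Python; the function name and the example counts are mine and purely illustrative, not data from the Alaska study.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            # Counts are sorted, so no later paper can qualify.
            break
    return h


# Hypothetical citation counts for six papers citing a collection.
example_counts = [10, 8, 5, 4, 3, 0]
print(h_index(example_counts))  # prints 4
```

The time-consuming part is not this calculation but assembling the list of papers in the first place.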

One thing I hope will emerge from discussions like this is a desire to make specimens first-class citizens of the web, with stable identifiers that enable them to be cited in the same way we cite papers and, increasingly, data sets.