The top ten collections with specimens in BioStor are:
| Dataset | Number of specimens |
|---|---|
| NMNH Vertebrate Zoology Herpetology Collections (National Museum of Natural History) | 11194 |
| Herpetology Collection (University of Kansas Biodiversity Research Center) | 9619 |
| Herpetology Collection (University of Kansas Biodiversity Research Center) | 9328 |
| NMNH Invertebrate Zoology Collections (National Museum of Natural History) | 9061 |
| CAS Herpetology Collection Catalog (California Academy of Sciences) | 6720 |
| MCZ Herpetology Collection (Museum of Comparative Zoology, Harvard University) | 5818 |
| NMNH Vertebrate Zoology Fishes Collections (National Museum of Natural History) | 4642 |
| MCZ Herpetology Collection - Reptile Database (Museum of Comparative Zoology, Harvard University) | 4380 |
| FMNH Herpetology Collections (Field Museum) | 2110 |
| FMNH Fishes Collections (Field Museum) | 2061 |
This is pretty much what I expected. Virtually complete runs of publications from The Field Museum at Chicago, the University of Kansas, and the Biological Society of Washington are available in BHL, and many of these have been added to BioStor. These journals have extensive taxonomic treatments of vertebrate taxa, particularly frogs, hence herpetology collections dominate the rankings.
There will inevitably be errors in the mapping between specimen codes and GBIF occurrences. I've tried to minimise these by mapping codes within taxonomic groups, but it's clear that there are duplicate codes even within some collections. People also cite museum specimens in all manner of ways, and the codes they cite often differ from the codes that appear in GBIF. There will also be issues with extracting specimen codes, and I'm discovering a few *cough* duplicates of articles in BioStor, so the numbers I present above are liable to change as I clean things up.
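To make the sources of error concrete, here is a minimal sketch in Python of the kind of extraction and lookup described above. It is not the code behind BioStor: the regular expression, the function names, the `index` structure, and the occurrence ids are all assumptions made for illustration, and real specimen citations vary far more than this pattern allows.

```python
import re

# Assumed pattern: an upper-case institution acronym followed by a catalogue
# number, e.g. "USNM 12345", "KU-9876", "MCZ:555". Illustrative only.
CODE_RE = re.compile(r"\b([A-Z]{2,6})[\s:\-]?(\d{3,})\b")


def extract_codes(text):
    """Return specimen codes found in text, normalised to 'ACRONYM NUMBER'."""
    return [f"{acronym} {number}" for acronym, number in CODE_RE.findall(text)]


def map_to_occurrences(codes, taxon_group, index):
    """Look up codes within one taxonomic group.

    `index` is a hypothetical dict mapping (taxon_group, code) to a list of
    occurrence ids. Codes with several hits are reported as ambiguous rather
    than guessed at, since duplicate codes exist even within collections.
    """
    mapped, ambiguous, unmatched = {}, {}, []
    for code in codes:
        hits = index.get((taxon_group, code), [])
        if len(hits) == 1:
            mapped[code] = hits[0]
        elif hits:
            ambiguous[code] = hits
        else:
            unmatched.append(code)
    return mapped, ambiguous, unmatched


# Made-up text and occurrence ids, purely to exercise the functions.
text = "Holotype USNM 12345, paratypes KU-9876 and MCZ:555."
index = {
    ("Amphibia", "USNM 12345"): [101],
    ("Amphibia", "KU 9876"): [202, 203],  # duplicate code in one group
}
codes = extract_codes(text)
mapped, ambiguous, unmatched = map_to_occurrences(codes, "Amphibia", index)
```

Keeping ambiguous and unmatched codes separate from clean matches makes it easier to see how much of the total count rests on shaky mappings.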
But one could imagine a "league table" of museum collections, where we measure both the extent to which those collections have been digitised and the extent to which material from those collections has been cited. We could use this to compute measures of the impact of a collection.
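As a rough sketch of what such a league table might compute, the Python below ranks collections by the share of their digitised records that have been cited. The `Collection` class, the `league_table` function, the choice of "cited share" as the measure, and the numbers are all hypothetical placeholders, not real counts for any museum.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    name: str
    digitised: int  # records available online, e.g. in GBIF
    cited: int      # digitised specimens matched to at least one article


def league_table(collections):
    """Rank collections by cited share, breaking ties on the cited count."""
    rows = []
    for c in collections:
        # Guard against collections with no digitised records.
        share = c.cited / c.digitised if c.digitised else 0.0
        rows.append((c.name, c.digitised, c.cited, share))
    return sorted(rows, key=lambda row: (row[3], row[2]), reverse=True)


# Placeholder numbers, invented for illustration only.
table = league_table([
    Collection("Collection A", digitised=1000, cited=100),
    Collection("Collection B", digitised=200, cited=50),
    Collection("Collection C", digitised=0, cited=0),
])
for name, digitised, cited, share in table:
    print(f"{name}: {cited}/{digitised} cited ({share:.0%})")
```

A ratio like this favours small, heavily studied collections, so a real measure would probably report the raw counts alongside it.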
For now, I'm browsing the results to get a sense of how successful the mapping has been. There are some interesting examples. The specimen codes extracted from the article "Review Of The Chewing Louse Genus Abrocomophaga (Phthiraptera : Amblycera), With Description Of Two New Species" are those for the mammalian hosts of the lice. Hence someone viewing the records for these specimens and following the link to this paper would discover that these mammals had parasitic lice. If we add other sorts of links to the mix, such as those between specimens and DNA sequences, then we can start to build a rich network of connections between the basic data of biodiversity.