This a quick writeup of an analysis I did to make the case that the list of names held by the Index of Organism Names (ION) (part of Thomson Reuters) would be very useful for GBIF. I must declare a bias, in that I've spent a good chunk of the last 3-4 years exploring the ION database and investigating ways to link the taxonomic names it contains to the primary taxonomic literature, culminating in building BioNames.
What makes ION special is its scope (it endeavours to have all names covered by the ICZN), and that many of its names have associated citation information (i.e., details on the publication that published the name). Like any name database it has duplications and errors, and some of the older content is a bit ropey, but it's a tremendous resource and from my perspective nothing else in zoology come close.
But rather than rely on anecdote, I decided to do a quick analysis to see what ION could potentially add to GBIF. I've been doing some work on bird names recently, so as an exercise I searched GBIF for holotype specimens for birds. The search (13 May 2015) returned 11,664 records. I then filtered those on taxonomic names that GBIF could not match exactly (TAXON_MATCH_FUZZY) or names that GBIF could only match to a higher rank (TAXON_MATCH_HIGHERRANK). The query URL is:
http://www.gbif.org/occurrence/search?TAXON_KEY=212 &TYPE_STATUS=HOLOTYPE &ISSUE=TAXON_MATCH_FUZZY &ISSUE=TAXON_MATCH_HIGHERRANKThis query found 6,928 records, so over half the bird holotype specimens in GBIF do not match a taxonomic name in GBIF. What this means is that GBIF can't accurately place these names in its own taxonomic hierarchy. It also makes it hard to do meaningful analyses of things such as "how long does it take before a bird specimen is collected to when it is described as a new species?" because if you can match the name then you can't get the date the name was published.
To explore this further, I downloaded the results of the query (the download has DOI http://doi.org/10.15468/dl.vce3ay). I then wrote a script to parse the specimen records and extract the GBIF occurrence id, catalogue number, and scientific name. I then used the GBIF API to retrieve (where available) the verbatim record for each specimen (using the URL http://api.gbif.org/v1/occurrence/
Here's a sample of the mapping:
Occurrence | Holotype | GBIF matched name | Verbatim name | ION | BioNames | Publicaton |
---|---|---|---|---|---|---|
883603238 | USNM PAL378357.3368464 | Porzana Vieillot, 1816 | Porzana severnsi | 879659 | 2c4f3... | Olson, S. L., & James, H. F. (1991). Descriptions of thirty-two new species of birds from the Hawaiian Islands: Part 1. Non-Passeriformes. Ornithological Monographs, 45, 1-88. doi:10.2307/40166794 |
858732312 | AMNH Skin-245914 | Otus choliba (Vieillot, 1817) | Otus choliba duidae | 4307811 | b3315... | Chapman, F. M., & History, T. D. E. of the A. M. of N. (1929). Descriptions of new Birds from Mt. Duida, Venezuela. American Museum Novitates, 380, 1-27. Retrieved from http://hdl.handle.net/2246/3988 |
858732345 | AMNH Skin-245936 | Atlapetes Wagler, 1831 | Atlapetes duidae | 4307791 | b3315... | Chapman, F. M., & History, T. D. E. of the A. M. of N. (1929). Descriptions of new Birds from Mt. Duida, Venezuela. American Museum Novitates, 380, 1-27. Retrieved from http://hdl.handle.net/2246/3988 |
858733764 | AMNH Skin-45339 | Leptotila Swainson, 1837 | Leptotila gaumeri Lawr. | |||
858744126 | AMNH Skin-218110 | Zosterops Vigors & Horsfield, 1827 | Zosterops alberti ablita |
The complete result of this mapping can be viewed here. Of the 6,392 holotypes with names not recognised by GBIF, nearly half (3,165, 49.5%) exactly matched a name in ION. Many of these are also linked to the publication that published that name.
So, adding ION help us find half the missing holotype names. This is before doing anything more sophisticated, such as approximate string matching, resolving synonyms, etc. Hence, I'd argue that the names in ION would add a lot to GBIF's ability to interpret the occurrence records it receives from museums.
I've not had time for further analysis, but at first glance a lot of the missed names are subspecies, the are quite a few fossils, and many names are in the relatively older literature. However there are also some recently described taxa, such as the hawk-owl Ninox rumseyi Rasmussen et al. 2012, and a bunting subspecies from Tristan du Cuhna (Nesospiza acunhae fraseri Ryan, 2008) that are missing from GBIF.