Monday, March 07, 2011

Nomenclator Zoologicus meets Biodiversity Heritage Library: linking names directly to literature

Following on from my previous post on microcitations, I've blasted all the citations in Nomenclator Zoologicus through my microcitation service and created a simple web site where the results can be browsed.

The web site is here:

To create it I've taken a file dump of Nomenclator Zoologicus provided by Dave Remsen and run all the citations through the microcitation service, storing the results in a simple database. You can search by genus name, by author and year, or by publication. The search is pretty crude, and searching by publication in particular can be a bit hit and miss. Citations in Nomenclator Zoologicus are stored as strings, so I've used some rough rules to try to separate the publication name from the rest of the details (such as page numbering).
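To give a flavour of what a rule like this might look like, here is a minimal Python sketch. It is not the code behind the site: the pattern, the function name `split_citation`, and the example strings are all made up for illustration, and it assumes a citation of the form "publication, (series) volume, page.".

```python
import re

# Hypothetical rule (an assumption, not the site's actual parser):
# the publication name is everything before the first digit or "(",
# followed by an optional series in parentheses, a volume, and one page.
CITATION = re.compile(
    r"""^(?P<publication>[^0-9(]+?),?\s*   # publication name, no digits
        (?:\((?P<series>\d+)\)\s*)?        # optional series, e.g. (8)
        (?P<volume>\d+)\s*,\s*             # volume number
        (?P<page>\d+)\.?$                  # a single page number
    """,
    re.VERBOSE,
)


def split_citation(citation):
    """Return the parts of a citation string, or None if the rule fails."""
    match = CITATION.match(citation.strip())
    if match is None:
        return None
    # Strip stray whitespace; groups that did not take part stay as None.
    return {k: v.strip() if v else v for k, v in match.groupdict().items()}


# Illustrative strings only, not real Nomenclator Zoologicus records.
print(split_citation("Ann. Mag. nat. Hist., (8) 6, 465."))
print(split_citation("Ann. Mag. nat. Hist., (8) 6, 465, 470."))
```

A rule this simple returns None for the second string, because it only allows for a single page number, so anything it cannot handle has to be caught by further rules or left unmatched.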

To get started, you can look at names published by Distant in 1910, which you can see below:


If the citation has been found you can click on the icon to view the page in a popup, like this:


You can also click on the page number to be taken to that page in BHL.

I've also added some other links, such as to the name in the Index to Organism Names, as well as bibliographic identifiers such as DOIs, Handles, and links to JSTOR and CiNii.

So far only 10% of Nomenclator Zoologicus records have a match in BHL, which is slightly depressing. Browsing through, there are some obvious gaps where my parser clearly failed, typically where multiple pages are included in the citation, or where the citation has some additional comments. These could be fixed. There are also cases where the OCR text is so mangled that a match has been rejected because the genus name and the text on the page were too different.
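The check on the OCR text can be pictured as something like the following Python sketch. This is an assumption about the general approach rather than the service's actual code: it uses the standard library's `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold is arbitrary, and the function names and example strings are hypothetical.

```python
import re
from difflib import SequenceMatcher


def best_similarity(genus, ocr_text):
    """Highest similarity (0 to 1) between the genus name and any OCR word."""
    # Split the OCR text into runs of letters, dropping digits and symbols.
    words = re.findall(r"[^\W\d_]+", ocr_text)
    genus = genus.lower()
    return max(
        (SequenceMatcher(None, genus, word.lower()).ratio() for word in words),
        default=0.0,  # no words at all on the page
    )


def accept_match(genus, ocr_text, threshold=0.8):
    """Accept the page only if some word is close enough to the genus name.

    The threshold is an illustrative guess, not a tuned value.
    """
    return best_similarity(genus, ocr_text) >= threshold


# Made-up OCR snippets: one light error, one badly mangled.
print(accept_match("Platypleura", "Platyplcura gen. nov."))
print(accept_match("Platypleura", "P1a+yp|eu~a gen. nov."))
```

A single misread letter still passes, but once the OCR breaks the name into fragments no word comes close and the page is rejected, even though a human reader would recognise the name.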

This has been hastily assembled, but it's one vision of a simple service where we can go from a genus name to seeing the original publication of that name. There are other things we could do with this mapping, such as enabling BHL to tell users that the reference they are looking at is the original source of a particular name, and enabling services that use BHL content (such as EOL and Atlas of Living Australia) to flag which reference in BHL is the one that matters in terms of nomenclature.