Friday, August 26, 2016

Displaying original species descriptions in BioNames

The goal of my BioNames project is to link every taxonomic name to its original description (initially focussing on animal names). The rationale is that taxonomy is based on evidence, and yet most of this evidence is buried in literature that is not digitised, hard to find, or both. Surfacing this information not only makes taxonomic evidence accessible (see Surfacing the deep data of taxonomy), it also surfaces a lot of basic biological information. In many cases the original taxonomic description will be an important source of information about what a species looks like, where it lives, and what it does.

To date I've focussed on linking names to publications, such as articles, on the grounds that this is the unit of citation in science. It's also the unit most often digitised and assigned an identifier, such as a DOI. But often taxonomists cite not an article but the individual page on which the description appears. In web-speak, taxonomists cite "fragment identifiers". Page-level identifiers are not often encountered in the digital world, in part because many digital representations don't have "pages". But this doesn't mean that we can't have identifiers for parts of an article. For example, in Fragment Identifiers and DOIs Martin Fenner gives examples of ways to link to specific parts of an online article. His examples work if the article is displayed as HTML. If we are working with XML (say, for a journal published by Pensoft), then we can use XPath to refer to sections of a document. Ultimately it would be nice to have stable identifiers for document fragments linked to taxonomic names, so that we can readily go from name to description (even better if that description were in machine-readable form). You could think of these as locators for "taxonomic treatments", e.g. Miller et al. 2015.
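
To make the XPath idea concrete, here is a minimal sketch in Python using only the standard library. The XML below is a made-up toy document, not real Pensoft markup, and the element names, the `sec-type="treatment"` attribute, the ids, and the taxon name "Aus bus" are all assumptions for illustration. The point is only that an XPath expression can act as a locator for one part of an article.

```python
import xml.etree.ElementTree as ET

# Toy article: structure, attribute names, and ids are invented for illustration.
ARTICLE_XML = """
<article>
  <body>
    <sec id="s1"><title>Introduction</title><p>Background.</p></sec>
    <sec id="s2" sec-type="treatment">
      <title>Aus bus sp. n.</title>
      <p>Description of the new species.</p>
    </sec>
  </body>
</article>
""".strip()


def select_fragment(xml_text, xpath):
    """Return the first element matching xpath, or None if nothing matches."""
    root = ET.fromstring(xml_text)
    return root.find(xpath)


# The XPath expression plays the role of a fragment identifier.
fragment = select_fragment(ARTICLE_XML, ".//sec[@sec-type='treatment']")
fragment_id = fragment.get("id")
fragment_title = fragment.findtext("title")
print(fragment_id, fragment_title)
```

ElementTree supports only a subset of XPath, which is enough for simple attribute matches like this one; a fuller XPath engine would be needed for more elaborate locators.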

As a quick and dirty approach to this I've reworked BioNames to be able to show the page where a species name is first published. This only works if a number of conditions are met:

  • The BioNames database has the page number ("micro reference") for the name.
  • BioNames has the full text for the article, either from BioStor or a PDF.
  • The taxonomic name has been found in that text (e.g., by the Global Names GNRD service).

If these conditions are met, then BioNames will display the page, as in this example (Belobranchus segura Keith, Hadiaty & Lord 2012):

[Screenshot, 2016-08-26]

Both the page image and OCR text (if available) are displayed. This is a first step towards (a) making stable identifiers available for these pages, and (b) making the text accessible for machine reading.
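
The three conditions listed above can be sketched as a simple check. This is not the BioNames code: the `NameRecord` class, its field names, and the example values are hypothetical, and a plain case-insensitive substring match stands in for the name finding that a service such as GNRD would do.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NameRecord:
    """Hypothetical record; the field names are not the BioNames schema."""
    name: str
    micro_reference: Optional[str] = None  # page number cited for the name
    page_texts: Dict[str, str] = field(default_factory=dict)  # page -> text


def can_display_page(record):
    """Return True only if all three conditions hold for this record."""
    has_micro_reference = record.micro_reference is not None
    has_full_text = bool(record.page_texts)
    # Stand-in for a name-finding service: simple substring match.
    name_found = any(
        record.name.lower() in text.lower()
        for text in record.page_texts.values()
    )
    return has_micro_reference and has_full_text and name_found


# Made-up example values.
complete = NameRecord("Aus bus", "12", {"12": "Aus bus sp. n. Holotype ..."})
no_page = NameRecord("Aus bus", None, {"12": "Aus bus sp. n. Holotype ..."})
no_text = NameRecord("Aus bus", "12")
not_found = NameRecord("Aus bus", "12", {"12": "Unrelated text."})

print(can_display_page(complete), can_display_page(no_page))
```

Only the first record passes; each of the others fails exactly one of the conditions.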

For some more examples, try Heterophasia melanoleuca kingi Eames 2002 (bird), Echinoparyphium anatis Fischthal & Kuntz 1976 (trematode), Bathymodiolus brooksi Gustafson, Turner, Lutz & Vrijenhoek 1998 (bivalve), Amolops cremnobatus Inger & Kottelat 1998 (frog), Leptothorax caesari Espadaler 1997 (ant), and Daipotamon minos Ng & Trontelj 1996 (crab).