Friday, May 27, 2022

Round trip from identifiers to citations and back again

Note to self (basically rewriting last year's Finding citations of specimens).

Bibliographic data supports going from identifier to citation string and back again, so we can do a "round trip."

1.

Given a DOI we can get structured data with a simple HTTP fetch, then use a tool such as citation.js to convert that data into a human-readable string in a variety of formats.

Identifier Structured data Human readable string
10.7717/peerj-cs.214 HTTP with content-negotiation CSL-JSON CSL templates Willighagen, L. G. (2019). Citation.js: a format-independent, modular bibliography tool for the browser and command line. PeerJ Computer Science, 5, e214. https://doi.org/10.7717/peerj-cs.214

2.

Going in the reverse direction (string to identifier) is a little more challenging. In the "old days" a typical strategy was to attempt to parse the citation string into structured data (see AnyStyle for a nice example of this), then we could extract a truple of (journal, volume, starting page) and use that to query CrossRef to find if there was an article with that tuple, which gave us the DOI.

Identifier Structured data Human readable string
10.7717/peerj-cs.214 OpenURL query journal, volume, start page Citation parser Willighagen, L. G. (2019). Citation.js: a format-independent, modular bibliography tool for the browser and command line. PeerJ Computer Science, 5, e214. https://doi.org/10.7717/peerj-cs.214

3.

Another strategy is to take all the citations strings for each DOI, index those in a search engine, then just use a simple search to find the best match to your citation string, and hence the DOI. This is what https://search.crossref.org does.

Identifier Human readable string
10.7717/peerj-cs.214 search Willighagen, L. G. (2019). Citation.js: a format-independent, modular bibliography tool for the browser and command line. PeerJ Computer Science, 5, e214. https://doi.org/10.7717/peerj-cs.214

At the moment my work on material citations (i.e., lists of specimens in taxonomic papers) is focussing on 1 (generating citations from specimen data in GBIF) and 2 (parsing citations into structured data).