Monday, May 07, 2007

Catalogue of Life, OpenURL, and taxonomic literature

Playing with the recently released "Catalogue of Life" CD, and pondering Charles Hussey's recent post to TAXACOM about the "European Virtual Library of Taxonomic Literature (E-ViTL)" (part of EDIT), have got me thinking more and more about how primitive our handling of taxonomic literature is, and how it cripples the utility of taxonomic databases such as the Catalogue of Life. For example, none of the literature listed in the Catalogue of Life is associated with any digital identifier (such as a DOI, Handle, SICI, or even a URL). In the digital age, this renders the literature section nearly useless -- a user has to search for each reference in Google. Surely we want identifiers, not (often poorly formed) bibliographic citations? For example, I think hdl:2246/4613 is more useful than
Schmidt, K. P. 1921. New species of North American lizards of the genera Holbrookia and Uta. American Museum Novitates (22)

Given the Handle hdl:2246/4613, we get straight to the bibliographic resource, and in this case, a PDF of the paper. In the digital age this is what we need.
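
To make "we get straight to the bibliographic resource" concrete, here is a minimal sketch of turning a Handle into a link. It assumes the public Handle proxy at hdl.handle.net; the function name handle_to_url is my own and purely illustrative.

```python
from urllib.parse import quote

# Assumption: the public Handle System proxy is used as the resolver.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(identifier: str) -> str:
    """Turn 'hdl:2246/4613' (or a bare '2246/4613') into a resolvable URL."""
    handle = identifier.strip()
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    if "/" not in handle:
        raise ValueError(f"not a Handle (expected prefix/suffix): {identifier!r}")
    # Keep the prefix/suffix separator; escape anything else that is unsafe.
    return HANDLE_PROXY + quote(handle, safe="/")


print(handle_to_url("hdl:2246/4613"))  # https://hdl.handle.net/2246/4613
```

The identifier itself stays short and stable; the resolver address is just a prefix that can change without touching the stored identifier.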

So, how to get there? Well, I think we need to focus on developing services to associate references with identifiers. Imagine a service that takes a bibliographic record and returns a globally unique identifier for that reference. This, of course, is part of what CrossRef provides through its OpenURL resolver.
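
As a rough sketch of what such a request might look like, the snippet below builds a CrossRef OpenURL query from a bibliographic record. The endpoint address and the parameter names (pid, title, aulast, volume, issue, spage, date, redirect) are written from my recollection of CrossRef's documentation and should be treated as assumptions to check; the citation field names and the function crossref_openurl are hypothetical. It only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumption: CrossRef's OpenURL endpoint; verify against current documentation.
CROSSREF_OPENURL = "https://doi.crossref.org/openurl"

# Hypothetical citation field names mapped onto OpenURL-style query keys.
FIELD_MAP = {
    "journal": "title",
    "author_last": "aulast",
    "volume": "volume",
    "issue": "issue",
    "first_page": "spage",
    "year": "date",
}


def crossref_openurl(citation: dict, pid: str) -> str:
    """Build a query URL asking CrossRef for the DOI matching a citation."""
    params = {"pid": pid}  # CrossRef asks callers to identify themselves
    for field, key in FIELD_MAP.items():
        value = citation.get(field)
        if value:  # skip fields the citation does not supply
            params[key] = str(value)
    # Assumption: this asks for metadata back instead of an HTTP redirect.
    params["redirect"] = "false"
    return CROSSREF_OPENURL + "?" + urlencode(params)


schmidt = {
    "author_last": "Schmidt",
    "year": 1921,
    "journal": "American Museum Novitates",
    "issue": 22,
}
print(crossref_openurl(schmidt, pid="you@example.org"))
```

The Schmidt paper is used only because it is the example at hand; whether CrossRef knows about it is a separate question, which is exactly why a service would need to consult more than one source.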

OpenURL has been around a while, and although it is probably overcomplicated (see "I hate library standards" for more on the seeming desire of librarians to make things harder than they need to be), I think it is a useful way to think about the task of linking taxonomic names to literature, especially if we keep things simple (see "Rethinking OpenURL"). In particular, drop the obsession with local context -- I don't care what my local library has, my library is the cloud.

So, what if we had an OpenURL service that took a bibliographic citation and queried local and remote sources for a digital identifier, such as a DOI or a Handle, for that citation? If there is no such identifier, then the next step is to create one. For example, the service could create a SICI (see my bookmarks for sici) for that item. Ideally, for those items that were digitised, we could have a database that associated SICIs with the resource location. For example, most of the journal Psyche is available free online as PDFs, and has XML files for each volume providing full bibliographic details (including URLs). It would be trivial to harvest these and add this information to an OpenURL service.
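
The lookup-then-create logic could be sketched roughly as follows. Everything here is illustrative: the citation fields, the function names, and the local table are my own assumptions, and the fallback produces only a simplified SICI-like string (item and contribution segments, with the title code taken as the initials of the first six title words), omitting the control segment and check character that a real SICI carries. A remote source such as CrossRef would simply be another resolver in the list.

```python
import re
from typing import Callable, Iterable, Optional

Citation = dict
Resolver = Callable[[Citation], Optional[str]]


def citation_key(c: Citation) -> str:
    """Normalise a citation to a lookup key: author|year|journal|volume|issue."""
    fields = ("author_last", "year", "journal", "volume", "issue")
    return "|".join(
        re.sub(r"[^a-z0-9]+", " ", str(c.get(f, "")).lower()).strip() for f in fields
    )


def local_resolver(table: dict) -> Resolver:
    """Resolver backed by a local table of citation key -> identifier."""
    return lambda c: table.get(citation_key(c))


def title_code(title: str) -> str:
    """Initials of the first six words of the title, upper-cased."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    return "".join(w[0].upper() for w in words[:6])


def mint_sici_like(c: Citation) -> str:
    """Simplified SICI-like string; NOT a complete, checksummed SICI."""
    enumeration = str(c.get("volume", ""))
    if c.get("issue"):
        enumeration += f":{c['issue']}"
    return (
        f"{c['issn']}({c['year']}){enumeration}"
        f"<{c['first_page']}:{title_code(c['title'])}>"
    )


def resolve(c: Citation, resolvers: Iterable[Resolver]) -> str:
    """Return the first identifier any source knows, else mint a new one."""
    for resolver in resolvers:
        identifier = resolver(c)
        if identifier:
            return identifier
    return mint_sici_like(c)


schmidt = {
    "author_last": "Schmidt",
    "year": 1921,
    "journal": "American Museum Novitates",
    "issue": 22,
}
known = {citation_key(schmidt): "hdl:2246/4613"}
print(resolve(schmidt, [local_resolver(known)]))  # hdl:2246/4613
```

Harvested metadata, such as the per-volume XML files for Psyche, would be loaded into the same kind of table, so adding a new source means adding rows or one more resolver rather than changing the service.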

These ideas need a little more fleshing out, but I think it's time the taxonomic community started thinking seriously about digital identifiers for literature, and how they would be used. CrossRef is a great example of what can be done with some simple services (Handles + OpenURL), and it's a tragedy that every time DOIs come up people get blinded by cost, and don't spend time trying to understand how CrossRef works. If you want a good demonstration of what can be done with CrossRef, just look at Connotea, which builds much of its functionality on top of CrossRef web services.

It is also interesting that CrossRef is much simpler to use than repositories such as DSpace (used by the AMNH's digital library). Each DSpace installation has its own hooks to retrieve metadata (in some cases, such as the AMNH, appallingly badly formed), and as a result there is no easy way to discover what metadata is associated with a given handle, nor, given a citation, whether a handle exists for it.

So, when projects such as EDIT start talking about taxonomic libraries, I think they need to think in terms of simple web services that will serve as the building blocks for other tools. An OpenURL service would be a major boon, and would speed us towards the day when databases such as the Catalogue of Life would not contain (often inconsistently formed) text records of bibliographic works, but actionable identifiers. Anything less and we remain in the dark ages.
