Wednesday, September 05, 2012

BHL is duplicating DOIs because it doesn't know about articles

Quick note: much as I like the fact that the Biodiversity Heritage Library is using DOIs, it is generating them for publications that already have DOIs (or that are acquiring them from other sources). For example, here are two DOIs for the same article (formatted using the DOI Citation Formatter), one from BHL and one from the Smithsonian:

Springer, V. G. (1982). Pacific Plate biogeography, with special reference to shorefishes / Victor G. Springer. Smithsonian Institution. doi:10.5962/bhl.title.37141
Springer, V. G. (1982). Pacific Plate biogeography, with special reference to shorefishes. Smithsonian Contributions to Zoology, (367), 1–182. doi:10.5479/si.00810282.367

The BHL DOI resolves to a page in BHL; the other DOI resolves to a page in the Smithsonian Digital Repository (this article also has the handle hdl:10088/5222).

Now this is a problem, because DOIs are meant to be unique: one article, one DOI. I've encountered duplicates elsewhere, and in such cases one DOI should be an alias of the other. In the example above, however, the two DOIs resolve to different locations. If you are just after the content this isn't a huge problem, but if you are using the DOI to uniquely identify the publication (for example, as a key in a database), you have a problem: which DOI do you choose? If you and I choose differently, then we will make statements about the same article without being aware that they are about the same thing.
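To make the database problem concrete, here is a minimal Python sketch using the two records from the example. Keyed by DOI alone, they look like two different publications. The `match_key` and `find_duplicate_dois` helpers are hypothetical: a crude heuristic (author surname, year, and normalized title, with BHL's trailing "/ Victor G. Springer" statement of responsibility dropped) that I'm assuming for illustration, not a method used by BHL or any DOI registration agency.

```python
import re
from collections import defaultdict

# The two records from the example above, as (DOI, metadata) pairs.
records = [
    ("10.5962/bhl.title.37141",
     {"author": "Springer", "year": 1982,
      "title": "Pacific Plate biogeography, with special reference "
               "to shorefishes / Victor G. Springer"}),
    ("10.5479/si.00810282.367",
     {"author": "Springer", "year": 1982,
      "title": "Pacific Plate biogeography, with special reference "
               "to shorefishes"}),
]

def match_key(meta):
    """Hypothetical match key: surname, year, and a normalized title."""
    title = meta["title"].split(" / ")[0]              # drop a trailing "/ Author" statement
    title = re.sub(r"[^a-z0-9 ]", "", title.lower())   # lowercase, strip punctuation
    title = " ".join(title.split())                    # collapse runs of whitespace
    return (meta["author"].lower(), meta["year"], title)

def find_duplicate_dois(records):
    """Group DOIs whose metadata produce the same match key."""
    groups = defaultdict(list)
    for doi, meta in records:
        groups[match_key(meta)].append(doi.lower())    # DOIs compare case-insensitively
    return [sorted(dois) for dois in groups.values() if len(dois) > 1]

# Keyed by DOI alone, the database sees two distinct publications...
print(len({doi for doi, _ in records}))   # 2
# ...but the metadata suggest they are one article with two DOIs.
print(find_duplicate_dois(records))
# [['10.5479/si.00810282.367', '10.5962/bhl.title.37141']]
```

A heuristic like this will have both false positives and false negatives on real data; the point is only that nothing in the DOIs themselves tells you the two records are the same article.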

Much of this problem arises because BHL has no concept of articles. Most articles are likely to reside within scanned volumes of a journal, but some articles (e.g., monographs) may be treated as a single title by BHL, and each BHL title now gets a DOI.

I know that handling articles is on BHL's radar, but because it hasn't tackled it yet we are going to have cases where BHL DOIs duplicate existing DOIs. In these cases, BHL may have to make its DOI an alias of the other DOI.
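From the point of view of someone keeping a database, aliasing means mapping every alias DOI to one canonical DOI before using it as a key. The Python sketch below assumes a hypothetical local alias table in which the BHL DOI points to the Smithsonian DOI (following the suggestion that BHL's DOI become the alias); it is not how any registration agency actually records aliases. The `canonical_doi` and `merge_statements` helpers are likewise hypothetical.

```python
# Hypothetical alias table: each alias DOI (lowercased) maps to the canonical DOI.
ALIASES = {
    "10.5962/bhl.title.37141": "10.5479/si.00810282.367",
}

def canonical_doi(doi, aliases=ALIASES):
    """Follow alias links until reaching a DOI that is not itself an alias."""
    doi = doi.strip().lower()          # DOI names are case-insensitive
    seen = set()
    while doi in aliases:
        if doi in seen:                # guard against a badly formed alias table
            raise ValueError(f"alias cycle involving {doi}")
        seen.add(doi)
        doi = aliases[doi]
    return doi

def merge_statements(statements, aliases=ALIASES):
    """Group (doi, statement) pairs under their canonical DOI."""
    merged = {}
    for doi, statement in statements:
        merged.setdefault(canonical_doi(doi, aliases), []).append(statement)
    return merged

# Two people recorded facts about the article using different DOIs.
statements = [
    ("10.5962/bhl.title.37141", "cited by paper A"),
    ("10.5479/SI.00810282.367", "cited by paper B"),
]
print(merge_statements(statements))
# {'10.5479/si.00810282.367': ['cited by paper A', 'cited by paper B']}
```

Once the alias is in place, it no longer matters which DOI you or I chose: both statements end up attached to the same record.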