Tuesday, February 08, 2022

Duplicate DOIs (again)

This blog post provides some background to a recent tweet where I expressed my frustration about the duplication of DOIs for the same article. I'm going to document the details here.

The DOI that alerted me to this problem is https://doi.org/10.2307/2436688 which is for the article

Snyder, W. C., & Hansen, H. N. (1940). THE SPECIES CONCEPT IN FUSARIUM. American Journal of Botany, 27(2), 64–67.

This article is hosted by JSTOR at https://www.jstor.org/stable/2436688 which displays the DOI https://doi.org/10.2307/2436688 .

This same article is also hosted by Wiley at https://bsapubs.onlinelibrary.wiley.com/doi/abs/10.1002/j.1537-2197.1940.tb14217.x with the DOI https://doi.org/10.1002/j.1537-2197.1940.tb14217.x.

Expected behaviour

What should happen is this: if Wiley is taking over from JSTOR as the publisher of this content, the DOI 10.2307/2436688 should be redirected to the Wiley page, and the Wiley page should display that DOI (i.e., 10.2307/2436688). If I want metadata for this DOI, I should be able to retrieve it using CrossRef's API, e.g. https://api.crossref.org/v1/works/10.2307/2436688 should return metadata for the article.
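To make "retrieve that metadata" concrete, here is a minimal Python sketch of the lookup I have in mind. It assumes the API wraps each record in a JSON object under a "message" key, with "title" and "container-title" as lists, and that an unknown DOI gives an HTTP 404. The function names (crossref_url, fetch_work, summarise) are my own, not part of any CrossRef library.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

API_ROOT = "https://api.crossref.org/v1/works/"


def crossref_url(doi):
    """Build the works URL for a DOI, leaving the '/' in the DOI unescaped."""
    return API_ROOT + urllib.parse.quote(doi, safe="/")


def fetch_work(doi, timeout=30):
    """Return the metadata record for a DOI, or None if the API answers 404."""
    try:
        with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as response:
            # Assumption: the record itself sits under the "message" key.
            return json.load(response)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def summarise(work):
    """Pick out a few fields; "title" and "container-title" are assumed to be lists."""
    def first(key):
        values = work.get(key) or [""]
        return values[0]

    return {
        "doi": work.get("DOI", ""),
        "title": first("title"),
        "journal": first("container-title"),
    }
```

Under the expected behaviour, fetch_work("10.2307/2436688") would return a record, and summarise() on that record would give back the same DOI together with the article title and journal.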

What actually happens

Wiley display the same article on their web site with the DOI 10.1002/j.1537-2197.1940.tb14217.x. They have minted a new DOI for the same article! The original JSTOR DOI now resolves to the Wiley page (you can see this using the Handle Resolver), which is what is supposed to happen. However, Wiley should have reused the original DOI rather than mint their own.
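The check with the Handle Resolver can also be scripted. The sketch below assumes the doi.org proxy exposes handle records as JSON at https://doi.org/api/handles/ followed by the DOI, and that each record has a "values" list whose "URL" entry holds the landing page under data.value. The function names are hypothetical, so treat this as an illustration rather than a definitive client.

```python
import json
import urllib.parse
import urllib.request

HANDLE_API = "https://doi.org/api/handles/"  # assumed JSON endpoint of the DOI proxy


def handle_api_url(doi):
    """Build the handle-record URL for a DOI."""
    return HANDLE_API + urllib.parse.quote(doi, safe="/")


def target_urls(handle_record):
    """Return every value of type "URL" found in a parsed handle record."""
    return [
        value["data"]["value"]
        for value in handle_record.get("values", [])
        if value.get("type") == "URL"
    ]


def resolve(doi, timeout=30):
    """Fetch the handle record for a DOI and return the URLs it points to."""
    with urllib.request.urlopen(handle_api_url(doi), timeout=timeout) as response:
        return target_urls(json.load(response))
```

Calling resolve("10.2307/2436688") should then list where the original JSTOR DOI currently points, without following the redirect in a browser.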

Furthermore, while the original DOI still resolves in a web browser, I can't retrieve metadata about that DOI from CrossRef, so any attempt to build upon that DOI fails. However, I can retrieve metadata for the Wiley DOI, i.e. https://api.crossref.org/v1/works/10.1002/j.1537-2197.1940.tb14217.x works, but https://api.crossref.org/v1/works/10.2307/2436688 doesn't.
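The comparison between the two DOIs is easy to automate. This is a rough sketch that asks the API for each DOI in turn and records whether the request succeeded, assuming a successful lookup gives HTTP 200 and a missing record gives an error status such as 404. The names http_status and metadata_report are mine, and get_status is a parameter only so that the network call can be swapped out.

```python
import urllib.error
import urllib.request

API_ROOT = "https://api.crossref.org/v1/works/"


def http_status(url, timeout=30):
    """Return the HTTP status code for a GET request, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def metadata_report(dois, get_status=http_status):
    """Map each DOI to True if the API returned a record for it (HTTP 200)."""
    return {doi: get_status(API_ROOT + doi) == 200 for doi in dois}


# The two DOIs for the same article discussed in this post.
DOIS = ["10.2307/2436688", "10.1002/j.1537-2197.1940.tb14217.x"]
```

Running metadata_report(DOIS) at the time of writing would be expected to flag the JSTOR DOI as missing and the Wiley DOI as present, which is the asymmetry described above.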

Why does this matter?

Anyone using DOIs as stable links to the literature should be able to rely on their persistence, whether they are a person clicking on a link in a web browser or a developer retrieving metadata for a DOI. The whole rationale of the DOI system is that each article has a single, globally unique identifier, and that this identifier persists even when the publisher of the content changes. If this property doesn't hold, then why would a developer such as myself invest effort in linking using DOIs?

Just for the record, I think CrossRef is great and is a hugely important part of the scholarly landscape. There are lots of things that I do that would be nearly impossible without CrossRef and its tools. But cases like this, where we get massive duplication of DOIs when a publisher takes over an existing journal, fundamentally break the underlying model of stable, persistent identifiers.