Wednesday, August 03, 2022

Papers citing data that cite papers: CrossRef, DataCite, and the Catalogue of Life

Quick notes to self following on from a conversation about linking taxonomic names to the literature. There are different sorts of citation:
  1. Paper cites another paper
  2. Paper cites a dataset
  3. Dataset cites a paper
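To make the distinction concrete, here is a minimal sketch of the three cases as a lookup on what is citing what. The `Kind`, `Work` and `citation_type` names and the example identifiers are hypothetical, made up for illustration; nothing here reflects how CrossRef or DataCite actually model citations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    PAPER = "paper"      # typically (but not always) a CrossRef DOI
    DATASET = "dataset"  # typically a DataCite DOI


@dataclass(frozen=True)
class Work:
    identifier: str
    kind: Kind


def citation_type(citing: Work, cited: Work) -> Optional[int]:
    """Return 1, 2 or 3 following the numbered list above.

    Returns None for dataset-cites-dataset, which the list does not cover.
    """
    types = {
        (Kind.PAPER, Kind.PAPER): 1,
        (Kind.PAPER, Kind.DATASET): 2,
        (Kind.DATASET, Kind.PAPER): 3,
    }
    return types.get((citing.kind, cited.kind))


# Hypothetical identifiers, for illustration only.
paper = Work("doi:10.9999/example.paper", Kind.PAPER)
other_paper = Work("doi:10.9999/example.other", Kind.PAPER)
checklist = Work("doi:10.9999/example.dataset", Kind.DATASET)

print(citation_type(paper, other_paper))  # paper cites paper
print(citation_type(paper, checklist))    # paper cites dataset
print(citation_type(checklist, paper))    # dataset cites paper
```

The point of writing it this way is that the citation type depends only on the kinds of the two endpoints, which in practice roughly tracks which DOI agency each end belongs to.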
Citation type (1) is largely a solved problem (although there are issues of the ownership and use of this data, see e.g. "Zootaxa has no impact factor"). Citation type (2) is becoming more widespread (but not perfect, as GBIF's #citethedoi campaign demonstrates). But the idea is well accepted and there are guides to how to do it, e.g.:
Cousijn, H., Kenall, A., Ganley, E. et al. A data citation roadmap for scientific publishers. Sci Data 5, 180259 (2018). https://doi.org/10.1038/sdata.2018.259
However, things do get problematic because most (but not all) DOIs for publications are managed by CrossRef, which has an extensive citation database linking papers to other papers. Most datasets have DataCite DOIs, and DataCite manages its own citation links, but as far as I'm aware these two systems don't really talk to each other.
Citation type (3) is the case where a database is largely based on the literature, which applies to taxonomy. Taxonomic databases are essentially collections of literature that have opinions on taxa, and the database may simply compile those (e.g., a nomenclator), or come to some view on the applicability of each name. In an ideal world, each reference included in a taxonomic database would gain a citation, which would help better reflect the value of that work (a long-standing bone of contention for taxonomists).
It would be interesting to explore these issues further. CrossRef and DataCite do share Event Data (see also DataCite Event Data). Can this track citations of papers by a dataset?
My take on Wayne's question:
Is there a way to turn those links into countable citations (even if just one per database) for Google Scholar?
is that he is after type 3 citations, which I don't think we have a way to handle just yet (but I'd need to look at Event Data a bit more).
Google Scholar is a black box, and the academic community's reliance on it for metrics is troubling. But it would be interesting to try and figure out if there is a way to get Google Scholar to index the citations of taxonomic papers by databases. For instance, the Catalogue of Life has an ISSN 2405-884X so it can be treated as a publication. At the moment its web pages have lots of identifiers for people managing data and their organisations (lots of ORCIDs and RORs, and DOIs for individual datasets, e.g., checklistbank.org) but precious little in the way of DOIs for publications (or, indeed, ORCIDs for taxonomists). What would it take for taxonomic publications in the Catalogue of Life to be treated as first-class citations?
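As a starting point for looking at Event Data, here is a rough sketch that builds a query URL asking which DataCite-registered items link to a given paper DOI. The base URL and the `obj-id`, `source`, `rows` and `mailto` parameter names are my recollection of the Crossref Event Data query API and should be checked against the current documentation; the function name is made up. The sketch only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumed base URL of the Crossref Event Data query API; verify before use.
EVENT_DATA_API = "https://api.eventdata.crossref.org/v1/events"


def event_data_query_url(paper_doi, source="datacite", rows=100, mailto=None):
    """Build a URL asking for events whose object is the given paper DOI.

    With source="datacite" the events should be links asserted in DataCite
    metadata, i.e. candidate "dataset cites a paper" links. The parameter
    names are assumptions, not confirmed against the live API.
    """
    params = {
        "obj-id": paper_doi,
        "source": source,
        "rows": rows,
    }
    if mailto:
        # Contact address, included as a courtesy to the API operators.
        params["mailto"] = mailto
    return f"{EVENT_DATA_API}?{urlencode(params)}"


# The data citation roadmap paper cited above, used here as an example target.
url = event_data_query_url("10.1038/sdata.2018.259")
print(url)
```

If a taxonomic database registered its literature links with DataCite, a query along these lines is where those type 3 citations might be expected to show up. Whether they actually do is the open question.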