Thursday, September 29, 2022

The ideal taxonomic journal

This is just some random notes on an “ideal” taxonomic journal, inspired in part by some recent discussions on “turbo-taxonomy” (e.g., and, and also examples such as the Australian Journal of Taxonomy which seems well-intentioned but limited.


One approach is to have highly structured text that embeds detailed markup, and ideally a tool that generates markup in XML. This is the approach taken by Pensoft. There is an inevitable trade-off between the burden on authors of marking up text versus making the paper machine readable. In some ways this seems misplaced effort given that there is little evidence that publications by themselves have much value (see The Business of Extracting Knowledge from Academic Publications). “Value” in this case means as a source of data or factual statements that we can compute over. Human-readable text is not a good way to convey this sort of information.

It’s also interesting that many editing tools are going in the opposite direction, for example there are minimalist tools using Markdown where the goal is to get out of the author’s way, rather than impose a way of writing. Text is written by humans for humans, so the tools should be human-friendly.

The idea of publishing using XML is attractive in that it gives you XML that can be archived by, say, PubMed Central, but other than that the value seems limited. A cursory glance at download stats for journals that provide PDF and XML downloads, such as PLoS One and ZooKeys, PDF is by far the more popular format. So arguably there is little value in providing XML. Those who have tried to use JATS-XML as an authoring tool have not had a happy time: How we tried to JATS XML. However, there are various tools to help with the process, such as docxToJats,
texture, and jats-xml-to-pdf if this is the route one wants to take.

Automating writing manuscripts

The dream, of course, is to have a tool where you store all your taxonomic data (literature, specimens, characters, images, sequences, media files, etc.) and at the click of a button generate a paper. Certainly some of this can be automated, much nomenclatural and specimen information could be converted to human-readable text. Ideally this computer-generated text would not be edited (otherwise it could get out of sync with the underlying data). The text should be transcluded. As an aside, one way to do this would be to include things such as lists of material examined as images rather than text while the manuscript is being edited. In the same way that you (probably) wouldn’t edit a photograph within your text editor, you shouldn’t be editing data. When the manuscript is published the data-generated portions can then be output as text.

Of course all of this assumes that we have taxonomic data in a database (or some other storage format, including plain text and Mark-down, e.g. Obsidian, markdown, and taxonomic trees) that can generate outputs in the various formats that we need.

Archiving data and images

One of the really nice things that Plazi do is have a pipeline that sends taxonomic descriptions and images to Zenodo, and similar data to GBIF. Any taxonomic journal should be able to do this. Indeed, arguably each taxonomic treatment within the paper should be linked to the Zenodo DOI at the time of publication. Indeed, we could imagine ultimately having treatments as transclusions within the larger manuscript. Alternatively we could store the treatments as parts of the larger article (rather like chapters in a book), each with a CrossRef DOI. I’m still sceptical about whether these treatments are as important as we make out, see Does anyone cite taxonomic treatments?. But having machine-readable taxonomic data archived and accessible is a good thing. Uploading the same data to GBIF makes much of that data immediately accessible. Now that GBIF offers hosted portals there is the possibility of having custom interfaces to data from a particular journal.

Name and identifier registration

We would also want automatic registration of new taxonomic names, for which there are pipelines (see “A common registration-to-publication automated pipeline for nomenclatural acts for higher plants (International Plant Names Index, IPNI), fungi (Index Fungorum, MycoBank) and animals (ZooBank)” These pipelines do not seem to be documented in much detail, and the data formats differ across registration agencies (e.g., IPNI and ZooBank). For example, ZooBank seems to require TaxPub XML.

Registration of names and identifiers, especially across multiple registration agencies (ZooBank, CrossRef, DataCite, etc.) requires some coordination, especially when one registration agency requires identifiers from another.


If data is key, then the taxonomic paper itself becomes something of a wrapper around that data. It still serves the function of being human-readable, providing broader context for the work, and as an archive that conforms to currently accepted ways to publish taxonomic names. But in some ways it is the last interesting part of the process.

Written with StackEdit.