Monday, July 27, 2015

Nanopublications and annotation: a role for the Biodiversity Data Journal?

I stumbled across this intriguing paper:

Do, L., & Mobley, W. (2015, July 17). Single Figure Publications: Towards a novel alternative format for scholarly communication. F1000Research. F1000 Research, Ltd.

The authors argue that there is scope for a unit of publication between a full-blown journal article (readable by people, but often not machine readable) and the nanopublication (a single machine-readable statement, not intended for people to read), namely the Single Figure Publication (SFP):

The SFP, consisting of a figure, the legend, the Material and Methods section, and an optional Results/Discussion section, reduces the unit of publication to a more tractable size. Importantly, it results in a markedly decreased time from data generation to publication. As such, SFPs represent a new means by which to communicate scientific research. As with the traditional journal article, the content of the SFPs is readily understandable by the scientist. Coupled with additional tools that aid in structuring content (e.g. describing in detail the methods using pre-defined steps from protocols), the SFP represents a “bottom-up” means by which scholars can structure the content of their findings in a modular and piece-wise fashion wedded to everyday laboratory life.

It seems to me that this is something the Biodiversity Data Journal is potentially heading towards. Some of the papers in that journal are short, reporting, say, new occurrence records for a single species, e.g.:

Ang, Y., Rohner, P., & Meier, R. (2015, June 26). Across the Baltic: a new record for an enigmatic black scavenger fly, Zuskamira inexpectata (Pont, 1987) (Sepsidae) in Finland. BDJ. Pensoft Publishers.

Imagine if we had even shorter papers that are essentially a series of statements of fact, or assertions (linked to supporting evidence). These could be papers that annotate and/or clarify data in an external database, such as GBIF. For example, imagine we find two names that GBIF treats as different taxa, but which a recent publication asserts are actually synonyms. We could make that information machine readable (say, using the Darwin Core Archive format), link it to the source(s) of the assertion (i.e., the DOI of the paper making the synonymy), then publish that as a paper. Because the Darwin Core Archive is harvested by GBIF, GBIF then has access to that information, and can make use of it the next time it does its taxonomic indexing.
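To make this a bit more concrete, here is a minimal Python sketch of what the machine-readable part of such a micropublication might look like: a Darwin Core Archive whose Taxon core file marks one name as a synonym of another, with the paper asserting the synonymy cited by its DOI. The Darwin Core terms (`taxonID`, `scientificName`, `taxonomicStatus`, `acceptedNameUsageID`, `nameAccordingTo`) and the `meta.xml` layout follow the Darwin Core Text Guide as I understand it. However, using `nameAccordingTo` to carry the evidence is my assumption (a references extension might be a better fit). The helper functions, the names "Genus alpha" and "Genus beta", and the DOI are all hypothetical placeholders.

```python
import csv
import io
import zipfile

# TDWG namespaces used by Darwin Core Archives.
DWC = "http://rs.tdwg.org/dwc/terms/"
DWC_TEXT = "http://rs.tdwg.org/dwc/text/"

# Columns of the core Taxon file; each is a standard Darwin Core term.
# Using nameAccordingTo to cite the evidence is an assumption.
FIELDS = ["taxonID", "scientificName", "taxonomicStatus",
          "acceptedNameUsageID", "nameAccordingTo"]


def synonymy_rows(accepted, synonym, source_doi):
    """Hypothetical helper: two Taxon rows, an accepted name and a synonym
    pointing to it, with the paper asserting the synonymy as evidence."""
    source = f"https://doi.org/{source_doi}"
    return [
        {"taxonID": "t1", "scientificName": accepted,
         "taxonomicStatus": "accepted", "acceptedNameUsageID": "",
         "nameAccordingTo": ""},
        {"taxonID": "t2", "scientificName": synonym,
         "taxonomicStatus": "synonym", "acceptedNameUsageID": "t1",
         "nameAccordingTo": source},
    ]


def meta_xml():
    """Descriptor telling a harvester how to read taxon.txt."""
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{name}"/>'
        for i, name in enumerate(FIELDS))
    # \\t and \\n produce the literal escapes the text guide expects.
    return (
        f'<archive xmlns="{DWC_TEXT}">\n'
        f'  <core encoding="UTF-8" fieldsTerminatedBy="\\t" '
        f'linesTerminatedBy="\\n" fieldsEnclosedBy="" ignoreHeaderLines="1" '
        f'rowType="{DWC}Taxon">\n'
        '    <files><location>taxon.txt</location></files>\n'
        '    <id index="0"/>\n'
        f'{fields}\n'
        '  </core>\n'
        '</archive>\n'
    )


def build_archive(rows):
    """Zip a tab-delimited taxon.txt and meta.xml into an in-memory archive."""
    table = io.StringIO()
    writer = csv.DictWriter(table, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("taxon.txt", table.getvalue())
        zf.writestr("meta.xml", meta_xml())
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical names and a placeholder DOI, for illustration only.
    data = build_archive(synonymy_rows("Genus alpha", "Genus beta",
                                       "10.xxxx/example"))
    print(f"archive is {len(data)} bytes")
```

Whether GBIF would pick up an archive like this and act on it depends on how its indexing treats such records. Treat this as an illustration of the kind of payload a micropublication could carry, not as a recipe.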

One reason for having these "micropublications" is that resolving an issue in a dataset can sometimes take a while. I've often found errors in databases and have ended up spending a couple of hours tracking down names, literature, etc. to figure out what is going on. As fun as that is, in a sense it's wasted effort if it isn't made more widely available. But if I can wrap those couple of hours of scholarship into a citable unit, publish it, and have it harvested and incorporated into, say, GBIF, then the whole exercise seems much more rewarding. I get credit for the work, GBIF users get a (hopefully) small improvement, and they can see the provenance of that improvement (i.e., it is evidence-based).

This seems like a simple mechanism for providing incentives to annotate databases. In some ways the Biodiversity Data Journal could be thought of as doing this already; however, as I'll discuss in the next blog post, there's an issue that is preventing it from being as useful as it could be.