I need more time to sketch this out fully, but I think a case can be made for a taxonomy-centric (or, perhaps more usefully, a biodiversity-centric) clone of PubMed Central.
Here are some reasons:
- PubMed Central has pioneered the use of XML to archive and publish scientific articles, specifically JATS XML. But the biodiversity literature comes in all sorts of formats, including several flavours of XML (such as SciElo XML, XML from OCR literature such as DjVu, ABBYY, and TEI, etc.)
- While Europe PMC is doing nice things with ORCIDs and entity extraction, it doesn't deal with the kinds of entities prevalent in the taxonomic literature (such as geographic localities, specimen codes, micro citations, etc.). Nor does it deal with extracting the core scientific statements that a taxonomic paper makes.
- Given that much of the taxonomic literature will be derived from OCR we need mechanisms to be able to edit and fix OCR errors, as well as markup errors in more recent XML-native documents. For example, we could envisage having XML stored on GitHub and open to editing.
- We need to embed taxonomic literature in our major biodiversity databases, rather than have it consigned to ghettos of individual small-scale digitisation project, or swamped by biomedical literature whose funding and goals may not align well with the biodiversity community (Europe PMC is funded primary by medical bodies).