Tuesday, June 04, 2013

BioNames and where taxonomy is published

I've added a simple "dashboard" to BioNames to display some basic data about what is in the database. Apart from a table of the number of bibliographic identifiers in the database (currently there are 54,422 publications with DOIs, for example), there are some graphic summaries. These are a bit slow to load as they are created on the fly.

Publishers

The first summarises the relative frequency of articles from different publishers (broadly defined to include digital repositories such as DSpace and JSTOR). For most of this information I'm using the metadata returned when I resolve a DOI at CrossRef. The data is incomplete and likely to change as I add more articles, and as CouchDB finally catches up and indexes all the data.
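BioNames keeps its data in CouchDB, where summaries like this are normally built with a map/reduce view. As a rough illustration only (the actual BioNames views aren't shown here), the sketch below mimics that pattern in plain Python. The map step emits a (publisher, 1) pair per article, the reduce step sums them, and each total is then divided by the overall count to give a relative frequency. The record fields `doi` and `publisher` are assumptions, loosely modelled on the `publisher` field in CrossRef metadata, and the DOIs are dummy placeholders.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical article record: a plain dict with assumed "doi" and "publisher" keys.
Article = dict


def map_publisher(doc: Article) -> Iterator[Tuple[str, int]]:
    """Map step: emit (publisher, 1) for each article that has a publisher."""
    publisher = doc.get("publisher")
    if publisher:
        yield publisher, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the emitted values for each key."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def publisher_frequencies(docs: Iterable[Article]) -> dict:
    """Relative frequency of each publisher among articles that name one,
    ordered from most to least frequent."""
    totals = reduce_counts(pair for doc in docs for pair in map_publisher(doc))
    n = sum(totals.values())
    return {pub: count / n for pub, count in totals.most_common()} if n else {}


if __name__ == "__main__":
    sample = [
        {"doi": "10.0000/a", "publisher": "Magnolia Press"},
        {"doi": "10.0000/b", "publisher": "Springer"},
        {"doi": "10.0000/c", "publisher": "Magnolia Press"},
        {"doi": "10.0000/d"},  # no publisher recorded yet, so it is skipped
    ]
    for pub, share in publisher_frequencies(sample).items():
        print(f"{pub}: {share:.2f}")
```

Articles without a publisher are simply left out of the denominator here; counting them as "unknown" instead would be an equally reasonable choice while the database is still filling up.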

[Figure: Publishers]

The biggest blob is BioStor, which is my project to extract articles from BHL. Magnolia Press publishes Zootaxa, and then there are some well-known mainstream publishers such as Springer, Wiley, and Taylor & Francis (Informa UK). These publishers have digitised the back catalogues of a number of society journals, so their prominence here doesn't mean that they are actively publishing new taxonomic content. One use for a diagram like this is to decide what content to data mine. BioStor content is open access (via BHL) and so can be readily mined. Some articles in Zootaxa are open access and so could also be downloaded and processed. Then we have the big commercial publishers, who hold a significant fraction of taxonomic content behind their paywalls. If the community were to think about mining this content, then this diagram suggests which publishers to approach first.

Journals

The next diagram shows articles grouped by journal (using the journal's ISSN).
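Grouping by ISSN only works if the same journal's ISSN is written the same way everywhere. A minimal sketch, assuming each article record carries a hypothetical `issn` field: normalise the string to the `NNNN-NNNC` form, verify it with the standard ISSN check digit (weights 8 down to 2 on the first seven digits, modulus 11, with a value of 10 written as `X`), and group articles by the result. This illustrates the idea and is not necessarily how BioNames itself handles ISSNs.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Four digits, optional hyphen, three digits, then a digit or X check character.
ISSN_RE = re.compile(r"^(\d{4})-?(\d{3}[\dX])$")


def issn_check_digit(first_seven: str) -> str:
    """ISSN check digit: weights 8..2, modulus 11, with 10 written as 'X'."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def normalise_issn(raw: str) -> Optional[str]:
    """Return the ISSN as 'NNNN-NNNC', or None if malformed or the check digit fails."""
    m = ISSN_RE.match(raw.strip().upper())
    if not m:
        return None
    digits = m.group(1) + m.group(2)
    if issn_check_digit(digits[:7]) != digits[7]:
        return None
    return f"{digits[:4]}-{digits[4:]}"


def group_by_issn(articles: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group articles by normalised ISSN, skipping those without a valid one."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for article in articles:
        issn = normalise_issn(article.get("issn") or "")
        if issn:
            groups[issn].append(article)
    return dict(groups)


if __name__ == "__main__":
    for raw in ["0378-5955", "03785955", "0378-5954"]:
        print(raw, "->", normalise_issn(raw))
```

Rejecting ISSNs that fail the check digit means a typo can't silently create a spurious extra "journal" circle, at the cost of dropping those articles until the ISSN is fixed.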

[Figure: Journals]

The circles are too small to be labelled usefully, but a couple of things strike me. The first is the sheer number of journals! The taxonomic literature is widely scattered across numerous outlets, which is part of the challenge of indexing it (and this diagram includes only those journals that have ISSNs; many smaller or older ones don't). No single journal dominates the landscape (the largest circle, at the top right, is Zootaxa). But this diagram spans the complete history of taxonomic publication, so it includes large journals (such as Annals and Magazine of Natural History) that no longer exist, at least not in their original form. It might be useful to slice this diagram by, say, decade to get a clearer picture of publication patterns.
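Slicing by decade amounts to adding the decade to the grouping key. A sketch, again assuming hypothetical `issn` and `year` fields on each article record: each year is rounded down to the start of its decade, and articles are counted per journal within each decade. The journal keys in the test data are placeholders, not real ISSNs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def decade_of(year: int) -> int:
    """Map a year to the first year of its decade, e.g. 1887 -> 1880."""
    return (year // 10) * 10


def counts_by_decade(articles: Iterable[dict]) -> Dict[int, Counter]:
    """For each decade, count articles per journal (keyed by ISSN).

    Articles missing a year or an ISSN are skipped. The result is
    ordered from the earliest decade to the latest.
    """
    by_decade: Dict[int, Counter] = defaultdict(Counter)
    for article in articles:
        year, issn = article.get("year"), article.get("issn")
        if year is None or not issn:
            continue
        by_decade[decade_of(int(year))][issn] += 1
    return dict(sorted(by_decade.items()))


if __name__ == "__main__":
    sample = [
        {"issn": "journal-A", "year": 1885},
        {"issn": "journal-A", "year": 1889},
        {"issn": "journal-B", "year": 2005},
    ]
    for decade, counts in counts_by_decade(sample).items():
        print(decade, counts.most_common(3))
```

Each decade's `Counter` could then feed its own circle diagram, so that long-defunct journals and recent ones like Zootaxa no longer compete in a single picture.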

As the database grows I'll post more summaries at BioNames.