Thursday, February 14, 2013

Rate of description of new animal species and *that* Taxatoy graph

As part of the discussion on whether legacy biodiversity literature matters, a graph from the following paper came up:

Sarkar, I., Schenk, R., & Norton, C. N. (2008). Exploring historical trends using taxonomic name metadata. BMC Evolutionary Biology, 8(1), 144. doi:10.1186/1471-2148-8-144

So, why is the Sarkar et al. graph bogus? Here is their graph (Fig. 3) for animals:


This is the number of new animal species described each year, estimated by parsing taxonomic names and extracting the date in the taxonomic authority. There are two prominent "spikes" which are worrying. Sarkar et al. discuss the peak in 1994:

For example, the analyzed data indicate that a significant portion of the 1994 peak is due to an increase in descriptions of the family Cerambycidae, a large group of beetles.

So, 1994 was a bumper year for describing new species of Cerambycidae? Not quite. Taxatoy is based on names in uBio, and I have a local copy of most of these names. The Cerambycidae names contain lots of duplicate names that differ only in taxon authority. For example, searching the name Ancylocera macrotela on uBio finds:

Ancylocera macrotela
Ancylocera macrotela Aurivillius, 1912
Ancylocera macrotela BATES Henry Walter, 1880
Ancylocera macrotela Bates, 1880
Ancylocera macrotela Bates, 1885
Ancylocera macrotela Blackwelder, 1946
Ancylocera macrotela Chemsak & Linsley, 1970
Ancylocera macrotela Chemsak, 1963
Ancylocera macrotela Chemsak, 1964
Ancylocera macrotela Chemsak, Linsley & Mankins, 1980
Ancylocera macrotela Chemsak, Linsley & Noguera, 1992
Ancylocera macrotela Lameere, 1883
Ancylocera macrotela Maes & al., 1994
Ancylocera macrotela Monné & Giesbert, 1994
Ancylocera macrotela Monné, 1994
Ancylocera macrotela Noguera & Chemsak, 1996
Ancylocera macrotela Viana, 1971

These names are chresonyms. The original name is Ancylocera macrotela Bates, 1880 (you can see the first publication of this name in BHL); the rest are subsequent citations of that name (gotta love taxonomy...).

Why the spike in 1994? I suspect that this is due to the publication in 1994 of "Checklist of the Cerambycidae and Disteniidae (Coleoptera) of the Western Hemisphere" by Miguel A Monné and Edmund F Giesbert. At least 8552 names from that checklist seem to have ended up in uBio, all with the date "1994". So the spike is an artefact. Similarly, the other peak (1912) corresponds to the publication of a checklist by Per Olof Christopher Aurivillius, which contributes over 3000 names.
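To make the artefact concrete, here is a minimal Python sketch using the Ancylocera macrotela strings listed above. It counts "new species" per year in two ways: naively, by taking the year from every name string, and after collapsing strings that share a binomial down to the earliest year. This is not Taxatoy's code; the function names, the regular expression, and the "earliest year wins" rule are my own assumptions for illustration. That rule happens to work here, but it would wrongly merge genuine homonyms.

```python
import re
from collections import Counter

# Name strings as returned by the uBio search shown above.
NAMES = [
    "Ancylocera macrotela",
    "Ancylocera macrotela Aurivillius, 1912",
    "Ancylocera macrotela BATES Henry Walter, 1880",
    "Ancylocera macrotela Bates, 1880",
    "Ancylocera macrotela Bates, 1885",
    "Ancylocera macrotela Blackwelder, 1946",
    "Ancylocera macrotela Chemsak & Linsley, 1970",
    "Ancylocera macrotela Chemsak, 1963",
    "Ancylocera macrotela Chemsak, 1964",
    "Ancylocera macrotela Chemsak, Linsley & Mankins, 1980",
    "Ancylocera macrotela Chemsak, Linsley & Noguera, 1992",
    "Ancylocera macrotela Lameere, 1883",
    "Ancylocera macrotela Maes & al., 1994",
    "Ancylocera macrotela Monné & Giesbert, 1994",
    "Ancylocera macrotela Monné, 1994",
    "Ancylocera macrotela Noguera & Chemsak, 1996",
    "Ancylocera macrotela Viana, 1971",
]

# Assumption: the year is a four-digit number at the very end of the string.
YEAR_AT_END = re.compile(r"\b(1[7-9]\d{2}|20\d{2})\s*$")


def split_name(name_string):
    """Return (binomial, year); year is None when no authority year is present."""
    words = name_string.split()
    binomial = " ".join(words[:2])  # assumes a plain "Genus species" prefix
    match = YEAR_AT_END.search(name_string)
    year = int(match.group(1)) if match else None
    return binomial, year


def naive_counts(names):
    """Count every dated name string as if it were a new description."""
    return Counter(year for _, year in map(split_name, names) if year is not None)


def earliest_year_counts(names):
    """Count each binomial once, under the earliest year seen for it."""
    earliest = {}
    for binomial, year in map(split_name, names):
        if year is None:
            continue
        if binomial not in earliest or year < earliest[binomial]:
            earliest[binomial] = year
    return Counter(earliest.values())


if __name__ == "__main__":
    print("naive:", sorted(naive_counts(NAMES).items()))
    print("deduplicated:", sorted(earliest_year_counts(NAMES).items()))
```

Counted naively, this one species contributes three "descriptions" to 1994 and one to 1912. After collapsing the chresonyms it contributes a single description, in 1880. Multiply that by the thousands of names in a checklist and you get a spike.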

One reason I was suspicious of the Taxatoy graph is that it doesn't look anything like the equivalent graph from the Index of Organism Names. After a bit of fussing I've grabbed data from the ION site and from Taxatoy's Google Code repository, and created the following chart:

Taxatoy version2

The data for this chart is on figshare. ION is an index of all new animal names, based on Zoological Record. I place more confidence in its data than in data derived from uBio, but ION clearly has its own issues (such as the gap after 1850, and the uneven sampling of the early years of taxonomy). The key point is that arguments about the temporal distribution of taxonomic descriptions (and the value of legacy literature) need to be aware that the data used is in pretty poor shape.

Update 2013-02-23
Jose Antonio Gonzalez Oreja pointed out in an email that the values for ION that I used were a little higher than those that appear on the ION web site. My script for retrieving those values hadn't quite worked. I've uploaded the corrected data to Figshare, updated the diagram above, and put the web calls I used to fetch the data on GitHub. The story doesn't change, but it helps to have the correct data.