Reading the GitHub issue "Define objective rules for taxon concept identity", referred to by Markus Döring in a comment on a previous post, I'm once again struck by the unholy mess generated by any discussion of "taxonomic concepts". The sense of déjà vu is overwhelming. What drives me to distraction is how little of this seems to be directed at solving actual problems that biologists have, which are typically things like "what does this random name that I've come across refer to?" and "will you please stop changing the damn names!?".
One thing that's also struck me is the importance of stable identifiers for species, that is, identifiers that are stable even in the face of name changes. If you have that, then you can talk about changes in classification, such as moving a species from one genus to another.
In the diagram above, the species being moved from Sasia to Verreauxia has the same identifier ("afrpic1") regardless of what genus it belongs to. This enables us to easily determine the differences between classifications (and then link those changes to the evidence supporting the change). I find it interesting that projects that manage large classifications, such as eBird and the Reptile database, use stable species identifiers (either externally or internally). If you are going to deal with classifications that change over time, you need stable identifiers.
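To make the point about stable identifiers concrete, here is a minimal sketch of comparing two versions of a classification. It assumes, purely for illustration, that each version can be reduced to a mapping from stable identifier to genus; the function name `diff_classifications` and the tuple layout are my own inventions, not anything eBird or the Reptile database provide.

```python
def diff_classifications(old, new):
    """Compare two {stable identifier: genus} mappings.

    Returns a sorted list of (kind, identifier, before, after) tuples,
    where kind is "added", "removed", or "moved".
    """
    changes = []
    for identifier in sorted(old.keys() | new.keys()):
        before = old.get(identifier)
        after = new.get(identifier)
        if before == after:
            continue  # same placement in both versions
        if before is None:
            kind = "added"
        elif after is None:
            kind = "removed"
        else:
            kind = "moved"
        changes.append((kind, identifier, before, after))
    return changes


# The example from the text: same identifier, different genus.
old_version = {"afrpic1": "Sasia"}
new_version = {"afrpic1": "Verreauxia"}
changes = diff_classifications(old_version, new_version)
print(changes)
```

Because the identifier survives the move, the change shows up as a single "moved" record rather than as one name vanishing and an unrelated name appearing.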
So I'm beginning to think that perhaps the single most useful thing we could do as a taxonomic database community is to mint stable identifiers for each unique, original species name. These could be human readable, for example the species epithet plus author name plus year, suitably cleaned up (e.g., all lower case). So, our species could be "sapiens-linnaeus-1758". This sort of identifier is inspired by the notion of uninomial nomenclature:
If the uninomial system is not accepted, or until it is, I see no hope of ever arriving at a really stable nomenclature. - Hubbs (1930)
For more reading, see for example
- Cantino, P. D., Bryant, H. N., de Queiroz, K., Donoghue, M. J., Eriksson, T., Hillis, D. M., & Lee, M. S. Y. (1999). Species Names in Phylogenetic Nomenclature. Systematic Biology, 48(4), 790–807. doi:10.1080/106351599260012
- Hubbs, C. L. (1930). Scientific Names in Zoology. Science, 71(1838), 317–319. doi:10.1126/science.71.1838.317
- Lanham, U. (1965). Uninominal Nomenclature. Systematic Zoology, 14(2), 144. doi:10.2307/2411739
- Michener, C. D. (1963). Some Future Developments in Taxonomy. Systematic Zoology, 12(4), 151. doi:10.2307/2411757
- Michener, C. D. (1964). The Possible Use of Uninominal Nomenclature to Increase the Stability of Names in Biology. Systematic Zoology, 13(4), 182. doi:10.2307/2411777
Just to be clear I'm NOT advocating replacing binomial names with uninomial names (the references above are just to remind me about the topic), but approaches to developing uninomial names could be used to create simple, human-friendly identifiers. Oh, and hat tip to Geoff Read for the comment on an earlier post of mine that probably planted the seed that started me down this track.
So, imagine going to a web site and, given the uninomial identifier, being able to get a list of every variation on that name, including the species name placed in different genera (in other words, all the objective or homotypic synonyms of that name).
OK, nice, but what about taxa? Well, the second thing I'd like to get is every (significant) use of that name, coupled with a reference (i.e., a "usage"). These would include cases where the name is regarded as a synonym of another name. Given that each usage is dated (by the reference), we then have a timestamped record of the interpretation of taxa referred to by that name. Technically, what I envisage is that we are tracking nomenclatural types, that is, for a given species name we are returning every usage that refers to a taxon that includes the type specimen of that name.
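Here is one way the idea of dated usages could be sketched as a data structure. Everything in it is hypothetical: the `Usage` fields, the `state_of_play` function, and especially the sample records, which use made-up names ("Aus alpha", "Bus alpha") and placeholder references rather than anything from the real literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    identifier: str    # stable identifier of the original species name
    name_as_used: str  # the name as it appears in the reference
    accepted_as: str   # the name the reference treats as accepted
    reference: str     # placeholder for a citation
    year: int          # publication year of the reference


def state_of_play(usages, identifier, year):
    """Return the most recent usage of `identifier` published by `year`,
    or None if there is no such usage."""
    candidates = [
        u for u in usages
        if u.identifier == identifier and u.year <= year
    ]
    return max(candidates, key=lambda u: u.year, default=None)


# Invented records for illustration only.
usages = [
    Usage("alpha-smith-1900", "Aus alpha", "Aus alpha", "Reference A", 1900),
    Usage("alpha-smith-1900", "Bus alpha", "Bus alpha", "Reference B", 1950),
    Usage("alpha-smith-1900", "Bus alpha", "Bus beta", "Reference C", 1980),
]

latest = state_of_play(usages, "alpha-smith-1900", 1960)
print(latest.name_as_used, latest.reference)
```

The third record shows the synonym case: the name is used, but the reference treats a different name as the accepted one.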
We could imagine doing something trivial such as putting "/n/" before the identifier to retrieve all name variations, and "/t/" to retrieve all usages. One could have a suffix for a timestamp (e.g., "what was the state of play for this name in 1960?")
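A minimal sketch of how such paths might be interpreted follows. The "/n/" and "/t/" prefixes come from the paragraph above; the choice to express the timestamp as a trailing four-digit year segment (e.g. "/1960"), and the names `Request` and `parse_path`, are assumptions made for the sake of the example.

```python
import re
from typing import NamedTuple, Optional


class Request(NamedTuple):
    kind: str             # "names" for /n/, "usages" for /t/
    identifier: str       # e.g. "sapiens-linnaeus-1758"
    year: Optional[int]   # optional "state of play" year


# Prefix, hyphen-separated identifier, optional four-digit year suffix.
_PATH = re.compile(
    r"^/(?P<prefix>[nt])"
    r"/(?P<id>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"(?:/(?P<year>\d{4}))?$"
)


def parse_path(path):
    """Turn a path such as "/t/sapiens-linnaeus-1758/1960" into a Request."""
    match = _PATH.match(path)
    if match is None:
        raise ValueError(f"unrecognised path: {path!r}")
    kind = "names" if match["prefix"] == "n" else "usages"
    year = int(match["year"]) if match["year"] else None
    return Request(kind, match["id"], year)


print(parse_path("/n/sapiens-linnaeus-1758"))
print(parse_path("/t/sapiens-linnaeus-1758/1960"))
```

Keeping the identifier itself free of slashes means the prefix and the optional year can be added or removed without touching the identifier.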
It seems that something like this would help cut through a lot of the noise around taxa. By itself, a list of names and references doesn't specify everything you might want to know about a taxon, but I suspect that some of the things taxonomists ask for (e.g., every circumscription, every set of "defining" characters, every pairwise relationship between every variation on a taxon's interpretation) are both unrealistic and probably not terribly useful.
For example, circumscriptions (defining a taxon by the set of things it includes) are often mentioned in discussions of taxon concepts, but in reality (at the species level) how many explicit circumscriptions do we have in the taxonomic literature? I'd argue that the circumscriptions that we do have are the ones being generated by modern databases such as GBIF, iNaturalist, BOLD, and GenBank. These explicitly link specimens, photos, or sequences to a taxon (defined locally within that database, e.g. by an integer number), and in some cases are testable, e.g., BLAST a sequence to see if it falls in the same set of sequences. These databases have their own identifiers and notions of what comprises a taxon (e.g., based on community editing, automated clustering, etc.).
This approach of simple identifiers that link multiple name variations would support the name-based matching that is at the heart of matching records in different databases (despite the wailing that names ≠ taxa, this is fundamentally how we match things across databases). The availability of timestamped usages would enable us to view a classification at a given point in time.
This needs to be fleshed out more, and I really want to explore the idea of edit scripts (or patch files) for comparing taxonomic classifications, and how we can use them to document the evidence for taxonomic changes. More to come...