Friday, February 27, 2009

Something's missing from taxonomic name vocabularies

In the wiki examples I've been developing I've been trying to model names using the TDWG LSID vocabularies, particularly TaxonName. Roger Hyam has obviously put a huge amount of work into developing these, and they handle just about everything I need. However, I think that there's one thing missing, namely a way to express the logical relationship between the parts of a multinomial taxonomic name.

For example, consider the fish Chromis circumaurea Pyle, Earle, and Greene, 2008, described by Rich Pyle and colleagues (TED have recently posted a great video of Rich talking about discovering new species of fish). Chromis circumaurea is a species in the genus Chromis, and in the TaxonName vocabulary I can represent this relationship using the term "genusPart", which specifies the name of the genus. In a wiki page this could be a link to a page called "Chromis".

But, which "Chromis"? There are at least three:
  • Chromis Hübner 1819
  • Chromis Lacepède 1802
  • Chromis Cuvier, 1814
Only one of these is the fish (Chromis Cuvier, 1814). Cases of the same name being used for different organisms (homonymy) are not uncommon, so linking to strings isn't adequate to express the relationship between the two parts of the name Chromis circumaurea.

I'd alluded to this issue in my first major foray into RDF and taxonomic names (Taxonomic names, metadata, and the Semantic Web), where I proposed using the Dublin Core term "isPartOf" to link the specific epithet to the genus part. In this case, the link would be between URIs for the names Chromis circumaurea Pyle, Earle, and Greene, 2008 and Chromis Cuvier, 1814.
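As a rough sketch of the difference, here is the Chromis case in Python. The `urn:example:` identifiers and the dictionary layout are invented for illustration; they are not real LSIDs, and the `isPartOf` key only mimics the Dublin Core term rather than using any RDF library. Looking the genus up by its string gives three candidates, while following the link gives exactly one.

```python
# Hypothetical name records keyed by made-up identifiers (not real LSIDs).
NAMES = {
    "urn:example:name:1": {"name": "Chromis", "authorship": "Hübner 1819"},
    "urn:example:name:2": {"name": "Chromis", "authorship": "Lacepède 1802"},
    "urn:example:name:3": {"name": "Chromis", "authorship": "Cuvier, 1814"},
    "urn:example:name:4": {
        "name": "Chromis circumaurea",
        "authorship": "Pyle, Earle, and Greene, 2008",
        "genusPart": "Chromis",            # a string only: ambiguous
        "isPartOf": "urn:example:name:3",  # a link to one specific genus name
    },
}


def genera_by_string(genus_part):
    """Return the identifiers of every name record matching the genus string."""
    return sorted(k for k, v in NAMES.items() if v["name"] == genus_part)


def genus_by_link(species_id):
    """Return the identifier of the genus name the species name links to."""
    return NAMES[species_id]["isPartOf"]


species = "urn:example:name:4"
candidates = genera_by_string(NAMES[species]["genusPart"])
linked = genus_by_link(species)

print(len(candidates), "candidates by string:", candidates)
print("linked genus:", NAMES[linked]["name"], NAMES[linked]["authorship"])
```

The string lookup cannot choose between the homonyms on its own; the link records the choice once, at the point where the species name is described.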

It's a small point, but without some means to link components of a name we're going to struggle to sensibly answer questions such as listing all the species in a given genus (or, perhaps more correctly, all the species names that have been published in a given genus).
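To illustrate the kind of question I have in mind, the sketch below indexes species names by the identifier of the genus name they link to. Everything here is an assumption made for the example: the identifiers are made up, and the second species name is an invented placeholder (not a published name) standing in for something described in Hübner's Chromis.

```python
from collections import defaultdict

# (species name identifier, label, genus name identifier).
# All identifiers are made up; the second label is an invented placeholder.
LINKS = [
    ("urn:example:sp:1",
     "Chromis circumaurea Pyle, Earle, and Greene, 2008",
     "urn:example:genus:cuvier-1814"),
    ("urn:example:sp:2",
     "Chromis exemplum (placeholder, not a real name)",
     "urn:example:genus:huebner-1819"),
]


def names_by_genus_link(links):
    """Group species name labels by the genus name identifier they link to."""
    index = defaultdict(list)
    for _species_id, label, genus_id in links:
        index[genus_id].append(label)
    return index


def names_by_genus_string(links, genus):
    """Group by the first word of the label, which mixes up homonymous genera."""
    return [label for _sid, label, _gid in links if label.split()[0] == genus]


index = names_by_genus_link(LINKS)
fish_names = index["urn:example:genus:cuvier-1814"]
by_string = names_by_genus_string(LINKS, "Chromis")

print("linked to Chromis Cuvier, 1814:", fish_names)
print("matching the string 'Chromis':", by_string)
```

With links, the list for the fish genus contains only names published in that genus; with strings, names from every homonymous Chromis end up in the same list.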


Roger Hyam said...

Having put effort into the names vocabulary you would have expected me to have thought of that wouldn't you - and I kind of have - but maybe only half.

Under the ICBN it doesn't matter what the genus is. I'll use the Chromis example here but as if it were a plant. Suppose the author intended Chromis circumaurea to be in Chromis Hübner 1819 but that later someone dug up Chromis Lacepède 1802, and so Chromis Hübner was a later/junior homonym and couldn't be used. What would happen to C. circumaurea? Well, all the species names in both Chromis genera would still exist and still be valid, but it would appear that they are all now in the new Chromis. Someone would have to come along and actually publish the new combinations to move those that used to be in the defunct Chromis to a new genus name.

Zoology under ICZN is different, as new combinations aren't actual nomenclatural acts unless they create a homonym, in which case the earliest use is considered a nomenclatural act and has priority. The ICZN code is as clear as mud on this. They should be 'disregarded' - no idea what that means, but there you go. Perhaps someone could clarify.

I can't find in the code whether the author of a new species-group name has to cite the original publication of the genus-group name under which she is creating the new species. It may be that they can just say "Chromis". This certainly doesn't have to happen with subsequent combinations of the name.

Bottom line is that we are just talking rules for making up a string. Saying which Chromis gets very close to taxonomy.

Having said all that I am, at this moment, doing some more work on the names voc with the intention of taking it forward for TDWG ratification. If we needed to have a pointer I'd add it. But we need explicit documentation in ICZN or ICBN to say that it is necessary. I have a rule that if I can cite the code paragraphs that govern something I am happy. If not it falls into taxonomy and should be dealt with in the TaxonConcept object.

God these nomenclatural debates are so wordy!


Rod Page said...

I guess I see the problem independently of whatever the codes say. Given a binomial name I'd like to be able to link to the genus, and if I rely on the string there will be cases when I can't make the link (it's ambiguous). In the case of names such as Agathis, I could end up linking an insect species epithet to a plant genus. I'd like to avoid this, where possible.

If we have multinomial names, then there is some sort of link between the parts of the names, and I'd like to reflect this using links between objects, rather than rely on strings alone.

Roger Hyam said...

You evil man confusing nomenclature and taxonomy!

The thing is that names are ambiguous and just because you want to link them together doesn't make them less so. If you want to impose your own disambiguation layer on the data (which is a good idea) it should be explicit that this is you making a judgment call and not something inherent in the names that you are just extracting.

I intend to make all these things easier to express with a simpler TaxonConcept object to enable the linking. Perhaps we should work together a little especially on examples.

Rod Page said...

OK, consider Agathis montana. Would you agree that it would be reasonable to link the species epithet montana de Laub. to the plant genus Agathis Salisbury, and not to the wasp genus Agathis Latreille? Likewise, should the wasp species name montana Shestakov be linked to the wasp genus Agathis Latreille and not the plant genus Agathis Salisbury?

Once we have multinomials aren't we bound to have to deal with this issue, and isn't it still one of nomenclature?

Frank Anderson said...

Rod -- Would a uninominal system, where each species is simply "species epithet + author, date" (e.g., melanogaster Meigen, 1830 for Drosophila melanogaster) help with this?

I guess you'd still have the problem of linking these uninominal species to clades (and there certainly can be multiple clades with the same name), but it would at least get you out of the requirement to make sure the species epithet is tied to the correct genus.

I see no reason other than tradition and inertia not to go to a uninominal system of some sort (too bad the Phylocoders caved on species, but I guess they'd ticked off enough people as it was...), but I wonder if such a move would hurt or help databasing, etc., efforts.

Rod Page said...

I guess the question is uniqueness. There aren't many species names (binomials) shared between plants and animals (such as Agathis montana), but many species epithets will be. The question will be how many of these will be unique if you add the author's name and date. I suspect there may be quite a few collisions if the same author has been working on a number of groups (for example, Linnaeus probably reused the same epithet many times in the same work). A search of the uBio database would help answer this.

I don't see any advantage in doing this retrospectively, given the huge disconnect with past literature and databases. Going forward it might be attractive. What would be even more attractive is if we started using GUIDs of some sort, so that it was clear what we were talking about. In the same way, DOIs have made linking to literature much clearer (think of all the different ways it's possible to write a bibliographic citation, as opposed to simply citing a DOI).

Roger Hyam said...

A uninominal system would have been better, but it is a bit late now.