Friday, February 27, 2009

Something's missing from taxonomic name vocabularies

In the wiki examples I've been developing I've been trying to model names using the TDWG LSID vocabularies, particularly TaxonName. Roger Hyam has obviously put a huge amount of work into developing these, and they handle just about everything I need. However, I think that there's one thing missing, namely a way to express the logical relationship between the parts of a multinomial taxonomic name.

For example, consider the fish Chromis circumaurea Pyle, Earle, and Greene, 2008, described by Rich Pyle and colleages (TED have recently posted a great video of Rich talking about discovering new species of fish). Chromis circumaurea is a species in the genus Chromis, and in the TaxonName vocabulary I can represent this relationship using the term "genusPart", which specifies the name of the genus. In a wiki page this could be a link to a page called "Chromis".

But, which "Chromis"? There are at least three:
  • Chromis Hübner 1819
  • Chromis Lacepède 1802
  • Chromis Cuvier, 1814
Only one of these is the fish (Chromis Cuvier, 1814). Cases of the same name being used for different organisms (homonymy) is not uncommon, so linking to strings isn't adequate to express the relationship between the two parts of the name Chromis circumaurea.

I'd alluded to this issue in my first major foray into RDF and taxonomic names (Taxonomic names, metadata, and the Semantic Web), where I proposed using the Dublin Core term "isPartOf" to link the specific epithet to the genus part. In this case, the link would be between URIs for the names Chromis circumaurea Pyle, Earle, and Greene, 2008 and Chromis Cuvier, 1814.

It's a small point, but without some means to link components of a name we're going to struggle to sensibly answer questions such as listing all the species in a given genus (or, perhaps more correctly, all the species names that have been published in a given genus).