Another issue I'm trying to get my head around is how to deal with labels in phylogenies. These can be any number of things, such as GenBank sequences, specimen codes, taxon names, abbreviations of taxon names, laboratory codes, etc. Here's my quick attempt to model these:
This sketches various levels of indirection to go from a label in a tree to a taxon name. The label may be a short form of a taxon name (one redirect to the name), it may contain a specimen code (redirect to the specimen code, then link to the name), or it may be a GenBank sequence (redirect to the accession number, then via its source to the taxon name with the corresponding NCBI taxon id). There are other cases to consider, such as synonyms, but I'll try to deal with these later. At this stage I'm looking at how to make it simple to query for all phylogenies that contain a given taxon.
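The indirection described above can be sketched as a chain of lookups that resolves each tree label to a taxon name, plus an inverted index from names to trees, which is one way to turn "all phylogenies containing taxon X" into a single lookup. This is a minimal sketch assuming in-memory dictionaries; the table names, labels, specimen codes and accession numbers are hypothetical (only NCBI taxon id 9606 for Homo sapiens is real), a real system would query GenBank and specimen databases instead, and synonyms are not handled.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical lookup tables; every label, code and accession below is made up
# for illustration (only the NCBI taxon id 9606 = Homo sapiens is real).
TAXON_NAMES: Set[str] = {"Homo sapiens", "Pan troglodytes", "Gorilla gorilla"}
SHORT_NAMES: Dict[str, str] = {            # abbreviated name -> taxon name
    "H. sapiens": "Homo sapiens",
    "P. troglodytes": "Pan troglodytes",
}
SPECIMENS: Dict[str, str] = {"SPEC-001": "Gorilla gorilla"}  # specimen code -> name
ACCESSIONS: Dict[str, int] = {"ACC-001": 9606}               # accession -> NCBI taxon id
NCBI_TAXA: Dict[int, str] = {9606: "Homo sapiens"}           # NCBI taxon id -> name


def resolve(label: str) -> Optional[str]:
    """Follow the chain of redirects from a tree label to a taxon name."""
    if label in TAXON_NAMES:        # label is already a taxon name
        return label
    if label in SHORT_NAMES:        # one redirect: short form -> name
        return SHORT_NAMES[label]
    if label in SPECIMENS:          # specimen code -> name
        return SPECIMENS[label]
    if label in ACCESSIONS:         # accession -> NCBI taxon id -> name
        return NCBI_TAXA.get(ACCESSIONS[label])
    return None                     # unresolved label (e.g. an unknown lab code)


def build_index(trees: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each resolved taxon name to the ids of the trees whose labels resolve to it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for tree_id, labels in trees.items():
        for label in labels:
            name = resolve(label)
            if name is not None:
                index[name].add(tree_id)
    return index


def trees_containing(index: Dict[str, Set[str]], taxon: str) -> List[str]:
    """All trees containing the taxon, sorted for stable output."""
    return sorted(index.get(taxon, set()))


# Hypothetical trees, each reduced to its list of leaf labels.
TREES = {
    "tree1": ["H. sapiens", "P. troglodytes"],
    "tree2": ["ACC-001", "SPEC-001"],
    "tree3": ["Gorilla gorilla", "lab-42"],
}
INDEX = build_index(TREES)

if __name__ == "__main__":
    print(trees_containing(INDEX, "Homo sapiens"))     # ['tree1', 'tree2']
    print(trees_containing(INDEX, "Gorilla gorilla"))  # ['tree2', 'tree3']
```

Resolving labels once, when a tree is loaded, keeps the query itself cheap; the trade-off is that the index has to be rebuilt whenever one of the redirects changes.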
1 comment:
The NCBI taxon id works in a way similar to the GeoSpecies IDs, in that the ID stays the same despite changes in nomenclature. I would use it, except that most species have no NCBI ID: a taxon only gets one once a sequence exists for it.
In this sense, the NCBI ID and the GeoSpecies ID are compatible types of OTUs.
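The idea of an identifier that survives nomenclatural change can be sketched as an OTU record whose id is fixed while its names accumulate, with an optional NCBI taxon id that stays empty until a sequence exists. This is a minimal sketch; the `OTU` and `Registry` classes, the `otu:0001` id format and the placeholder names `Aus bus`/`Xus bus` are hypothetical and do not describe how NCBI or GeoSpecies actually store their identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OTU:
    otu_id: str                                     # stable id, never changes (hypothetical format)
    names: List[str] = field(default_factory=list)  # every name applied, newest last
    ncbi_taxid: Optional[int] = None                # None until the taxon has a sequence

    @property
    def current_name(self) -> str:
        return self.names[-1]


class Registry:
    """Resolve any name ever applied to an OTU back to the same OTU."""

    def __init__(self) -> None:
        self._by_name: Dict[str, OTU] = {}

    def add(self, otu: OTU) -> None:
        for name in otu.names:
            self._by_name[name] = otu

    def rename(self, otu: OTU, new_name: str) -> None:
        # A nomenclatural change adds a name; the identifier stays the same.
        otu.names.append(new_name)
        self._by_name[new_name] = otu

    def lookup(self, name: str) -> Optional[OTU]:
        return self._by_name.get(name)


if __name__ == "__main__":
    registry = Registry()
    otu = OTU("otu:0001", ["Aus bus"])              # placeholder names
    registry.add(otu)
    registry.rename(otu, "Xus bus")                 # e.g. species moved to another genus
    print(registry.lookup("Aus bus").otu_id)        # otu:0001
    print(registry.lookup("Xus bus").current_name)  # Xus bus
```

Because old names keep pointing at the same record, specimens or tree labels attached under an earlier name still resolve to the current OTU.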
It seems that the genomics people are already using forms of the NCBI ID as species URIs.
This gets into what makes a good species URI.
I think that there is a species and different scientists publish taxonomic hypotheses for that species.
It seems that most others see this the other way around.
They see it from this perspective: there are publications (each with a slightly different meaning for the species it discusses). They point these at each other, merge them, and so on, and create an identifier.
Is this a useful, meaningful identifier? Maybe for publications, and as additional information related to species concepts, but not something I would use to tie specimens to an operational taxonomic unit (OTU).
Related topic: I have TreeBASE IDs in the GeoSpecies database, but I have not figured out the best way to express them. Do you have a standard URI?