Monday, February 02, 2009

Wiki modelling - Part 4

Another issue I'm trying to get my head around is how to deal with labels in phylogenies. These can be any number of things, such as GenBank sequences, specimen codes, taxon names, abbreviations of taxon names, laboratory codes, etc. Here's my quick attempt to model these:

This sketches various levels of indirection to go from a label in a tree to a taxon name. The label may be a short form of a taxon name (one redirect to the name), it may contain a specimen code (redirect to the specimen code, then link to the name), or it may be a GenBank sequence (redirect to the accession number, then via the source to the taxon name with the corresponding NCBI taxon id). There are other cases to consider, such as synonyms, but I'll try to deal with these later. At this stage I'm looking at how to make it simple to query for all phylogenies that contain a given taxon.
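To make the indirection concrete, here is a minimal Python sketch of the three cases and of the "which phylogenies contain this taxon" query. It is only an illustration under assumptions: the dictionaries stand in for wiki pages and redirects, and every label, specimen code, accession number, and tree name is a made-up placeholder, not real data. The function names `resolve` and `phylogenies_containing` are hypothetical too.

```python
from typing import List, Optional

# Taxon name pages (terminal pages in this toy model).
TAXON_NAMES = {"Homo sapiens", "Pan troglodytes"}

# Tree label -> page it redirects to (all labels are placeholders).
REDIRECTS = {
    "Hsap": "Homo sapiens",        # short form -> taxon name
    "Ptro SPEC-001": "SPEC-001",   # label with specimen code -> specimen page
    "Hsap XX000001": "XX000001",   # label with accession -> sequence page
}

# Specimen page -> taxon name it links to (placeholder specimen code).
SPECIMEN_NAME = {"SPEC-001": "Pan troglodytes"}

# Sequence page -> source organism (placeholder accession number).
SEQUENCE_SOURCE = {
    "XX000001": {"name": "Homo sapiens", "ncbi_taxon_id": 9606},
}

# Phylogenies as lists of leaf labels (placeholder trees).
TREES = {
    "tree1": ["Hsap", "Ptro SPEC-001"],
    "tree2": ["Hsap XX000001", "Unknown sp."],
}


def resolve(label: str) -> Optional[str]:
    """Follow one redirect, then any link, to reach a taxon name."""
    page = REDIRECTS.get(label, label)
    if page in TAXON_NAMES:
        return page
    if page in SPECIMEN_NAME:
        return SPECIMEN_NAME[page]
    if page in SEQUENCE_SOURCE:
        return SEQUENCE_SOURCE[page]["name"]
    return None  # label could not be tied to a taxon name


def phylogenies_containing(taxon: str) -> List[str]:
    """Return the names of all trees with a label resolving to taxon."""
    return sorted(
        name
        for name, labels in TREES.items()
        if any(resolve(label) == taxon for label in labels)
    )
```

With these placeholder pages, `phylogenies_containing("Homo sapiens")` finds both trees, one via the short form and one via the sequence, even though neither label is literally the taxon name. Synonyms are not handled here.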