Friday, January 24, 2014

NCBI taxonomy database now shows type material

Scott Federhen told me about a nice new feature in GenBank that he's described in a piece for NCBI News. The NCBI taxonomy database now shows a list of type material (where known), and the GenBank sequence database "knows" about types. Here's the summary:

The naming, classification and identification of organisms traditionally relies on the concept of type material, which defines the representative examples ("name-bearing") of a species. For larger organisms, the type material is often a preserved specimen in a museum drawer, but the type concept also extends to type bacterial strains as cultures deposited in a culture collection. Of course, modern taxonomy also relies on molecular sequence information to define species. In many cases, sequence information is available for type specimens and strains. Accordingly, the NCBI has started to curate type material in the Taxonomy database, and is using this data to label sequences from type specimens or strains in the sequence databases. The figure below shows type material as it appears in the NCBI taxonomy entry and a sequence record for the recently described African monkey species, Cercopithecus lomamiensis.

You can query for sequences from type using the query "sequence from type"[filter]. This could lead to some nice automated tools. Suppose you have several distinct clusters of sequences, all labelled with the same species name, and one cluster includes a sequence from the type specimen. The other clusters are then candidates for being described under new names.
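Here is a rough sketch of what such a tool might look like. It is written under some stated assumptions. The helper names `type_sequence_search_url` and `candidate_new_names` are hypothetical. Combining the filter with an `[Organism]` field tag in an E-utilities `esearch` query is my assumption about how you'd scope the search to one species. The clustering itself (how sequences were grouped) is assumed to have been done upstream. The code only builds the query URL and then applies the "one cluster has the type, the others are candidates" rule.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint; building the URL does not hit the network.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def type_sequence_search_url(species, db="nucleotide"):
    """Hypothetical helper: esearch URL for sequences from type for one species.

    Assumes the "sequence from type"[filter] can be ANDed with an [Organism] term.
    """
    term = f'"{species}"[Organism] AND "sequence from type"[filter]'
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


def candidate_new_names(clusters, type_ids):
    """Hypothetical helper: flag clusters that may deserve a new name.

    clusters: dict mapping a species name to a list of clusters, each cluster
              being a list of sequence identifiers (clustering assumed done elsewhere).
    type_ids: set of identifiers known to be sequences from type.
    Returns a dict mapping species name to the indices of clusters that lack
    a type sequence, but only when exactly one cluster for that name has one.
    """
    candidates = {}
    for species, groups in clusters.items():
        if len(groups) < 2:
            continue  # a single cluster leaves nothing to compare
        with_type = [i for i, group in enumerate(groups) if type_ids.intersection(group)]
        if len(with_type) != 1:
            continue  # no type sequence, or types in several clusters: ambiguous
        anchor = with_type[0]
        candidates[species] = [i for i in range(len(groups)) if i != anchor]
    return candidates


if __name__ == "__main__":
    print(type_sequence_search_url("Cercopithecus lomamiensis"))
    # Placeholder identifiers, not real accession numbers.
    demo = {"Species A": [["s1", "s2"], ["s3"]], "Species B": [["s4"], ["s5"]]}
    print(candidate_new_names(demo, {"s2"}))
```

In the demo, "Species A" has the type sequence in its first cluster, so only its second cluster is flagged. "Species B" has no sequence from type, so nothing can be said about it. I chose to skip names whose type sequences show up in more than one cluster, because that situation needs a human to look at it rather than an automatic call.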