Tuesday, April 24, 2012

Dark taxa even darker: NCBI pulls (some) DNA barcodes from GenBank (updated)

Dark taxa have become even darker. NCBI has pulled the plug on large numbers of DNA barcode sequences that lack scientific names. For example, taxon Cyclopoida sp. BOLD:AAG9771 (tax_id 818059) now has a sparse page that has no associated sequences. From an earlier download of EMBL I know that this taxon is associated with at least 5 sequences, such as GU679674. But if you go to that sequence you get this:

Obsolete

So the sequence is hidden. You can retrieve it by clicking on the obsolete version link, but by default it is not shown.
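If you'd rather check a record programmatically than click through the web page, here's a minimal sketch using NCBI's E-utilities `efetch` endpoint. The helper names (`efetch_url`, `fetch_record`) are my own for illustration. I'm assuming the record lives in the `nuccore` database, and I haven't verified what `efetch` returns for a suppressed or obsolete record, so treat the response as something to inspect rather than rely on.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="gb"):
    """Build an efetch URL for a nucleotide accession (hypothetical helper)."""
    params = {
        "db": "nuccore",      # assumption: the sequence is a nucleotide record
        "id": accession,
        "rettype": rettype,   # "gb" asks for the GenBank flat file
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_record(accession):
    """Download whatever NCBI returns for the accession as text.

    This needs network access, and the result for a suppressed
    record may be an error message or an empty body.
    """
    with urlopen(efetch_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_record("GU679674") to actually download.
    print(efetch_url("GU679674"))
```

Calling `fetch_record("GU679674")` and looking at what comes back (a flat file, an error, or nothing) is a quick way to see how the record is being served at the moment.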

It's an extraordinary state of affairs that a huge slice of fundamental biodiversity data has been effectively "pulled" from view.

Update: Sujeevan Ratnasingham from iBOL has pointed out that the sequence I'd used above (GU679674) was not one of the ones hidden by NCBI; rather, it was suppressed at the request of the investigator (which I'd have realised if I'd paid more attention to the screenshot). HQ918317 is an example of a BOLD record that was suppressed:

Hq