From the "wouldn't it be cool if" department: one thing I've often thought would be very handy is a web site that lists sequences in GenBank that are known (or suspected) to be problematic, especially sequences thought to have been misidentified. What I'd like to see is a site called something like "GenBank Watch" (a rip-off of Search Engine Watch) where this information is recorded.
There has been some commentary on this issue in the literature (Rytas Vilgalys's article in New Phytologist, doi:10.1046/j.1469-8137.2003.00894.x, and James Harris's article in Trends in Ecology and Evolution, doi:10.1016/S0169-5347(03)00150-2).
Some workers do make lists of dubious sequences available, such as the list of rejected sequences provided by the mor project. My concern is that a lot of this sort of information is buried in papers (e.g. this one suggesting AF203470 has been misidentified). Even worse, some of it only comes to light when manuscripts are reviewed: the authors remove the sequences from their data set, but the important information (that the sequence is bogus) isn't mentioned in the paper.
Wouldn't it be great if there was a web site where one could go and search for a sequence by accession number to see if somebody had flagged that sequence as problematic? Ideally the site would enable users to comment on a sequence (for example, whether the sequence is bogus might be contentious), and it would also need a web service interface so the search could be automated.
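To make the idea a bit more concrete, here is a minimal sketch in Python of the lookup at the heart of such a site. Everything in it is hypothetical: the `Flag` record, the `FlagRegistry` class and its method names are made up for illustration, and an in-memory dictionary stands in for whatever database a real site would use. The only real data is AF203470, the accession mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Flag:
    """One report that a sequence may be problematic (hypothetical layout)."""
    accession: str
    reason: str
    source: str
    comments: List[str] = field(default_factory=list)


class FlagRegistry:
    """In-memory stand-in for the database behind the imagined site."""

    def __init__(self) -> None:
        self._flags: Dict[str, List[Flag]] = {}

    @staticmethod
    def _key(accession: str) -> str:
        # Normalise case and drop any version suffix,
        # so "af203470.1" and "AF203470" refer to the same record.
        return accession.strip().upper().split(".")[0]

    def flag(self, accession: str, reason: str, source: str) -> Flag:
        """Record that somebody considers this sequence problematic."""
        record = Flag(self._key(accession), reason, source)
        self._flags.setdefault(record.accession, []).append(record)
        return record

    def lookup(self, accession: str) -> List[Flag]:
        """Return every flag raised against an accession (empty if none)."""
        return list(self._flags.get(self._key(accession), []))


if __name__ == "__main__":
    registry = FlagRegistry()
    report = registry.flag(
        "AF203470",
        reason="suspected misidentification",
        source="suggested in a published paper",
    )
    # Comments let users dispute or support a flag.
    report.comments.append("Whether this one is bogus may be contentious.")

    for hit in registry.lookup("af203470.1"):
        print(hit.accession, "-", hit.reason, "-", hit.comments)
```

A web service interface could then be little more than an HTTP wrapper around `lookup` that returns the matching flags for a given accession number, which is what would let the search be automated.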
One for the "if I only had time" list.