Monday, October 06, 2008

Global biogeographical data bases on marine fishes: caveat emptor

D. Ross Robertson has published a paper entitled "Global biogeographical data bases on marine fishes: caveat emptor" (doi:10.1111/j.1472-4642.2008.00519.x; the DOI is currently broken, but you can get the article here). The paper concludes:
Any biogeographical analysis of fish distributions that uses GIS data on marine fishes provided by FishBase and OBIS 'as is' will be seriously compromised by the high incidence of species with large-scale geographical errors. A major revision of GIS data for (at least) marine fishes provided by FishBase, OBIS, GBIF and EoL is essential. While the primary sources naturally bear responsibility for data quality, global online providers of aggregated data are also responsible for the content they serve, and cannot side-step the issue by simply including generalized disclaimers about data quality. Those providers need to actively coordinate, organize and effect a revision of GIS data they serve, as revisions by individual users will inevitably lead to confused science (which version did you use?) and a tremendous expenditure of redundant effort. To begin with, it should be relatively easy for providers to segregate all data on pelagic larvae and adults of marine organisms that they serve online. Providers should also include the capacity for users to post readily accessible public comments about the accuracy of individual records and the overall quality of individual data bases. This would stimulate improvements in data quality, and generate 'selection pressures' favouring the usage of better quality data bases, and the revision or elimination of poor-quality data bases. The services provided to the global science community by the interlinked group of online providers of biodiversity data are invaluable and should not be allowed to be discredited by a high incidence of known serious errors in GIS data among marine fishes, and, likely, other marine organisms. (emphasis added)

As I've noted elsewhere on this blog, and as demonstrated by Yesson et al.'s paper on legume records in GBIF (doi:10.1371/journal.pone.0001124), which Robertson does not cite, there are major problems with geographical information in public databases. I suspect there will be more papers like this, which I hope will inspire database providers and aggregators to take the issue seriously. (Thanks to David Patterson for spotting this paper.)