Thursday, November 08, 2007

GBIF data evaluation

Interesting paper in PLoS ONE (doi:10.1371/journal.pone.0001124) on the quality of data housed in GBIF. The study looked at 630,871 georeferenced legume records in GBIF, and concluded that 84% of these records are valid. As examples of those that aren't, below is a map of legumes placed in the sea (there are no marine legumes).

Although the abstract warns of the dire consequences of data deficiencies, the conclusions make for interesting reading:

The GBIF point data are largely correct: 84% passed our conservative criteria. A serious problem is the uneven coverage of both species and areas in these data. It is possible to retrieve large numbers of accurate data points, but without appropriate adjustment these will give a misleading view of biodiversity patterns. Coverage associates negatively with species richness. There is a need to focus on databasing mega-diverse countries and biodiversity hotspots if we are to gain a balanced picture of global biodiversity. A major challenge for GBIF in the immediate future is a political one: to negotiate access to the several substantial biodiversity databases that are not yet publicly and freely available to the global science community. GBIF has taken substantial steps to achieve its goals for primary data provision, but support is needed to encourage more data providers to digitise and supply their records.


Andy Mabbett said...

I wonder how many of those erroneous "marine" records would be correct if a negative latitude and/or longitude value were corrected to a positive, or vice versa?

Rod Page said...

Some may well be. I've noted some examples of this in GenBank records.
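
The sign-flip check Andy suggests could be sketched roughly as below. This is only an illustration, not anything from the paper or from GBIF: the function names are made up, and `in_toy_australia` is a crude bounding box standing in for a real land/sea test (which would need an actual coastline dataset).

```python
from typing import Callable, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def sign_flip_candidates(lat: float, lon: float) -> List[Coord]:
    """Return the distinct points obtained by negating latitude, longitude, or both."""
    candidates = [(-lat, lon), (lat, -lon), (-lat, -lon)]
    unique: List[Coord] = []
    for c in candidates:
        # Skip flips that change nothing (a zero coordinate) and repeats.
        if c != (lat, lon) and c not in unique:
            unique.append(c)
    return unique


def plausible_fixes(
    lat: float, lon: float, is_on_land: Callable[[float, float], bool]
) -> List[Coord]:
    """For a record that falls in the sea, list the sign flips that land on land."""
    if is_on_land(lat, lon):
        return []  # record already looks fine, nothing to fix
    return [c for c in sign_flip_candidates(lat, lon) if is_on_land(*c)]


def in_toy_australia(lat: float, lon: float) -> bool:
    # Hypothetical stand-in for a real land mask: a rough box around Australia.
    return -39.0 <= lat <= -11.0 and 113.0 <= lon <= 154.0


# A record at 25N 135E sits in the Philippine Sea; flipping the latitude
# sign puts it in central Australia.
fixes = plausible_fixes(25.0, 135.0, in_toy_australia)
```

A flip that lands on land is only a candidate correction; it would still need checking against the record's locality text or the species' known range.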