Thursday, May 08, 2008

Fixing GBIF

The more I play with GBIF the more I come across some spectacular errors. Here's one small example of what can go wrong, and how easy it would be to fix at least some of the errors in GBIF. This is topical given that the recent review of EOL highlighted the importance of vetting and cleaning data.

The frog Boophis periegetes features in a recent study of DNA barcoding (doi:10.1186/1742-9994-2-5). The sequences from this study (AY848605-9) aren't georeferenced in GenBank, but in iPhylo they are, courtesy of Metacarta's web services, which place them in Madagascar.

Finding errors

Curious about the frog I did a search in iSpecies and got the following map:


Oops, the frog is found in the middle of the South Atlantic(!), and in Brazil(!?).
These specimen records are provided by the MCZ, Harvard. Looking at the latitude and longitude co-ordinates, it's clear that there has been a comedy of errors. For MCZ A-119852 the longitude is west instead of east; for MCZ A-119850 and MCZ A-119851 the latitude and longitude have been swapped, and the longitude is west instead of east (again). If we make these changes, the specimens go back to Madagascar (the rectangle on the SVG map below). If you don't see the map, use a decent web browser such as Safari 3 or Firefox 2. If you must use Internet Explorer, grab the RENESIS player.


[SVG map: the corrected specimen localities, now in Madagascar]




Interestingly, the DiGIR records all list the country as Madagascar, so for any specimen record in GBIF that names a country it would be trivial to test:
  1. do the co-ordinates for the specimen fall inside the bounding box for the country?
  2. if not, will they if we change sign (i.e., hemispheres) and/or swap latitude and longitude?

These checks would be simple to implement, and would probably catch a lot of the more egregious errors in GBIF.
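
To make the two tests concrete, here is a minimal sketch in Python. It is not how GBIF or the MCZ process records; the function names are hypothetical, the bounding box for Madagascar is a rough approximation chosen for illustration, and the co-ordinates in the usage example are invented (they are not the actual MCZ values).

```python
from itertools import product

# Rough bounding box for Madagascar in decimal degrees (an approximation,
# assumed for illustration): (min_lat, max_lat, min_lon, max_lon).
MADAGASCAR = (-26.0, -11.5, 43.0, 51.0)


def in_box(lat, lon, box):
    """Test 1: does the point fall inside the country's bounding box?"""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def suggest_fixes(lat, lon, box):
    """Test 2: if the point is outside the box, try every combination of
    swapping latitude/longitude and flipping signs, and return the
    combinations that land inside the box as (description, lat, lon).
    Returns an empty list if the point is already inside the box."""
    if in_box(lat, lon, box):
        return []
    fixes = []
    for swap, lat_sign, lon_sign in product((False, True), (1, -1), (1, -1)):
        # Swap first, then flip the signs of the resulting pair.
        new_lat, new_lon = (lon, lat) if swap else (lat, lon)
        new_lat *= lat_sign
        new_lon *= lon_sign
        if in_box(new_lat, new_lon, box):
            steps = []
            if swap:
                steps.append("swap latitude and longitude")
            if lat_sign == -1:
                steps.append("flip latitude sign")
            if lon_sign == -1:
                steps.append("flip longitude sign")
            fixes.append((" + ".join(steps), new_lat, new_lon))
    return fixes


# Invented examples: a point that should be at (-18.9, 48.4).
print(suggest_fixes(-18.9, -48.4, MADAGASCAR))  # longitude entered as west
print(suggest_fixes(-48.4, -18.9, MADAGASCAR))  # swapped, and longitude west
```

A record with no suggested fix, or with more than one, would still need a human to look at it. A bounding box is also a crude test, since a point can sit inside the box but out at sea, so this only catches the grossest errors.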

Fixing errors

What will be interesting is whether these records will be fixed. I have sent feedback via GBIF's web site, as well as sending an email to the MCZ. I'll let readers know what happens.

Ground truth

Lastly, those interested in the frog itself may find the iSpecies search frustrating as the link returned by Google Scholar leads to a page in Ingenta saying:
This title is now published by Blackwell Publishing and can be found here www.ingentaconnect.com/content/bsc/zoj

Nope, the paper in question is actually at ScienceDirect (doi:10.1006/zjls.1995.0040). This paper describes the species, and gives the latitude and longitude of the collection localities (correctly).