DNA barcoding generates a lot of specimen data with geographical coordinates (see for example Guest post: response to “Putting GenBank Data on the Map”). The question naturally arises: “how accurate are those coordinates?”.
Browsing the BOLD database using BOLD View I often come across sequences whose coordinates are labelled “Coordinates from country centroid”, so these may bear little relation to where the specimen was actually collected. But how can we assess the accuracy of other coordinates?
Inspired by a 2008 Flickr blog post, The Shape of Alpha, I decided to create plots of the distribution of geotagged specimens in the BOLD database, grouped by geographic level. For example, we could aggregate all points labelled as being from the country “India”, then subset those into points labelled as being from various regions within India, and so on down the geographic hierarchy implied by country, province, etc. Rather than plot all the points, I decided to summarise them using the same approach Flickr used: we enclose the points in an alpha shape. Below are examples for India.
The two maps differ in how closely the curve fits the points, which is determined by the value of alpha (α) used to compute the shape. The smaller the value, the tighter the fit. The first map used α=0.3 and is fairly coarse. With α=0.1 the alpha shape skirts around Bangladesh, and is hence a better representation of the boundary of India.
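To make the effect of α concrete, here is a minimal brute-force sketch in Python. It is not the code behind the maps (those were computed with the GIS features in Postgres), and the `radius` parameter here is a plain disk radius in coordinate units, which may not correspond to the α values quoted above. It assumes the textbook definition of an alpha shape edge: two points are joined if some disk of the given radius passes through both and contains no other point. The function name and the toy points are made up for illustration.

```python
from itertools import combinations
from math import hypot, sqrt


def alpha_edges(points, radius):
    """Return index pairs (i, j) that have an empty disk of `radius`
    passing through both points. Brute force, O(n^3): fine for a toy
    example, far too slow for real specimen data."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        chord = hypot(x2 - x1, y2 - y1)
        if chord == 0 or chord > 2 * radius:
            continue  # no disk of this radius can touch both points
        # Disk centres lie on the perpendicular bisector of the chord.
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        h = sqrt(radius ** 2 - (chord / 2) ** 2)
        ux, uy = -(y2 - y1) / chord, (x2 - x1) / chord
        for sign in (1, -1):
            cx, cy = mx + sign * h * ux, my + sign * h * uy
            others = (p for k, p in enumerate(points) if k not in (i, j))
            if all(hypot(px - cx, py - cy) >= radius - 1e-9 for px, py in others):
                edges.append((i, j))
                break
    return edges


# Toy data: a square with one point set just inside the bottom side.
points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 1)]

coarse = alpha_edges(points, radius=10)   # large radius: behaves like a convex hull
tight = alpha_edges(points, radius=2.1)   # small radius: follows the notch
```

With the large radius the bottom side of the square, edge `(0, 1)`, is kept and the inner point is ignored. With the small radius that side is dropped and the outline detours through the inner point instead, which is the same effect as the alpha shape skirting around Bangladesh. On sparse data a small radius can also pick up edges that cut across the interior, so real implementations build on a Delaunay triangulation rather than testing every pair.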
The original Flickr blog post showed how well geotagged photographs on Flickr traced out geographical areas. From my perspective, one reason to make these maps is to spot problematic records. For example, the map for Tasmania looks a bit strange. There are records on the Australian mainland, and on Lord Howe and Macquarie Islands, that clearly aren’t from “Tasmania”. Maybe the coordinates are wrong, or maybe the place name is wrong. Either way, we now have some records to investigate.
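Once a map looks odd, the suspect records still have to be pulled out. The sketch below shows one crude way to do that in Python, and it is a different technique from the alpha shape maps: it takes the median latitude and longitude of everything sharing a place name and flags records further than a chosen distance from that point. The record ids, the coordinates (approximate positions of well-known localities, not BOLD data), the function names, and the 300 km threshold are all assumptions made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def flag_outliers(records, max_km):
    """Return ids of records further than max_km from the median coordinate.
    Taking the median of longitudes is naive and would misbehave for
    places that straddle the antimeridian."""
    lat0 = median(r["lat"] for r in records)
    lon0 = median(r["lon"] for r in records)
    return [
        r["id"]
        for r in records
        if haversine_km(lat0, lon0, r["lat"], r["lon"]) > max_km
    ]


# Hypothetical records, all labelled "Tasmania".
records = [
    {"id": "rec1", "lat": -42.88, "lon": 147.33},  # near Hobart
    {"id": "rec2", "lat": -41.43, "lon": 147.14},  # near Launceston
    {"id": "rec3", "lat": -41.18, "lon": 146.35},  # near Devonport
    {"id": "rec4", "lat": -42.08, "lon": 145.55},  # near Queenstown
    {"id": "rec5", "lat": -41.05, "lon": 145.91},  # near Burnie
    {"id": "rec6", "lat": -37.81, "lon": 144.96},  # Australian mainland
    {"id": "rec7", "lat": -31.55, "lon": 159.08},  # Lord Howe Island
    {"id": "rec8", "lat": -54.62, "lon": 158.86},  # Macquarie Island
]

suspects = flag_outliers(records, max_km=300)
```

Here `suspects` holds the three records that sit well away from the main cluster. A check like this only says that a record is far from its neighbours, not whether the coordinates or the place name are at fault, so each flagged record still needs a look by hand.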
This project is live on the BOLD View web site. It was mostly written using Claude Code, making use of the GIS features in Postgres. It is an example of how easy AI tools make it to do some quick exploration of an idea (in this case, something inspired by a blog post that is nearly twenty years old).
Written with StackEdit.


