Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed.
ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
Friday, June 13, 2008
From PDFs to Google Earth
I've added a service to bioGUID that takes a PDF, attempts to extract latitude and longitude data from it, and returns those co-ordinates either as a Google Earth KML file or in JSON format. This is one of a bunch of services that I'm adding to bioGUID to support some of the data mining that I'm doing.
To see what it can do, try this URL to get a list of localities in the paper Description of eight new species of shrub frogs (Ranidae: Rhacophorinae: Philautus) from Sri Lanka.
Then try this one to get the KML file, and open it in Google Earth. The service uses a bunch of regular expressions to try and extract latitude and longitude pairs from the text (needless to say, there are nearly as many different ways to write a latitude and longitude as there are authors).
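The regular expressions bioGUID actually uses aren't shown here, so what follows is only a minimal sketch of the general idea. It assumes the PDF has already been converted to plain text, and it handles just one common notation: degrees with a degree sign, optional minutes and seconds, and a hemisphere letter, with a latitude (N/S) followed by a longitude (E/W). The names `PAIR`, `_coord`, `_to_decimal` and `extract_pairs` are hypothetical. Real papers need many more patterns than this.

```python
import re

def _coord(hemispheres):
    """Build a regex for one co-ordinate ending in one of the given hemisphere letters."""
    return (
        r"(\d{1,3}(?:\.\d+)?)\s*[°º]\s*"            # degrees (degree sign required)
        r"(?:(\d{1,2}(?:\.\d+)?)\s*['′’]\s*)?"      # optional minutes
        r"(?:(\d{1,2}(?:\.\d+)?)\s*[\"″”]\s*)?"     # optional seconds
        rf"([{hemispheres}])"                       # hemisphere letter
    )

# A latitude (N/S) followed by a longitude (E/W), optionally separated by , or ;
PAIR = re.compile(_coord("NS") + r"\s*[,;]?\s*" + _coord("EW"))

def _to_decimal(deg, minutes, seconds, hemi):
    """Convert degrees/minutes/seconds to decimal degrees; S and W are negative."""
    value = float(deg) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    return -value if hemi in "SW" else value

def extract_pairs(text):
    """Return (latitude, longitude) tuples in decimal degrees found in text."""
    pairs = []
    for m in PAIR.finditer(text):
        g = m.groups()
        lat = _to_decimal(*g[0:4])
        lon = _to_decimal(*g[4:8])
        # Discard matches that cannot be real co-ordinates
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((round(lat, 6), round(lon, 6)))
    return pairs
```

For example, `extract_pairs("7°28'N, 80°37'E")` yields a single pair of roughly (7.466667, 80.616667). Requiring the degree sign cuts down on false positives from other numbers in the text, at the cost of missing authors who write co-ordinates without it.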
The ultimate aim is to assemble a bunch of Open Access PDFs (say, from Zootaxa), run them through this service, then display the result on Google Earth. Think of it as a geography of taxonomy.
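Getting the extracted points onto Google Earth comes down to writing them out as KML. Here is a minimal sketch, assuming the co-ordinates are already available as (label, latitude, longitude) tuples. The functions `to_kml` and `to_json`, and the JSON field names, are hypothetical and not bioGUID's actual output format. Note that KML writes longitude before latitude inside `<coordinates>`.

```python
import json
from xml.sax.saxutils import escape

def to_kml(points):
    """Serialise (label, lat, lon) tuples as a minimal KML document.

    KML <coordinates> are written longitude first, then latitude.
    """
    placemarks = []
    for label, lat, lon in points:
        placemarks.append(
            "    <Placemark>\n"
            f"      <name>{escape(label)}</name>\n"
            f"      <Point><coordinates>{lon},{lat}</coordinates></Point>\n"
            "    </Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Document>\n"
        + "\n".join(placemarks)
        + "\n  </Document>\n</kml>\n"
    )

def to_json(points):
    """Serialise the same tuples as a JSON list of objects (hypothetical layout)."""
    return json.dumps(
        [{"label": label, "latitude": lat, "longitude": lon}
         for label, lat, lon in points]
    )
```

Writing `to_kml(points)` to a file with a `.kml` extension should be enough for Google Earth to open it; points pooled from many PDFs can simply be concatenated into one list first.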
Oh, and the irony of me criticising GBIF for displaying poor quality data, then adding to this by providing a service to extract yet more co-ordinates of possibly doubtful validity has not entirely escaped me...