Tuesday, April 21, 2015

Looking up specimen codes in GBIF using Google Spreadsheet

Playing with the my "material examined" tool I've been working on, I wondered whether I could make use of it in, say, a spreadsheet. Imagine that I have a spreadsheet of museum codes and want to look those up in GBIF. I could create a service for Open Refine but Open Refine is a bit big and clunky, you have to fire up a Java application and point your browser at it, and Open Refine isn't as intuitive or as flexible as a spreadsheet.

It turns that Google Spreadsheets supports custom functions, including importing JSDON from a remote data source. Following How to import JSON data into Google Spreadsheets in less than 5 minutes here's what to do:

  1. Create a new Google Spreadsheet.
  2. Click on Tools -> Script Editor.
  3. Click Create script for Spreadsheet.
  4. Delete the placeholder content and paste the code from this script.
  5. Rename the script to ImportJSON.gs and click the save button.
  6. Back in the spreadsheet, in a cell, you can type “=ImportJSON()” and begin filling out it’s parameters.

Lets imagine we have a spreadsheet with a specimen code in cell A1, e.g. "FMNH 187122".


To call the material examined service, we need a function like this:

=ImportJSON(CONCATENATE("http://bionames.org/~rpage/material-examined/service/api.php?code=",A1,"&match&extend=10"), "/hits/key,/hits/scientificName", "noHeaders")

Paste this into cell B1 (i.e., just to the right of the specimen code) and after a short delay you should see something like this:


The three parameters supplied to ImportJSON are are the query URL, written as a spreadsheet function that grabs the specimen code from cell A1, a list of the bits of data we want to extract from the result (expressed as JSON paths), and some options (in this case, don't show the headers). ImportJSON will grab the specimen code in cell A1, add it to the query URL, then output the results. You should see something like this:

The first column is the GBIF occurrence ID, the second is the scientific name (you can add more JSON paths to get more fields).

Note that we have multiple rows as there is more than one specimen with the code "FMNH 187122" in GBIF. Now, we can ask the material examined service to return only certain taxa (such as mammals) by adding the "scientificName" parameter:

=ImportJSON(CONCATENATE("http://bionames.org/~rpage/material-examined/service/api.php?code=",A10,"&scientificName=",B10,"&match&extend=10"), "/hits/key,/hits/scientificName", "noHeaders")

If you put the specimen code in cell A10, and the higher taxon "Mammalia" in cell B10, and paste the function above into cell C10, then you should see something like this:


Note that now we have a single row with the mammal specimen.

It's a little bit fussy (you need to get the ImportJSON script, and mess a bit with the parameters but it's quick and flexible, and you get all the power of a spreadsheet to help clean the data before trying to match it to GBIF. Plus you can do it all in your browser.