Monday, January 25, 2016

iSpecies is back: mashing up species data

A decade ago (OMG, that can't be right, an actual decade ago) I created "iSpecies", a simple little tool that mashed up a variety of data from GBIF, NCBI, Yahoo, Wikipedia, and Google Scholar to create a search engine for species. It was written in PHP, relied on some degree of *cough* web scraping to get its data, and was a bit of a toy (although that didn't stop me complaining that it could do more than EOL at the time). Eventually I got sick of dealing with Google Scholar constantly changing its HTML and blocking IP addresses to stop people harvesting data (I once managed to get my entire campus blocked), and with services such as Yahoo's image search disappearing, so I pulled the plug on it.

A short course I run on "phyloinformatics" starts this week, and one of the examples I show is a crude JavaScript-based mashup. It struck me that I could tweak that and recreate a simple version of iSpecies, and that's exactly what I've done: http://ispecies.org.


It's nothing fancy: it just takes a species name, searches GBIF, EOL, CrossRef, and the Open Tree of Life, grabs some data, and puts it together on a web page. There are lots of limitations (e.g., it only fetches the first 300 localities from GBIF, it requires scientific names, and the tree viewer is pretty awful), but it was pretty simple to put together. It runs entirely client-side: the code is all in the HTML file, plus a few JavaScript libraries, and it's on GitHub: https://github.com/rdmpage/ispecies.
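To make the "search several services for one name" step concrete, here is a minimal sketch in Python (iSpecies itself does this in browser JavaScript). It only builds the request descriptions and doesn't send anything. The endpoints and parameters (`scientificName`, `hasCoordinate`, `limit`, `query`, `rows`, `names`) are my assumptions about the current public GBIF, CrossRef, and Open Tree of Life APIs, not a copy of what the iSpecies page sends, and `build_queries` is a hypothetical helper name. EOL is left out to keep it short.

```python
from urllib.parse import urlencode

# Minimal sketch (assumption): one request description per source for a
# species name. Not the actual iSpecies code; EOL omitted for brevity.
def build_queries(name: str) -> dict:
    """Map each source to a (method, url, json_body) triple."""
    return {
        # GBIF occurrence search; a single page holds at most 300 records,
        # which is why a one-shot query sees only the first 300 localities.
        "gbif": ("GET", "https://api.gbif.org/v1/occurrence/search?"
                 + urlencode({"scientificName": name,
                              "hasCoordinate": "true",
                              "limit": 300}), None),
        # CrossRef free-text search for works matching the name.
        "crossref": ("GET", "https://api.crossref.org/works?"
                     + urlencode({"query": name, "rows": 20}), None),
        # Open Tree of Life name resolution is a POST with a JSON body.
        "opentree": ("POST",
                     "https://api.opentreeoflife.org/v3/tnrs/match_names",
                     {"names": [name]}),
    }


if __name__ == "__main__":
    for source, (method, url, body) in build_queries("Apis mellifera").items():
        print(source, method, url, body or "")
```

Because the requests are independent, a client can fire them all at once and render each panel as its response arrives, which is roughly why a mashup like this can live entirely in the browser.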

Fun as this was, there's a bigger problem with iSpecies, and that's that it is a "web mashup". I'm simply grabbing data from different sources and redisplaying it. What I really want is what has been described as a "data mashup" (awful term, don't use it); that is, I want to combine the data so that it is more than the sum of its parts. For example, some of the data could be cross-linked (especially if we add a few more sources and drill down a bit). Some of the papers discovered by CrossRef may include original descriptions, or may be the source of some of the points plotted on the GBIF map. Some may include the phylogenies used to build the Open Tree of Life tree. To build a data mashup instead of a web mashup, we need to operate at the level of data rather than human-readable web pages. That is the next thing I'd like to work on, and in many ways it shouldn't be a big leap. The new iSpecies was fairly easy to create because we now have a bunch of web services that all speak JSON. It's a small step from JSON to JSON-LD (especially if the JSON-LD is constructed with reuse in mind). So while it's nice to see iSpecies back, there's a much more interesting next step to think about.
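As a rough illustration of that "small step", the sketch below wraps a GBIF-style occurrence and a CrossRef-style work as JSON-LD, giving each a global `@id` and mapping plain keys to Darwin Core and Dublin Core terms through an `@context`. The context, the helper names, and the `doi` field on the occurrence are all hypothetical (GBIF occurrences don't have a field by that name), and the sample records are made up. Once both records are identified by URIs, a shared DOI is enough to join an occurrence to the paper it came from.

```python
import json

# Hypothetical @context: maps plain JSON keys to Darwin Core (dwc) and
# Dublin Core (dcterms) terms. Chosen for illustration only.
CONTEXT = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "dcterms": "http://purl.org/dc/terms/",
    "scientificName": "dwc:scientificName",
    "decimalLatitude": "dwc:decimalLatitude",
    "decimalLongitude": "dwc:decimalLongitude",
    "title": "dcterms:title",
    # "@type": "@id" marks the value as a link (a URI), not a plain string.
    "references": {"@id": "dcterms:references", "@type": "@id"},
}


def occurrence_to_jsonld(occ: dict) -> dict:
    """Wrap a GBIF-style occurrence record as a JSON-LD node."""
    node = {
        "@context": CONTEXT,
        "@id": f"https://www.gbif.org/occurrence/{occ['key']}",
        "scientificName": occ["scientificName"],
        "decimalLatitude": occ["decimalLatitude"],
        "decimalLongitude": occ["decimalLongitude"],
    }
    # Hypothetical "doi" field: the paper this record was taken from.
    if "doi" in occ:
        node["references"] = f"https://doi.org/{occ['doi']}"
    return node


def work_to_jsonld(work: dict) -> dict:
    """Wrap a CrossRef-style work (DOI string, list of titles) as a JSON-LD node."""
    return {
        "@context": CONTEXT,
        "@id": f"https://doi.org/{work['DOI']}",
        "title": work["title"][0],
    }


def link(nodes: list) -> list:
    """Naive join on @id: pair each referencing node with the referenced node's title."""
    by_id = {n["@id"]: n for n in nodes}
    return [(n["@id"], by_id[n["references"]].get("title"))
            for n in nodes if n.get("references") in by_id]


if __name__ == "__main__":
    # Made-up sample records, shaped like GBIF and CrossRef responses.
    occ = {"key": 1, "scientificName": "Aus bus", "decimalLatitude": 55.9,
           "decimalLongitude": -4.3, "doi": "10.1234/example"}
    work = {"DOI": "10.1234/example", "title": ["An example paper"]}
    nodes = [occurrence_to_jsonld(occ), work_to_jsonld(work)]
    print(json.dumps(nodes[0], indent=2))
    print(link(nodes))
```

The `link` function is only a naive join on identifiers; a real pipeline would more likely expand the documents with a JSON-LD processor (for example PyLD) and load the resulting triples into a triple store, where the same join becomes a query.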