Wednesday, October 24, 2018

Ottawa Ecobiomics hackathon: graph databases and Wikidata

I spent last week in Ottawa at an "Ecobiomics" hackathon organised by Joel Sachs. Essentially we spent a week exploring the application of linked data to various topics in biodiversity, with an emphasis on looking at working examples. Topics covered included:

In addition to the above I spent some of the time working on encoding GBIF specimen data in RDF with a view to adding this to Ozymandias. Having Steve Baskauf (@baskaufs) at the workshop was a great incentive to work on this, given his work with Cam Webb on Darwin-SW: Darwin Core-based terms for expressing biodiversity data as RDF.
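To give a flavour of what encoding a specimen record in RDF involves, here is a minimal sketch in Python that turns a GBIF-style occurrence record into N-Triples. It is not the code I wrote at the hackathon, and it deliberately ignores the richer Darwin-SW graph model: it simply hangs a few Darwin Core terms off a single occurrence node. The record values, the function names, and the choice of the GBIF occurrence page URL as the subject identifier are all assumptions made for illustration.

```python
# Minimal sketch: flatten one occurrence record into N-Triples using
# Darwin Core terms. The record below is made up for illustration.

DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape the characters that N-Triples requires to be escaped."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def occurrence_to_ntriples(record):
    """Return a list of N-Triples lines describing one occurrence."""
    # Assumption: use the GBIF occurrence page URL as the subject URI.
    subject = f"<https://www.gbif.org/occurrence/{record['key']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{DWC}Occurrence> ."]
    # Only a handful of Darwin Core terms, emitted as plain literals.
    for term in ("catalogNumber", "institutionCode", "scientificName"):
        value = record.get(term)
        if value:
            literal = escape_literal(str(value))
            lines.append(f'{subject} <{DWC}{term}> "{literal}" .')
    return lines


if __name__ == "__main__":
    # Hypothetical record, not real GBIF data.
    example = {
        "key": 123,
        "catalogNumber": "ABC 1",
        "institutionCode": "XYZ",
        "scientificName": "Aus bus",
    }
    print("\n".join(occurrence_to_ntriples(example)))
```

Treating everything as a literal attached to one node is the easy part. Much of the cognitive overhead comes from deciding which things (specimen, taxon, locality, collecting event) deserve their own identifiers and how they link together, which is what Darwin-SW sets out to address.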

A report is being written up which will discuss what we got up to in more detail, but one takeaway for me is the large cognitive burden that still stands in the way of widespread adoption of linked data approaches in biodiversity. Products such as Metaphactory go some way to hiding the complexity, but the overhead of linked data is high, and the benefits are perhaps less than obvious. Update: for more on this see Dan Brickley's comments on "Semantic Web Interest Group now closed".

In this context, the rise of Wikidata is perhaps the most important development. One thing we'd hoped to do but didn't get around to was setting up our own instance of Wikibase to play with (Wikibase is the software that Wikidata runs on). This is actually pretty straightforward to do if you have Docker installed. See the great Medium post "Wikibase for Research Infrastructure — Part 1" by Matt Miller, which I stumbled across after discovering Bob DuCharme's blog post "Running and querying my own Wikibase instance". Running Wikibase on your own machine (if you follow the instructions you also get the SPARQL query interface) means that you can play around with a knowledge graph without worrying about messing up Wikidata itself, or having to negotiate with the Wikidata community if you want to add new properties. It looks like a relatively painless way to discover whether knowledge graphs are appropriate for the problem you're trying to solve. I hope to find time to play with Wikibase further in the future.
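Once a local instance is running, the SPARQL interface can also be queried from a script. The following is a rough sketch using only the Python standard library. The endpoint address is an assumption (it depends entirely on how your Docker setup maps ports and paths, so check your own configuration), and the helper names are mine rather than anything provided by Wikibase.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of the local query service. Adjust to match your
# own Docker configuration; this value is only a placeholder.
ENDPOINT = "http://localhost:8989/bigdata/namespace/wdq/sparql"

# A deliberately generic query: list any ten triples in the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request(endpoint, query):
    """Build a GET request asking for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def flatten_bindings(payload):
    """Reduce SPARQL JSON results to a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query and return the flattened rows (needs a live endpoint)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return flatten_bindings(json.load(response))
```

With the containers up, calling `run_query(ENDPOINT, QUERY)` should return a list of dictionaries keyed by `s`, `p` and `o`. On a freshly installed instance there may be little or nothing to return until you have added some items of your own.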

I'll update this blog post as the hackathon report is written.