Friday, December 13, 2019

The Semantic Web revisited: thoughts on SWAT4HCLS

This week I attended the SWAT4(HC)LS (Semantic Web Applications and Tools for Healthcare and Life Sciences) meeting in Edinburgh. Although a relatively small meeting, SWAT4(HC)LS attracts some big names in the field and featured keynotes by Denny Vrandečić (founder of Wikidata), Dov Greenbaum, Birgitta König-Ries, and Helen Parkinson.
For me this was a chance to get a sense of the state of the Semantic Web, and also to present a talk on biodiversity knowledge graphs. Given that this is a computer science meeting, you need to get a paper submitted and accepted in order to give a talk, so I hastily wrote up some notes on matching author names in taxonomic and bibliographic databases (there's a version of this on bioRxiv):
Page, R. D. M. (2019). Reconciling author names in taxonomic and publication databases. doi:10.1101/870170
Google the "Semantic Web" and pretty soon you discover that many people think it is dead (see Whatever Happened to the Semantic Web?). But it is still here, maybe partly because there is some ambiguity about just what it is. The 2003 paper "Which semantic web?" By Catherine C. Marshall and Frank M. Shipman (doi:10.1145/900051.900063) sketches three different Semantic Webs:

  1. a universal library, to be readily accessed and used by humans in a variety of information use contexts.
  2. the backdrop for the work of computational agents completing sophisticated activities on behalf of their human counterparts
  3. a method for federating particular knowledge bases and databases to perform
(1) is essentially what Google gives us, the ability to use a web browser to find stuff on the web, augmented by structured markup to help us do that (the "Library of Alexandria"). (2) is the idea of global ontologies, agents, and reasoning (the Knowledge Navigator), and (3) focusses on cross linking data in different databases (the "Federated Knowledge Base").

My own focus is very much in area (3), I want to link disconnected datasets together. Many of the presentations at SWAT4(HC)LS were more in area (2) and focussed on ontologies, especially medical. This is a world of big - not always open - ontologies, and lots of discussions about how to model data. In other words, what many people think of as the Semantic Web.

One of the nice things about the conference was the way people with posters got to give a lightning talk about their poster (I've seen this at VIZBI as well). I think this is a great idea and would love to see this at biodiversity conferences. The posters that I got the most out of were from the researchers at the DBCLS in Japan, such as TogoStanza (visualisations of SPARQL results), SPARQList (Markdown notebook for SPARQL), and Umaka Viewer (visualise classes in a SPARQL endpoint).

For fun I tried Umaka Viewer on my Ozymandias knowledge graph. You can see the results here.
It took about 30 minutes to generate the data for this visualisation, but it was fun to poke around at the internals of a knowledge graph that I had created. I discovered classes I'd forgotten I'd used!

As someone who spends a lot of time messing about with ways to collect, clean, and visualise data, it's no surprise that posters and presentations on tools for doing this are what I found most useful. The thing I find most appealing about the Semantic Web is the notion of having simple APIs that can query knowledge encoded in both web pages and databases (see also work by Franck Michel and colleagues on SPARQL Micro-Services, e.g. SPARQL Micro-Services Demo Page).