iPhylo: TDWG 2017: thoughts on day 1

Roderic D. M. Page

Tuesday, October 03, 2017

Some random notes on the first day of TDWG 2017. First off, great organisation with the first usable conference calendar app that I've seen (https://tdwg2017.sched.com).

I gave the day's keynote address in the morning (slides below).

"Towards a biodiversity knowledge graph" (slides by Roderic Page)

It was something of a stream-of-consciousness brain dump, and I tried to cover a lot of (maybe too much) stuff. Among the topics I covered were Holly Bik's appeal for better links between genomic and taxonomic data, my iSpecies tool, some snarky comments on the Semantic Web (and an assertion that GenBank succeeded more because of network effects than because journals required authors to submit sequences there), a brief discussion of Wikidata (including using d3sparql to display classifications, see here), and the use of Hexastore to query data from BBC Wildlife. I also talked about Ted Nelson, Xanadu, using hypothes.is to annotate scientific papers (see Aggregating annotations on the scientific literature: a followup on the ReCon16 hackday), social factors in building knowledge graphs (touching on ORCID and some of the work by Nico Franz discussed here), and ended with some cautionary comments on the potential misuse of metrics based on knowledge graphs (using "league tables" of cited specimens, see GBIF specimens in BioStor: who are the top ten museums with citable specimens?).

TDWG is a great opportunity to find out what is going on in biodiversity informatics, and also to get a sense of where the problems are. For example, sitting through the Financial Models for Sustaining Biodiversity Informatics Products session you couldn't help being struck by (a) the number of different projects all essentially managing specimen data, and (b) the struggle they all face to obtain funding. If this were a commercial market there would be some pretty drastic consolidation happening. It also highlights the difficulty of providing services to a community that doesn't have much money.

I was also struck by Andrew Bentley's talk Interoperability, Attribution, and Value in the Web of Natural History Museum Data. In a series of slides Andrew outlined what he felt collections needed from aggregators, researchers, and publishers, e.g.:

What do collections want from aggregators like @GBIF ? #tdwg17 pic.twitter.com/WRmeafSbtv
— Roderic Page (@rdmpage) October 2, 2017

Chatting to Andrew at the evening event at the Canadian Museum of Nature, I think there's a lot of potential for developing tools to provide collections with data on the use and impact of their collections. Text mining the biodiversity literature on a massive scale to extract (a) mentions of collections (e.g., their institutional acronyms) and (b) citations of specimens could generate metrics that would be helpful to collections. There's a great opportunity here for BHL to generate immediate value for natural history collections (many of which are also contributors to BHL).
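To make the text-mining idea a little more concrete, here is a minimal sketch of step (b), pulling specimen citations out of a block of text and tallying them by collection. Everything in it is an assumption for illustration: the acronym list, the pattern for a specimen code (acronym followed by a catalogue number), and the function names are hypothetical. Real specimen citations are far messier than this pattern allows.

```python
import re
from collections import Counter

# Hypothetical list of institutional acronyms; a real tool would load
# these from a registry of collections rather than hard-code them.
KNOWN_ACRONYMS = {"KU", "USNM", "MCZ", "BMNH", "CMN"}

# Assumed shape of a specimen code: 2-6 capital letters, an optional
# separator (space, colon, or hyphen), then a catalogue number.
SPECIMEN_CODE = re.compile(r"\b([A-Z]{2,6})[ :\-]?(\d{2,})\b")


def extract_specimen_citations(text):
    """Return (acronym, catalogue number) pairs for known acronyms only."""
    return [
        (match.group(1), match.group(2))
        for match in SPECIMEN_CODE.finditer(text)
        if match.group(1) in KNOWN_ACRONYMS
    ]


def count_by_collection(text):
    """Tally how many specimen citations each collection receives."""
    return Counter(acronym for acronym, _ in extract_specimen_citations(text))


if __name__ == "__main__":
    sample = "Holotype USNM 12345; paratypes MCZ 6789 and USNM 12346."
    print(count_by_collection(sample))
```

Filtering against a list of known acronyms is what keeps strings such as "DNA 2017" out of the counts. Run over a large corpus such as BHL, per-collection tallies like these are the sort of raw material from which usage metrics could be built.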

I also had a chance to talk to Jorrit Poelen, who works on Global Biotic Interactions (GloBI). He made some interesting comparisons between Hexastores (which I'd touched on in my keynote) and Linked Data Fragments.
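For anyone unfamiliar with the term, a hexastore indexes every (subject, predicate, object) triple under all six orderings of those three terms, so that any triple pattern can be answered by walking a single index. The toy in-memory version below is only a sketch of that idea, not the code I used for the keynote demo; the class name, method names, and example triples are all made up for illustration.

```python
from collections import defaultdict
from itertools import permutations


class Hexastore:
    """Toy triple store keeping one nested index per ordering of s, p, o."""

    def __init__(self):
        # Six indexes: spo, sop, pso, pos, osp, ops.
        self.indexes = {
            order: defaultdict(lambda: defaultdict(set))
            for order in permutations("spo")
        }

    def add(self, s, p, o):
        triple = {"s": s, "p": p, "o": o}
        for order, index in self.indexes.items():
            a, b, c = (triple[k] for k in order)
            index[a][b].add(c)

    def query(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        pattern = {"s": s, "p": p, "o": o}
        bound = [k for k in "spo" if pattern[k] is not None]
        free = [k for k in "spo" if pattern[k] is None]
        # Pick the index whose ordering puts the bound terms first.
        order = tuple(bound + free)
        k1, k2, k3 = order
        index = self.indexes[order]
        results = []
        for a in ([pattern[k1]] if k1 in bound else list(index)):
            if a not in index:
                continue
            level2 = index[a]
            for b in ([pattern[k2]] if k2 in bound else list(level2)):
                if b not in level2:
                    continue
                level3 = level2[b]
                for c in ([pattern[k3]] if k3 in bound else list(level3)):
                    if c in level3:
                        found = dict(zip(order, (a, b, c)))
                        results.append((found["s"], found["p"], found["o"]))
        return sorted(results)


if __name__ == "__main__":
    store = Hexastore()
    store.add("Panthera leo", "parentTaxon", "Panthera")
    store.add("Panthera tigris", "parentTaxon", "Panthera")
    store.add("Panthera leo", "eats", "Equus quagga")
    print(store.query(o="Panthera"))
```

The cost is storing each triple six times. Linked Data Fragments come at the same triple-pattern queries from the other direction: the server answers only simple patterns and leaves the client to combine them.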

The final session I attended was Towards robust interoperability in multi-omic approaches to biodiversity monitoring. The overwhelming impression was that there is a huge amount of genomic data, much of which does not easily fit into the classic, Linnean view of the world that characterises, say, GBIF. For most of the sequences we don't know what they are, and that might not be the most interesting question anyway (more interesting might be "what do they do?"). The extent to which these data can be shoehorned into GBIF is not clear to me, although doing so may result in some healthy rethinking of the scope of GBIF itself.