Tuesday, November 05, 2019

Thoughts on Biodiversity Next

It’s been a while since I’ve posted on iPhylo. Since returning from a fun and productive time in Australia there have been a bunch of professional and personal things that have needed attending too. In amongst all this I attended Biodiversity Next in Leiden, a large (by biodiversity informatics standards) conference with the tag line "Building a global infrastructure for biodiversity data. Together." In this post I try and bring together a few random thoughts on the conference (the Twitter hashtag #biodiversitynext gives a much broader sense of the conference).

Spectacle


The venue for the keynotes was delightful, and guest speakers were ushered on stage with a rock-star soundtrack (which, frankly, grated at bit). Some of the keynotes were essentially TED talks, such as Theo Jansen on his wonderful Strandbeest and Jalila Essaidi on bulletproof skin and other biotechnology. Interesting, polished, hopeful.



Some keynotes were pitches, such as Paul Hebert’s BIOSCAN where we divide the planet into a grid (of squares, really?) and sequence barcodes for everything within each grid. The theme was moving from “artisanal to industrial” scale. BIOSCAN has a rival, the Earth BioGenome Project (EBP) (see https://doi.org/10.1073/pnas.1720115115) which aims to sequence the whole genome of every eukaryote in 10 years (at a cost of $US 4.7 billion). BIOSCAN is a rather cheaper, although Herbert sees it as the precursor to a larger initiative. But what makes BioSCAN more appealing to me is that it includes and explicit geographical and ecological context. BIOSCAN is interested in what species occur where, and who they are interacting with (the “symbiome”). But not everyone is convinced that mega-genomics projects are a good idea, see for example Proposals to “sequence the DNA of all life on Earth” suffer from the same issues as “naming all the species” by @JeffOllerton.

Other keynotes that resonated with me were Maxwell Gomera’s where he points out that for many people biodiversity is a risk (including an anecdote about people in Namibia seeing biodiversity as attracting the unwanted attention of outside interests, and hence something to be actively minimised), and Jorge Soberon on just how much of biodiversity informatics is data driven and theory-free. The presentation by Ana María Hernández Salgar on IPBES was perhaps the least exciting keynote, arguably because she’s tackling a probably intractable problem. We have some spectacular technology for documenting and understanding biodiversity, but no obvious way to change or significantly influence human impacts on that biodiversity.

Optics


The conference managed to score a pretty spectacular own goal by having an all-white, all male panel (“manel”) for one session (moderated by Ely Wallis @elyw).



There was a pointed response to this later in the conference (again moderated by Ely).



Personally I felt that neither panel contributed much beyond platitudes. I don’t think panel discussions like these do much to explore ideas, they are much more about appearances and positions (which makes the manel even more unfortunate).

There were other comments that were tone deaf. One senior figure argued that “money wasn’t a problem”, the implication being that there’s lots of it around, we just have to figure out how to access it. Yet, one of the sessions I attended featured a young researcher from Brazil who had to crowdfund his attendance at the conference. Money (or rather its uneven distribution) is very much a problem.

Infrastructure


The conference had its own app, and it worked well. It certainly made it easier to plan the day, which sadly was mostly realising that the two topics that you were out interested in hearing about were on at the same time. Big conferences have this fundamental problem that there are too many people and too many talks for people to see everything. This makes the event more a statement about community being large enough to stage such an event, than actually being a place to learn what is going on. But I guess the combination of breaks between sessions, social events, and the pre-conference workshops mean there are times and spaces where things can actually get done.

Substance


There was a lot going on at the conference, I am going to pick out just a few highlights for me. These are obviously very biased, and I missed a lot of the talks.

Cordra

The thing I was most interested to learn about was the technology underpinning DISSCO’s approach to putting specimen records online. Alex Hardisty (@AlexHardisty) gave a nice demo of DISSCO’s approach, which uses Cordra. From Cordra’s website:
Cordra is a highly configurable open source software offered to software developers for managing digital objects with resolvable identifiers at scale.
Cordra is from the Corporation for National Research Initiatives (CNRI), the people behind the Handle system which underpins DOIs. It's a NoSQL data store that can generate and manage persistent identifiers (e.g., Handles). I’ve not been following DISSCO closely, but this approach makes a lot of sense, and it will be interesting to see how it develops. Alex demoed a “digital specimen repository”, for example the record for specimen BMNH:2006.12.6.40-41 is here: http://nsidr.org/#objects/20.5000.1025/486a7e883f14f88bba37. Early days, but digitial identifiers for specimens are going to be crucial to efforts to interlink biodiversity data.

Knowledge graphs

I did my best to spread the knowledge graph meme, and Wikidata is attracting growing interest. Unfortunately I couldn’t see Franck Michel’s (@franck_michel2) talk on Bioschemas, but the idea of having light-weight markup for life science data is very attractive. It seems that long-standing dreams of linking things together are starting to slowly take shape.


Traits

This is an area that I have not thought much about. The Encyclopaedia of Life tried to carve out a niche in this area (TraitBank) but their latest iteration abandons the JSON-LD they developed in version 2.0, which seems a strategic blunder given the growth of interest in knowledge graphs, Bioschemas, and Wikidata. It seems that people working on traits are in a sort of pre-GBIF phase looking for ways to integrate diverse data into one or more places where people can play with it. There’s a lot of excitement here, but lots of data wrangling issues to deal with.

Credit and identity

The hashtag #citetheDOI became something of a rallying cry for those interested in GBIF data. Citing data downloaded from GBIF enables GBIF to pass information on usage along to data providers. Yet another example of the most compelling use case for identifiers not being scientific but cultural.

“Get yourself an ORCID” was another rallying cry. The challenge here is that the most obvious beneficiary of you getting an ORCID is not (yet) you, which makes the sales pitch a bit tricky.


People

It may be partly an age thing, but an increasingly important aspect of conference alike this is the chance to catchup with people you know, as well as develop new contacts and (hopefully) have your preconceptions challenged by people smarter than yourself. I spent quite a bit of time with the BHL crowd, which meant teasing them about their obsession with old books, which did not end well:

It was also fun to see Roger Hyam (@RogerHyam) in action again. Roger has a knack for cutting through the noise to make tools that are useful. He gave a nice demo of using the International Image Interoperability Framework (IIIF) to display herbarium images. Under the hood IIIF is JSON-LD and models everything as an annotation, so I think this framework is going to see a lot more use across a range of biodiversity projects. It certainly inspired me to add IIIF to my newly relaunched BioStor.


Agency


It's more a kind of pragmatic Archimedean sense that you might be able to move some subset of the world connected to any system on which you have root access or any project for which you're building a key component—from the leverage point of the command line. (The Emergence of Digital Humanities by Steven E. Jones, emphasis added)
One final thought which struck me is the notion of "agency", in the sense of a person being able to do things. For me one of the joys of biodiversity informatics is that I can make stuff that seems useful (if only to me). If, say, BHL ignores articles, well, you can grab their data and build something that finds those articles. If the data is available, and you can code, then there are few limits to what you can do. Even if you can't code, limits to what people can do are being removed. You have citizen scientists like @SiobhanLeachman (who presented at Biodiversity Next) revelling in the wealth of tools such as Wikipedia, Wikidata, etc. that enable her to add to biodiversity knowledge.

Do that at scale, as demonstrated by Carrie Seltzer's keynote on iNaturalist, and you can get millions of data points added by a passionate, empowered community.

Yet, I would find myself talking to biodiversity professionals working at some of the world's leading museums and herbaria, and they had far less agency than someone like Siobhan. They have no influence over the databases and software they use, even trivial changes aren't made because... reasons. Seemingly obvious suggestions of things that could be done, or offers of additional data are met with responses along the lines of "even if you gave us that data, we couldn't do anything with it because there's not a field in our database."

As a somewhat cranky, independent-minded academic, I greatly value the freedom to create things, and I'm extremely lucky that I can do that. But it is interesting to see that people fascinated by science but who are not employed as scientists often have more agency than the professional scientists. And maybe that's why I'm resistant to large conferences such as Biodiversity Next (and processors such as e-Biosphere 09). They represent the increasing professionalisation of the field, and with that often comes decreasing agency. When I grow up, I want to be a citizen scientist.