Showing posts with label conference. Show all posts

Wednesday, June 03, 2009

e-Biosphere '09: Twitter rules, and all that


So, e-Biosphere '09 is over (at least for the plebs like me; the grown-ups get to spend two days charting the future of biodiversity informatics). It was an interesting event, on several levels. It's late, and I'm shattered, so this post will cover only a few things.

This was the first conference I'd attended where some of the participants twittered during proceedings. A bunch of us settled on the hashtag #ebio09 (you can also see the tweets at search.twitter.com). For the uninitiated, a "hashtag" is a string preceded by a hash symbol (#), to indicate that it is a tag, such as #fail. It provides a simple way to tag tweets so that others interested in that topic can find them.
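For the curious, here is a minimal sketch of how tweets could be filtered by hashtag. It assumes a tag is a "#" followed by letters, digits or underscores, which is a simplification rather than Twitter's actual matching rule. The function name `hashtags` is made up for illustration, and the second tweet is invented.

```python
import re

# Assumption: a tag is "#" plus word characters, and the "#" does not
# sit in the middle of a word (so "abc#def" is not treated as a tag).
HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def hashtags(tweet: str) -> list[str]:
    """Return the tags in a tweet, lower-cased and without the leading '#'."""
    return [tag.lower() for tag in HASHTAG.findall(tweet)]


tweets = [
    "@rdmpage thought we were on it ;) #ebio09",  # quoted later in this post
    "Projector has died again #fail #ebio09",     # invented example
    "Nothing to do with the conference",          # invented example
]

# Keep only the tweets carrying the conference tag.
conference_tweets = [t for t in tweets if "ebio09" in hashtags(t)]
print(len(conference_tweets))
```

Lower-casing the tags means #ebio09 and #EBIO09 are treated as the same tag, which is a design choice of this sketch.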

Twittering created a whole additional layer to the conference. We were able to:

Twitter greatly enhanced the conversation, noticeably when a speaker said something controversial (all too rare, sadly), or when a group rapporteur's summary didn't reflect all the views in that group. It also helped document what was going on, and this can be further exploited. For fun, I grabbed tweets from days 2 and 3 and made a wordle:
As @edwbaker noted: "@rdmpage The size of 'together', 'people' & 'visionary' is somewhat telling......"

In case you're wondering about the prominence of "Knowlton", it's because Nancy Knowlton gave a nice talk highlighting the ever-increasing number of cases where we have no names for the things we are encountering (for example, when barcoding fresh samples from poorly studied environments). This is just one example of the huge disconnect between the obsession with taxonomic names in biodiversity informatics, and the reality of metagenomics and DNA barcoding. Just as worrying is the lack of resemblance between the taxonomic classification used by the Encyclopedia of Life and our notion of the evolutionary tree of those organisms. A systematist would find much of EOL's classification laughable. I don't want to bash EOL, but it's worrying that they can continue to crank out press releases, but fail to provide something like a modern classification.

But I digress. In many ways this was less of a scientific conference and more of an event to birth a discipline, namely "biodiversity informatics" (which I'm sure some would claim has been around for quite a while). So, the event was to attract attention to the topic, and assure the outside world (and those attending) that the field exists and has something to say. It also was billed as a forum to discuss strategies for its future. Sadly, much of this discussion will take place behind closed doors, and will feature the major players who bring money and influence (but not much innovation) to the table.

Symptomatic of this lack of innovation, in a sense, was the contrast between the official "Online Conference Community" and the twitter feed. When I asked if anybody on twitter had used the official forum, @fak3r replied tellingly: @rdmpage thought we were on it ;) #ebio09. As fun as it is to use the new hotness to conduct a parallel (and slightly subversive) discussion at a conference, it's worrying that, in a field that calls itself "informatics", the big beasts probably had little idea what was going on. If we are going to exploit the tools the web provides, we need people who "get it", and I'm unconvinced that the big players in this area truly grasp the web (in all its forms). There's also a worrying degree of physics envy, which might be cured by reading The Unreasonable Effectiveness of Data (doi:10.1109/mis.2009.36).

I tried to stir things up a little (almost literally, as captured in this photo by Chris Freeland), with a couple of questions, but to not much effect (other than apparently driving to despair the poor chap behind me).


But enough grumbling. It was great to see lots of people attending the event, and there were lots of interesting posters and booths (creating a market for this field may go some way towards providing an incentive to provide better, more reliable services). My challenge entry won joint first prize, so perhaps I should sit back, enjoy the wine Joel Sachs chose as the prize (many thanks for his efforts in putting the challenge event together), and let others say what they thought of the meeting.

Thursday, August 14, 2008

BNCOD 2008 Workshop


The proceedings of the BNCOD 2008 Workshop on "Biodiversity Informatics: challenges in modelling and managing biodiversity knowledge" are online. This workshop was held in conjunction with the 25th British National Conference on Databases (BNCOD 2008) at Cardiff, Wales. The papers make interesting reading.

Exploring International Plant Names Index (IPNI) Data using Visualisation by Nicola Nicolson [PDF]
This paper describes visualisation as a means to explore data from the International Plant Names Index (IPNI). Several visualisations are used to display large volumes of data and to help data standardisation efforts. These have potential uses in data mining and in the exploration of taxon concepts.
Nicky explores some visualisations of the IPNI plant name database. Unfortunately only one of these (arguably the least exciting one) is shown in the PDF. The visualisations of citation history using Timeline, and social networks using prefuse are mentioned, but not shown.

Scratchpads: getting biodiversity online, redefining publication by Vince Smith et al. [PDF]
Taxonomists have been slow to adopt the web as a medium for building research communities. Yet, web-based communities hold great potential for accelerating the pace of taxonomic research. Here we describe a social networking application (Scratchpads) that enables communities of biodiversity researchers to manage and publish their data online. In the first year of operation 466 registered users comprising 53 separate communities have collectively generated 110,000 pages within their Scratchpads. Our approach challenges the traditional model of scholarly communication and may serve as a model to other research disciplines beyond biodiversity science.
This is a short note describing Scratchpads, which are built using the Drupal content management system (CMS). Scratchpads provide a simple way for taxonomists to get their content online. Based in large measure on the success of scratchpads, EOL will use Drupal as the basis of their "Lifedesks". There are numerous scratchpads online, although the amount and quality of content is, um, variable.

Managing Biodiversity Knowledge in the Encyclopedia of Life by Jen Schopf et al. [PDF]
The Encyclopedia of Life is currently working with hundreds of Content Providers to create 1.8 million aggregated species pages, consisting of tens of millions of data objects, in the next ten years. This article gives an overview of our current data management and Content Provider interactions.
This is a short note on EOL itself. I've given my views on EOL's progress (or, rather, lack thereof) elsewhere (here, here and here). The first author on this paper has left the project, and at least one of the other authors is leaving. It seems EOL has yet to find its feet (it certainly has no idea of how to use blogs).


Distributed Systems and Automated Biodiversity Informatics: Genomic Analysis and Geographic Visualization of Disease Evolution by Andrew Hill and Robert Guralnick [doi:10.1007/978-3-540-70504-8_28]
A core mission in biodiversity informatics is to build a computing infrastructure for rapid, real-time analysis of biodiversity information. We have created the information technology to mine, analyze, interpret and visualize how diseases are evolving across the globe. The system rapidly collects the newest and most complete data on dangerous strains of viruses that are able to infect human and animal populations. Following completion, the system will also test whether positions in the genome are under positive selection or purifying selection, a useful feature to monitor functional genomic characteristics such as drug resistance, host specificity, and transmissibility. Our system’s persistent monitoring and reporting of the distribution of dangerous and novel viral strains will allow for better threat forecasting. This information system allows for greatly increased efficiency in tracking the evolution of disease threats.
This paper was one of two contributions chosen to appear in the BNCOD 2008 proceedings ("Sharing Data, Information and Knowledge", doi:10.1007/978-3-540-70504-8, ISBN 978-3-540-70503-1). Rob Guralnick has put a free version online (see his comment below). It describes the very cool system being developed to provide near real time visualisation of disease spread and evolution, and builds on some earlier work published in Systematic Biology (doi:10.1080/10635150701266848).

LSID Deployment in the Catalogue of Life by Ewen Orme et al. [PDF]
In this paper we describe a GBIF/TDWG-funded project in which LSIDs have been deployed in the Catalogue of Life’s Annual and Dynamic Checklist products as a means of identifying species and higher taxa in these large species catalogues. We look at the technical infrastructure requirements and topology for the LSID resolution process and characteristics of the RDF (Resource Description Framework) metadata returned by the resolver. Such characteristics include the use of concepts and relationships taken from the TDWG (Taxonomic Database Working Group) ontology and how a given taxon LSID relates to others including those issued by database providers and those above and below it in the taxonomic tree. Finally we evaluate the project and LSID usage in general. We also look to the future when the CoL LSID infrastructure will have to deal changing taxonomic information, annually in the case of the Annual Checklist and possibly much more frequently in the case of the Dynamic Checklist.

Although I was an early adopter of LSIDs (in my now defunct Taxonomic Search Engine doi:10.1186/1471-2105-6-48 and the very-much alive LSID Tester, doi:10.1186/1751-0473-3-2), I have some reservations about them. The Catalogue of Life uses UUIDs to generate the LSID identifier, which makes for rather ugly looking LSIDs, as David Shorthouse has complained. For example, the LSID for Pinnotheres pisum is urn:lsid:catalogueoflife.org:taxon:ef0ae064-29c1-102b-9a4a-00304854f820:ac2008 (gack). Why these ugly UUIDs? Well, one advantage is that they can be generated in a distributed fashion and remain unique. This would make sense for a project like the Catalogue of Life, which aggregates names from a range of contributors, but in actual fact all the LSIDs at present are of the form "xxxxxxxx-29c1-102b-9a4a-00304854f820", indicating that they are being generated centrally (by MySQL's UUID function, in this case).
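To make the point about central generation concrete, here is a rough sketch using Python's standard `uuid` module. It splits the LSID above into its parts (assuming the usual urn:lsid:authority:namespace:object:revision layout) and inspects the UUID. In a version 1 (time-based) UUID the last group is the node, normally the generating machine's hardware address, so identifiers sharing it most likely came from one machine. The helper names `parse_lsid` and `same_generator` are mine, and the second UUID is a made-up value that merely follows the "xxxxxxxx-29c1-102b-9a4a-00304854f820" pattern.

```python
import uuid
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(parts[2], parts[3], parts[4], parts[5] if len(parts) == 6 else None)


def same_generator(a: uuid.UUID, b: uuid.UUID) -> bool:
    """True if two version 1 UUIDs share the same node and clock sequence."""
    return a.version == b.version == 1 and a.node == b.node and a.clock_seq == b.clock_seq


lsid = parse_lsid(
    "urn:lsid:catalogueoflife.org:taxon:ef0ae064-29c1-102b-9a4a-00304854f820:ac2008"
)
taxon = uuid.UUID(lsid.object_id)

# Hypothetical second identifier: only the first group differs.
other = uuid.UUID("0123abcd-29c1-102b-9a4a-00304854f820")

print(lsid.authority, lsid.namespace, lsid.revision)
print(taxon.version)               # UUID version number
print(format(taxon.node, "012x"))  # node field as 12 hex digits
print(same_generator(taxon, other))
```

Note that the revision ("ac2008") sits outside the UUID, so in principle the object identifier could stay fixed while the revision changes between releases.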

Ironically, when I was talking to Frank Bisby earlier this year, he implied that LSIDs would change with each release if the information about a name changed, thus failing to solve the existing, fundamental design flaw in the Catalogue of Life, namely the lack of stable identifiers! So, at first glance we are stuck with hideous-looking identifiers that may be unstable. Hmmm...

Workflow Systems for Biodiversity Researchers: Existing Problems and Potential Solutions by Russel McIver et al. [PDF]
In this paper we discuss the potential that scientific workflow systems have to support biodiversity researchers in achieving their goals. This potential comes through their ability to harness distributed resources and set up complex, multi-stage experiments. However, there remain concerns over the usability of existing workflow systems and research still needs to be done to help match the functionality of the software to the needs of its users. We discuss some of the existing concerns regarding workflow systems and propose three potential interfaces intended to improve workflow usability. We also outline the software architecture that we have adopted, which is designed to make our proposed workflow interface software interoperable across key workflow systems.
Not sure what to make of this paper. Workflows seem to generate an awful lot of publications, and few tools that people actually use.


Visualisation to Aid Biodiversity Studies through Accurate Taxonomic Reconciliation by Martin Graham et al. [doi:10.1007/978-3-540-70504-8_29]
All aspects of organismal biology rely on the accurate identification of specimens described and observed. This is particularly important for ecological surveys of biodiversity, where organisms must be identified and labelled, both for the purposes of the original research, but also to allow reinterpretation or reuse of collected data by subsequent research projects. Yet it is now clear that biological names in isolation are unsuitable as unique identifiers for organisms. Much modern research in ecology is based on the integration (and re-use) of multiple datasets which are inherently complex, reflecting any of the many spatial and temporal environmental factors and organismal interactions that contribute to a given ecosystem. We describe visualization tools that aid in the process of building concept relations between related classifications and then in understanding the effects of using these relations to match across sets of classifications.
This is the second contribution published in the conference proceedings, but there is also a free version available here from the project's blog. The paper describes TaxVis, a project developing visualisation techniques for comparing multiple taxonomic hierarchies.

The paper discusses taxonomic concepts and the difficulty of establishing what a taxonomist meant when they used a particular name. As much as I understand the argument, I can't shake the feeling that obsessing about taxonomic concepts is ultimately a dead end. It won't scale, and in an age of DNA barcoding, it becomes less and less relevant.

Releasing the content of taxonomic papers: solutions to access and data mining by Chris Lyal and Anna Weitzman [PDF]
Taxonomic information is key to all studies of biodiversity. Taxonomic literature contains vast quantities of that information, but it is under-utilised because it is difficult to access, especially by those in biodiverse countries and non-taxonomists. A number of initiatives are making this literature available on the Web as images or even as unstructured text, but while that improves accessibility, there is more that needs to be done to assist users in locating the publication; locating the relevant part of the publication (article, chapter etc) and locating the text or data required within the relevant part of the publication. Taxonomic information is highly structured and automated scripts can be used to mark-up or parse data from it into atomised pieces that may be searched and repurposed as needed. We have developed a schema, taXMLit that allows for mark-up of taxonomic literature in this way. We have also developed a prototype system, INOTAXA that uses literature marked up in taXMLit for sophisticated data discovery.
This is a nice overview of the challenge of extracting information from legacy literature. There are numerous challenges facing this work, including tasks that are trivial for people, such as determining when an article starts and ends, but which are challenging for computers (see Lu et al. doi:10.1145/1378889.1378918, free copy here -- there is a job related to this question available now). A related effort is the TaxonX markup being used by Plazi. My own view is that for legacy literature heavy markup is probably overkill; decent text mining will be enough. The real challenge is to stop the rot at source, and enable new taxonomic publications to be marked up as part of the authoring and publishing process.

An architecture to approach distributed biodiversity pollinators relational information into centralized portals based on biodiversity protocols by Pablo Salvanha et al. [PDF]
The present biodiversity distributed solution using DiGIR / TAPIR protocols and the Darwincore2 schema has been very valuable in the centralized portals, which that can provide distributed information in a very quickly way. Using the same concept this paper presents an architecture based on the case study of pollinators to bring the centralization of the relational information to those portals. This architecture is based on a technological structure to facilitate the implementation and extraction from the providers of that relational information, and proposes a model to make this information reliable to be used with the present specimens information on the portal database.
This is a short note on extending DarwinCore to include information about pollination relationships. The wisdom of doing this has been questioned (see Roger Hyam's comment on the proposal).

A Pan-European Species-directories Infrastructure (PESI) by Charles Hussey and Yde de Jong [PDF]
This communication introduces the rationale and aims of a new Europe-wide biodiversity informatics project. PESI defines and coordinates strategies to enhance the quality and reliability of European biodiversity information by integrating the infrastructural components of four major community networks on taxonomic indexing, namely those of marine life, terrestrial plants, fungi and animals, into a joint work programme. This will include functional knowledge networks of both taxonomic experts and regional focal points, which will collaborate on the establishment of standardised and authoritative taxonomic (meta-) data. In addition PESI will coordinate the integration and synchronisation of the European taxonomic information systems into a joint e-infrastructure and the creation of a common user-interface disseminating the pan-European checklists and associated user-services results.
This paper describes PESI, yet another mega-science project in biodiversity, complete with acronyms, work packages, and vacuous, buzzword-compliant statements. Just what the discipline needs...