Wednesday, December 08, 2010

First thoughts on CiteBank and BHL-Europe

This week saw the release of two tools from the Biodiversity Heritage Library, CiteBank and the BHL-Europe portal. Both have actually been quietly around for a while, but were only publicly announced last week.

In developing a new tool there are several questions to ask. Does something already exist that meets my needs? If it doesn't exist, can I build it using an existing framework, or do I need to start from scratch? As a developer it's awfully tempting sometimes to build something from scratch (I'm certainly guilty of this). Sometimes a more sensible approach is to build on something that already exists, particularly if what you are building upon is well supported. This is one of the attractions of Drupal, which underlies CiteBank and Scratchpads. In my own work I've used Semantic MediaWiki to support editable, versioned databases, rather than roll my own. Perhaps the more difficult question for a developer is whether they need to build anything at all. What if there are tools already out there that, if not exactly what you want, are close enough (or most likely will be by the time you finish your own tool)?

CiteBank is an open access platform to aggregate citations for biodiversity publications and deliver access to biodiversity related articles. CiteBank aggregates links to content from digital libraries, publishers, and other bibliographic systems in order to provide a single point of access to the world’s biodiversity literature, including content created by its community of users. CiteBank is a project of the Biodiversity Heritage Library (BHL).

I have two reactions to CiteBank. Firstly, Drupal's bibliographic tools really suck, and secondly, why do we need this? As I've argued earlier (see Mendeley, BHL, and the "Bibliography of Life"), I can't see the rationale for having CiteBank separate from an existing bibliographic database such as Mendeley or Zotero. These tools are more mature, better supported, and address user needs beyond simply building lists of papers (e.g., citing papers when writing manuscripts).

For me, one of BHL's goals should be integrating the literature they have scanned into mainstream scientific literature, which means finding articles, assigning DOIs, and becoming in effect a digital publishing platform (like BioOne or JSTOR). Getting to this point will require managing and cleaning metadata for many thousands of articles and books. It seems to me that you want to gather this metadata from as many sources as possible, and expose it to as many eyes (and algorithms) as possible to help tidy it up. I think this is a clear case of it being better to use an existing tool (such as Mendeley), rather than build a new one. If a good fraction of the world's taxonomists shared their personal bibliographies on Mendeley we'd pretty much have the world's taxonomic literature in one place, without really trying.

It's early days for BHL-Europe, and they've taken the "let's use an existing framework" approach, basing the BHL-Europe portal on DISMARC, the latter being an EU-funded project to "encourage and support the interoperability of music related data".

BHL-Europe is the kind of web site only its developers could love. It's spectacularly ugly, and a classic example of what digital libraries came up with while Google was quietly eating their lunch. Here's the web site showing search results for "Zonosaurus":


Yuck! Why do these things have to be so ugly? DISMARC was designed to store metadata about digital objects, specifically music. Look at commercial music interfaces such as iTunes and Spotify, or even academic projects such as mSpace.

To be useful BHL-Europe really needs to provide an interface that reflects what its users care about, for example taxonomic names, classification, and geography. It can't treat scientific literature as a bunch of lifeless metadata objects (but then again, DISMARC managed to do this for music).

Where next?
CiteBank and BHL-Europe seem further additions to the worthy but ultimately deeply unsatisfying attempts to improve access to biodiversity literature. To date our field has failed to get to grips with aggregating metadata (outside of the library setting), creating social networks around that aggregation, and providing intuitive interfaces that enable users to search and browse productively. These are big challenges. I'd like to see the resources that we have put to better use, rather than being used to build tools where suitable alternatives already exist (CiteBank), or used to shoehorn data into generic tools that are unspeakably ugly (BHL-Europe portal) and not fit for purpose. Let's not reinvent the wheel, and let's not try and convince ourselves that squares make perfectly good wheels.