Thursday, October 21, 2010

Mendeley, BHL, and the "Bibliography of Life"

One of my hobby horses is the disservice taxonomic databases do their users by not linking to original scientific literature. Typically, taxonomic databases either don't cite primary literature, or regurgitate citations as cryptic text strings, leaving the user to try and find the item being referred to. With the growing number of publishers that are digitising legacy literature and issuing DOIs, together with the Biodiversity Heritage Library's (BHL) enormous archive, there's really no excuse for this.

Taxonomic databases often cite references in abbreviated forms, or refer to individual pages, rather than to citable units such as articles (see my "Nomenclators + digitised literature = fail" post for details). One way to translate these into links to articles would be a tool that could find the article containing a given page, or match an abbreviated citation to a full one. This task would be relatively straightforward if we had the "bibliography of life," a freely accessible bibliography of every taxonomic paper ever published. Sadly, we don't...yet.
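To make the page-lookup idea concrete, here is a minimal sketch, assuming we already have article records with first and last pages for each volume (the `Article` class, `find_article` function, and the example titles are all hypothetical). Given a volume and a cited page, it returns the article whose page range contains that page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical minimal article record: just the fields needed for page lookup."""
    title: str
    volume: str
    spage: int  # first page of the article
    epage: int  # last page of the article


def find_article(articles: List[Article], volume: str, page: int) -> Optional[Article]:
    """Return the article in `volume` whose page range includes `page`, or None."""
    for article in articles:
        if article.volume == volume and article.spage <= page <= article.epage:
            return article
    return None  # page not covered by any known article (a gap in the bibliography)


# Hypothetical records for one journal volume.
articles = [
    Article("First article", "12", 1, 14),
    Article("Second article", "12", 15, 40),
    Article("Third article", "12", 41, 58),
]

match = find_article(articles, "12", 23)
print(match.title if match else "not found")  # prints "Second article"
```

A linear scan is fine for a single journal; with a full bibliography you would want to index articles by journal and volume first. The point is that this only works if the bibliography exists, which is the argument of this post.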

Bibliography of life

Mendeley is rapidly building a very large bibliography (although exactly how large is a matter of some dispute, see Duncan Hull's "How many unique papers are there in Mendeley?"), and I'm starting to explore using it as a way to store bibliographic details on a large scale. For example, an increasing number of smaller museum or society journals are putting lists of all their published articles on the web. Typically these are HTML pages rather than structured bibliographic data, but with a bit of scraping we can convert them into something useful, such as RIS, and import them into Mendeley. I've started to do this, creating Mendeley groups for individual journals, e.g.:

These lists aren't necessarily complete or error-free, but they contain the metadata for several thousand articles. If individual societies and museums made their lists of publications freely available, we would make significant progress towards building a bibliography of life. And with Mendeley's social networking features, groups of collaborators could clean up any errors in the metadata.
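The scraping step described above ends with records in RIS, which Mendeley can import. The sketch below shows only the last part of that pipeline, assuming the scraper has already produced one Python dict per article; the field names in the dict and the `to_ris` function are hypothetical, and the tag choices (JF for the full journal name, SP/EP for pages) are one reasonable mapping rather than the only one.

```python
def to_ris(record: dict) -> str:
    """Serialise one scraped article record (hypothetical dict layout) as a RIS entry."""
    lines = ["TY  - JOUR"]  # a RIS entry must start with a TY (type) line
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")  # one AU line per author
    field_tags = [
        ("title", "TI"),
        ("journal", "JF"),
        ("issn", "SN"),
        ("volume", "VL"),
        ("issue", "IS"),
        ("spage", "SP"),
        ("epage", "EP"),
        ("year", "PY"),
    ]
    for key, tag in field_tags:
        value = record.get(key)
        if value not in (None, ""):  # skip fields the scraper couldn't fill
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")  # ...and must end with an ER (end of record) line
    return "\n".join(lines)


# Hypothetical record; only the journal name and ISSN come from this post.
record = {
    "authors": ["Author, A.", "Author, B."],
    "title": "An example article title",
    "journal": "Spixiana",
    "issn": "0341-8391",
    "volume": "12",
    "issue": "1",
    "spage": 1,
    "epage": 14,
    "year": 1989,
}
print(to_ris(record))
```

To build a file for a whole journal, join the entries for every scraped article with blank lines between them and import the result into the journal's Mendeley group.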

Of course, this isn't the only way to do this. I suspect I'm rather atypical in building Mendeley groups that contain articles from a single journal, as opposed to groups based on specific topics, and we could also tackle the problem by creating groups with a taxonomic focus (such as all taxonomic papers on amphibians). Furthermore, if and when more taxonomists join Mendeley and share their personal bibliographies, we will get a lot more relevant articles "for free." This is Mendeley's real strength in my opinion: it provides rich tools for users to do what they most want to do (manage their PDFs and cite them when they write papers), but off the back of that Mendeley can support larger tasks (in the same way that Flickr's ability to store geotagged photos has led to some very interesting visualisations of aggregated data).

BioStor
For some of the journals I've added to Mendeley I only have bibliographic data; the actual content isn't freely available online, and in some cases isn't even digitised. But for some journals the content exists in BHL, and it's "just" a matter of finding it. This is where my BioStor project comes in. For example, BHL has scanned most of the journal Spixiana. While BHL recognises individual volumes (see http://www.biodiversitylibrary.org/bibliography/40214), it has no notion of articles. To find these I scraped the tables of contents on the Spixiana web site and ran them through BioStor's OpenURL resolver. If you visit the BioStor page for the journal (http://biostor.org/issn/0341-8391) you will see that most of the articles have been identified in BHL, although there are a few holes that will need to be filled.
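An OpenURL resolver takes a citation encoded as key/value pairs in a URL. As a rough illustration of the step above, here is a sketch that turns one scraped table-of-contents entry into an OpenURL 1.0 (Z39.88-2004) query string. The base URL `RESOLVER_BASE`, the entry's dict layout, and the function name are assumptions for illustration, not BioStor's documented interface; the code only builds the URL and does not send it.

```python
from urllib.parse import urlencode

# Assumption: placeholder base URL for the resolver; check BioStor's own
# documentation for the real endpoint before using this.
RESOLVER_BASE = "http://biostor.org/openurl"

# Map our (hypothetical) record fields to OpenURL KEV journal keys.
KEV_KEYS = [
    ("title", "rft.atitle"),
    ("journal", "rft.jtitle"),
    ("issn", "rft.issn"),
    ("volume", "rft.volume"),
    ("spage", "rft.spage"),
    ("epage", "rft.epage"),
    ("year", "rft.date"),
]


def openurl_for(entry: dict, base: str = RESOLVER_BASE) -> str:
    """Build an OpenURL query string for one table-of-contents entry."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for key, kev in KEV_KEYS:
        value = entry.get(key)
        if value not in (None, ""):  # only send the fields we actually have
            params[kev] = value
    return base + "?" + urlencode(params)  # urlencode escapes values safely


# Hypothetical entry; only the journal name and ISSN come from this post.
toc_entry = {"title": "An example article title", "journal": "Spixiana",
             "issn": "0341-8391", "volume": "12", "spage": 1, "year": 1989}
print(openurl_for(toc_entry))
```

Volume and starting page usually carry most of the weight when locating an article in a scanned volume, so it is worth sending them even when the title is missing or garbled.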

These articles are listed in a Mendeley group for Spixiana, with the articles linked to BioStor wherever possible.

CiteBank and on not reinventing the wheel
If we were to use Mendeley as the primary platform for aggregating taxonomic publications, I think this would be the best way to implement "CiteBank". BHL have created CiteBank as "an open access repository for biodiversity publications" using Drupal. Whatever one thinks of Drupal, bibliographic management is not an area where it shines. I think the taxonomic community should take a good look at Mendeley and ask themselves whether this is the platform around which they could build the bibliography of life.