Showing posts with label eLife. Show all posts

Friday, July 12, 2013

Learning from eLife: GitHub as an article repository

While playing with my eLife Lens-inspired article viewer and some recent articles from ZooKeys, I regularly come across articles that are incorrectly marked up. As a quick reminder, my viewer takes the DOI for a ZooKeys article (just append it to http://bionames.org/labs/zookeys-viewer/?doi=, e.g. http://bionames.org/labs/zookeys-viewer/?doi=10.3897/zookeys.316.5132), fetches the corresponding XML, and displays the article.
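
The URL pattern is simple enough to script. Here is a minimal sketch in Python, assuming DOIs may arrive either bare or with a resolver prefix. The viewer_url helper and the list of prefixes are illustrative and not part of the viewer itself.

```python
VIEWER_BASE = "http://bionames.org/labs/zookeys-viewer/?doi="

# Prefixes people commonly paste along with a DOI (illustrative list).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def viewer_url(doi: str) -> str:
    """Build the viewer URL by appending the bare DOI to the base URL."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return VIEWER_BASE + doi


print(viewer_url("10.3897/zookeys.316.5132"))
# http://bionames.org/labs/zookeys-viewer/?doi=10.3897/zookeys.316.5132
```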

Taking the article above as an example, I was browsing the list of literature cited and trying to find those articles in BioNames or BioStor. Sometimes an article that should have been found wasn't, and on closer investigation the problem was that the ZooKeys XML has mangled the citation. To illustrate, take the following XML:

<ref id="B112"><mixed-citation xlink:type="simple"><person-group><name name-style="western"><surname>Tschorsnig</surname> <given-names>HP</given-names></name><name name-style="western"><surname>Herting</surname> <given-names>B</given-names></name></person-group> (<year>1994</year>) <article-title>Die Raupenfliegen (Diptera: Tachinidae) Mitteleuropas: Bestimmungstabellen und Angaben zur Verbreitung und Ökologie der einzelnen Arten. Stuttgarter Beiträge zur Naturkunde.</article-title> <source>Serie A (Biologie)</source> <volume>506</volume>: <fpage>1</fpage>-<lpage>170</lpage>.</mixed-citation></ref>

Look at the contents of the article-title (title) and source (journal) tags: the first part of the journal name, "Stuttgarter Beiträge zur Naturkunde.", has ended up inside the title. The title and journal should actually be marked up like this:

<ref id="B112"><mixed-citation xlink:type="simple"><person-group><name name-style="western"><surname>Tschorsnig</surname> <given-names>HP</given-names></name><name name-style="western"><surname>Herting</surname> <given-names>B</given-names></name></person-group> (<year>1994</year>) <article-title>Die Raupenfliegen (Diptera: Tachinidae) Mitteleuropas: Bestimmungstabellen und Angaben zur Verbreitung und Ökologie der einzelnen Arten.</article-title> <source>Stuttgarter Beiträge zur Naturkunde. Serie A (Biologie)</source> <volume>506</volume>: <fpage>1</fpage>-<lpage>170</lpage>.</mixed-citation></ref>

Tools that rely on accurately parsed metadata to find articles, such as OpenURL resolvers, will fail in cases like this. Of course, we could use tools that don't have this requirement, but we could also fix the XML so that OpenURL resolvers will succeed.
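
To make the failure concrete, here is a rough Python sketch that turns a reference like the one above into OpenURL-style key/value pairs. The function name and the simplified XML are my assumptions (the xlink attribute and second author are dropped so the fragment parses on its own). The keys follow the common OpenURL 0.1 conventions (atitle, title, volume, spage and so on), and a given resolver may expect something different.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Simplified copy of the mangled reference (xlink attribute removed).
MANGLED_REF = """<ref id="B112"><mixed-citation>
<person-group><name name-style="western"><surname>Tschorsnig</surname> <given-names>HP</given-names></name></person-group>
(<year>1994</year>)
<article-title>Die Raupenfliegen (Diptera: Tachinidae) Mitteleuropas: Bestimmungstabellen und Angaben zur Verbreitung und Ökologie der einzelnen Arten. Stuttgarter Beiträge zur Naturkunde.</article-title>
<source>Serie A (Biologie)</source> <volume>506</volume>: <fpage>1</fpage>-<lpage>170</lpage>.
</mixed-citation></ref>"""


def citation_to_openurl(ref_xml: str) -> dict:
    """Map the tagged parts of a mixed-citation to OpenURL-style keys."""
    citation = ET.fromstring(ref_xml).find("mixed-citation")

    def text(path: str) -> str:
        element = citation.find(path)
        return (element.text or "").strip() if element is not None else ""

    fields = {
        "genre": "article",
        "aulast": text("person-group/name/surname"),  # first author only
        "date": text("year"),
        "atitle": text("article-title"),
        "title": text("source"),  # journal name
        "volume": text("volume"),
        "spage": text("fpage"),
        "epage": text("lpage"),
    }
    # Drop keys for which the citation had no value.
    return {key: value for key, value in fields.items() if value}


params = citation_to_openurl(MANGLED_REF)
print(params["title"])  # Serie A (Biologie)
print(urlencode(params))
```

With the mangled markup the journal comes out as just "Serie A (Biologie)", which is not something a resolver is likely to match.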

This is where the example of the journal eLife comes in. They deposit article XML in GitHub where anyone can grab it and mess with it. Let's imagine we did the same for ZooKeys, created a GitHub repository for the XML, and then edited it in cases where the article metadata is clearly broken. A viewer like mine could then fetch the XML, not from ZooKeys, but from GitHub, and thus take advantage of any corrections made.
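
As a sketch of what "fetch from GitHub first" might look like in Python: try a corrected copy in a repository, and fall back to the publisher's XML if there isn't one. No such ZooKeys repository exists as far as this post is concerned, so the repository path, the file naming scheme, and all function names below are hypothetical.

```python
from typing import Callable, Optional, Tuple
from urllib.request import urlopen

# Hypothetical repository and file layout, purely for illustration.
GITHUB_RAW = "https://raw.githubusercontent.com/example-user/zookeys-xml/master/{name}.xml"

Fetcher = Callable[[str], Optional[bytes]]


def http_get(url: str) -> Optional[bytes]:
    """Fetch a URL, returning None on any network or HTTP error."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read()
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return None


def corrected_xml_url(doi: str) -> str:
    """Map a DOI to a file in the (hypothetical) corrections repository."""
    return GITHUB_RAW.format(name=doi.replace("/", "_"))


def fetch_article_xml(
    doi: str, publisher_url: str, get: Fetcher = http_get
) -> Optional[Tuple[str, bytes]]:
    """Prefer the corrected copy on GitHub, else use the publisher's XML."""
    candidates = (
        ("github", corrected_xml_url(doi)),
        ("publisher", publisher_url),
    )
    for source, url in candidates:
        data = get(url)
        if data is not None:
            return source, data
    return None


# Offline demonstration with a stand-in fetcher instead of real HTTP requests.
fake_web = {"http://example.org/article.xml": b"<article/>"}
print(fetch_article_xml("10.3897/zookeys.316.5132",
                        "http://example.org/article.xml",
                        get=fake_web.get))
# ('publisher', b'<article/>')
```

Passing the fetcher in as a parameter keeps the fallback logic testable without touching the network.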

We could imagine this as part of a broader workflow that would also incorporate other sources of articles, such as BHL. We could envisage workflows that take BHL scans, convert them to editable XML, then repurpose that content (see BHL to PDF workflow for a sketch). I like the idea that there is considerable overlap between the most recent publishing ventures (such as eLife and ZooKeys) and the goal of bringing biodiversity legacy literature to life.

Wednesday, June 19, 2013

A new way to view taxonomic publications

One of the goals of BioNames is to be more than simply another taxonomic database. In particular, I'm interested in the idea of having a platform for viewing taxonomic publications. One way to think about this is to consider the experience of viewing Wikipedia. For any given page in Wikipedia there will be links to other, related content in Wikipedia. Reading an article about a city, you can go and read about the country the city occurs in. Reading about a battle, you can discover more about the generals who fought it. The ability to discover all this interconnected information in one place is compelling.

I'd like something similar for taxonomy. Given that a taxonomic database is in essence a collection of taxonomic names and publications, and a taxonomic publication is in essence a collection of names and citations of taxonomic publications, why not embed the publication within the database and have the names and citations link to the corresponding entries in the database?

Based on some earlier efforts (e.g., Towards an interactive taxonomic article: displaying an article from ZooKeys) and inspired by the eLife Lens project, I've created a live demo of a way to view articles from the journal ZooKeys. Below is a screencast:



If you want to try this out, here are some live examples:

Note the pattern in the URL: just append the DOI for an article to http://bionames.org/labs/zookeys-viewer/?doi=

Everything is a bit rough, but it's working well enough for you to get the basic idea. Code is on GitHub. Essentially the viewer grabs the ZooKeys HTML, extracts the URL for the XML file, fetches that, then uses some XSLT style sheets to convert the XML into something viewable. There's a sprinkling of JavaScript to call the BioNames API. Much of the code could be tweaked to accept other NLM XML-based articles, such as content from PLoS and the BMC journals.
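
The first step, pulling the XML link out of the article's HTML page, might look roughly like this in Python. This is not the viewer's actual code. How ZooKeys marks up that link isn't described here, so the rule used below (take the first link whose href mentions "xml") and the sample HTML are assumptions. The XSLT step is left out because it needs a third-party library such as lxml.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class XmlLinkFinder(HTMLParser):
    """Collect href values of <a> tags that look like links to XML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumed heuristic: the XML download link has "xml" in its href.
        if href and "xml" in href.lower():
            self.links.append(href)


def find_xml_url(html: str, page_url: str) -> Optional[str]:
    """Return the absolute URL of the first XML-looking link, if any."""
    finder = XmlLinkFinder()
    finder.feed(html)
    if not finder.links:
        return None
    # Resolve relative links against the page they were found on.
    return urljoin(page_url, finder.links[0])


# Made-up HTML standing in for an article landing page.
sample_html = """
<p><a href="/download/article.pdf">PDF</a>
<a href="/download/article.xml">XML</a></p>
"""
print(find_xml_url(sample_html, "http://example.org/articles/1"))
# http://example.org/download/article.xml
```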

One direction this could go in is to make a viewer like this the default viewer in BioNames for ZooKeys articles, so that instead of being restricted to a PDF you can interactively navigate between the article and the cited literature. Indeed, the very action of locating cited references in BioNames builds citation links. We could imagine extending the approach to content that isn't in NLM XML, such as Zootaxa PDFs, or content from BHL. Eventually I'd like to have the taxonomic literature fully embedded in the database, not as PDF or image silos, but as documents linked to names and literature. The journal becomes a database.