Playing with my eLife Lens-inspired article viewer and some recent articles from ZooKeys, I regularly come across articles that are incorrectly marked up. As a quick reminder, my viewer takes the DOI for a ZooKeys article (just append it to http://bionames.org/labs/zookeys-viewer/?doi=, e.g. http://bionames.org/labs/zookeys-viewer/?doi=10.3897/zookeys.316.5132), fetches the corresponding XML, and displays the article.
Taking the article above as an example, I was browsing the list of literature cited and trying to find those articles in BioNames or BioStor. Sometimes an article that should have been found wasn't, and on closer investigation the problem was that the ZooKeys XML has mangled the citation. To illustrate, take the following XML:
<ref id="B112"><mixed-citation xlink:type="simple"><person-group><name name-style="western"><surname>Tschorsnig</surname> <given-names>HP</given-names></name><name name-style="western"><surname>Herting</surname> <given-names>B</given-names></name></person-group> (<year>1994</year>) <article-title>Die Raupenfliegen (Diptera: Tachinidae) Mitteleuropas: Bestimmungstabellen und Angaben zur Verbreitung und Ökologie der einzelnen Arten. Stuttgarter Beiträge zur Naturkunde.</article-title> <source>Serie A (Biologie)</source> <volume>506</volume>: <fpage>1</fpage>-<lpage>170</lpage>.</mixed-citation></ref>
Look at the contents of the article-title (title) and source (journal) tags. The first part of the journal name, "Stuttgarter Beiträge zur Naturkunde", has ended up inside the title, leaving only "Serie A (Biologie)" as the journal. The title and journal should actually look like this:
<ref id="B112"><mixed-citation xlink:type="simple"><person-group><name name-style="western"><surname>Tschorsnig</surname> <given-names>HP</given-names></name><name name-style="western"><surname>Herting</surname> <given-names>B</given-names></name></person-group> (<year>1994</year>) <article-title>Die Raupenfliegen (Diptera: Tachinidae) Mitteleuropas: Bestimmungstabellen und Angaben zur Verbreitung und Ökologie der einzelnen Arten.</article-title> <source>Stuttgarter Beiträge zur Naturkunde. Serie A (Biologie)</source> <volume>506</volume>: <fpage>1</fpage>-<lpage>170</lpage>.</mixed-citation></ref>
Tools that find articles using accurately parsed metadata, such as OpenURL resolvers, will fail in cases like this. Of course, we could use tools that don't have this requirement, but we could also fix the XML so that OpenURL resolvers will succeed.
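To make the failure concrete, here is a minimal Python sketch of how the tagged fields of a citation might be mapped onto an OpenURL 1.0 key/value query. It is not the viewer's code. The citation is a simplified copy of the ZooKeys markup (authors and the xlink attribute dropped so it parses on its own), and the function name `citation_to_openurl` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Simplified copy of the citation as marked up by ZooKeys
# (authors and the xlink attribute omitted for brevity).
MANGLED = (
    '<ref id="B112"><mixed-citation>(<year>1994</year>) '
    '<article-title>Die Raupenfliegen (Diptera: Tachinidae) Mitteleuropas: '
    'Bestimmungstabellen und Angaben zur Verbreitung und Ökologie der '
    'einzelnen Arten. Stuttgarter Beiträge zur Naturkunde.</article-title> '
    '<source>Serie A (Biologie)</source> <volume>506</volume>: '
    '<fpage>1</fpage>-<lpage>170</lpage>.</mixed-citation></ref>'
)

# NLM citation tags and the OpenURL journal keys they correspond to.
TAG_TO_KEY = {
    "article-title": "rft.atitle",
    "source": "rft.jtitle",
    "volume": "rft.volume",
    "fpage": "rft.spage",
    "lpage": "rft.epage",
    "year": "rft.date",
}


def citation_to_openurl(ref_xml):
    """Return OpenURL key/value pairs for one <ref> element (hypothetical helper)."""
    ref = ET.fromstring(ref_xml)
    params = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    for tag, key in TAG_TO_KEY.items():
        value = ref.findtext(f".//{tag}")
        if value:
            params[key] = value.strip()
    return params


params = citation_to_openurl(MANGLED)
print(params["rft.jtitle"])  # the journal a resolver would be asked to match
print(urlencode(params))     # query string to append to a resolver's base URL
```

With the mangled markup the resolver is asked for a journal called "Serie A (Biologie)", which is why the lookup fails even though the article exists.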
This is where the example of the journal eLife comes in. They deposit article XML in GitHub where anyone can grab it and mess with it. Let's imagine we did the same for ZooKeys, created a GitHub repository for the XML, and then edited it in cases where the article metadata is clearly broken. A viewer like mine could then fetch the XML, not from ZooKeys, but from GitHub, and thus take advantage of any corrections made.
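A viewer that prefers corrected XML could look something like the sketch below. Both URL templates are hypothetical (no such ZooKeys repository exists as far as this post is concerned, and the publisher URL is a placeholder), as is the file-naming scheme; the point is only the "try the corrected copy first, then fall back" logic.

```python
from urllib.request import urlopen

# Hypothetical locations: neither pattern comes from ZooKeys or GitHub.
CORRECTED_URL = "https://raw.githubusercontent.com/example/zookeys-xml/master/{name}.xml"
PUBLISHER_URL = "https://example.org/zookeys/{name}.xml"


def doi_to_name(doi):
    """Turn a DOI such as 10.3897/zookeys.316.5132 into a file-safe name (assumed scheme)."""
    return doi.replace("/", "_")


def http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def fetch_article_xml(doi, get=http_get):
    """Prefer the community-corrected XML, fall back to the publisher's copy."""
    name = doi_to_name(doi)
    for template in (CORRECTED_URL, PUBLISHER_URL):
        try:
            return get(template.format(name=name))
        except OSError:
            # urllib's URLError and HTTPError are subclasses of OSError,
            # so a missing file or network failure moves on to the next source.
            continue
    raise LookupError(f"no XML found for {doi}")
```

Passing the fetch function in as `get` keeps the fallback logic testable without touching the network.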
We could imagine this as part of a broader workflow that would also incorporate other sources of articles, such as BHL. We could envisage workflows that take BHL scans, convert them to editable XML, then repurpose that content (see BHL to PDF workflow for a sketch). I like the idea that there is considerable overlap between the most recent publishing ventures (such as eLife and ZooKeys) and the goal of bringing biodiversity legacy literature to life.
Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed.
ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.
Friday, July 12, 2013
Wednesday, June 19, 2013
A new way to view taxonomic publications
One of the goals of BioNames is to be more than simply another taxonomic database. In particular, I'm interested in the idea of having a platform for viewing taxonomic publications. One way to think about this is to consider the experience of viewing Wikipedia. For any given page in Wikipedia there will be links to other, related content in Wikipedia. Reading an article about a city, you can go and read about the country the city occurs in. Reading about a battle, you can discover more about the generals who fought it. The ability to discover all this interconnected information in one place is compelling.
I'd like something similar for taxonomy. Given that a taxonomic database is in essence a collection of taxonomic names and publications, and a taxonomic publication is in essence a collection of names and citations of taxonomic publications, why not embed the publication within the database and have the names and citations link to the corresponding entries in the database?
Based on some earlier efforts (e.g., Towards an interactive taxonomic article: displaying an article from ZooKeys) and inspired by the eLife Lens project, I've created a live demo of a way to view articles from the journal ZooKeys. Below is a screencast:
If you want to try this out, here are some live examples:
- http://bionames.org/labs/zookeys-viewer/?doi=10.3897/zookeys.183.3073 Description of Alpheus cedrici sp. n., a strikingly coloured snapping shrimp (Crustacea, Decapoda, Alpheidae) from Ascension Island, central Atlantic Ocean
- http://bionames.org/labs/zookeys-viewer/?doi=10.3897/zookeys.99.723 The spider family Selenopidae (Arachnida, Araneae) in Australasia and the Oriental Region
- http://bionames.org/labs/zookeys-viewer/?doi=10.3897/zookeys.154.1963 At the lower size limit for tetrapods, two new species of the miniaturized frog genus Paedophryne (Anura, Microhylidae)
Note the pattern in the URL: just append the DOI for an article to http://bionames.org/labs/zookeys-viewer/?doi=
Everything is a bit rough, but it's working well enough for you to get the basic idea. Code is on GitHub. Essentially the viewer grabs the ZooKeys HTML, extracts the URL for the XML file, fetches that, then uses some XSLT style sheets to convert the XML into something viewable. There's a sprinkling of JavaScript to call the BioNames API. Much of the code could be tweaked to accept other NLM XML-based articles, such as content from PLoS and the BMC journals.
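The "extract the URL for the XML file" step can be sketched roughly as follows. This is an illustration in Python, not the viewer's code, and it assumes the article page links to its XML with an ordinary anchor whose href ends in .xml; the real ZooKeys markup may differ. The names `XmlLinkFinder` and `find_xml_url` and the sample HTML are invented for the example. The XSLT step is left out because Python's standard library has no XSLT processor.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XmlLinkFinder(HTMLParser):
    """Collect href values of <a> tags that point at .xml files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".xml"):
            self.links.append(href)


def find_xml_url(html, page_url):
    """Return the absolute URL of the first XML link on the page, or None."""
    finder = XmlLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.links[0]) if finder.links else None


# Made-up page fragment standing in for an article landing page.
SAMPLE = '<p><a href="/article.pdf">PDF</a> <a href="/files/article.xml">XML</a></p>'
print(find_xml_url(SAMPLE, "https://example.org/articles/1"))
# prints https://example.org/files/article.xml
```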
One direction this could go in is to make a viewer like this the default viewer in BioNames for ZooKeys articles, so that instead of being restricted to a PDF you can interactively navigate between the article and the cited literature. Indeed, the very action of locating cited references in BioNames builds citation links. We could imagine extending the approach to content that isn't in NLM XML, such as Zootaxa PDFs, or content from BHL. Eventually I'd like to have the taxonomic literature fully embedded in the database, not as PDF or image silos, but as documents linked to names and literature. The journal becomes a database.
Labels: BioNames, eLife, eLife Lens, XML, ZooKeys