iPhylo: publication

Roderic D. M. Page


Friday, July 09, 2010

Wikipedia paper out

My short note on "Wikipedia as an Encyclopaedia of Life" has appeared in Organisms Diversity & Evolution (doi:10.1007/s13127-010-0028-9) (yes, I do occasionally write papers). A preprint of this paper is available on Nature Precedings (hdl: 10101/npre.2010.4242.1).

My presentation at iEvoBio covers much the same ground, and is included below, although the paper was written before I made the mapping from NCBI taxa to Wikipedia pages.

Phyloinformatics in the age of Wikipedia (warning, do not view if easily offended)


Thursday, May 06, 2010

Linnaeus meets the Internet: PLoS + Botany = #fail

To much fanfare (e.g., Nature News, "Linnaeus meets the Internet" doi:10.1038/news.2010.221), on May 5th PLoS ONE published Sandy Knapp's "Four New Vining Species of Solanum (Dulcamaroid Clade) from Montane Habitats in Tropical America" doi:10.1371/journal.pone.0010502. To quote the Nature News piece:

The paper represents the culmination of a campaign to institute the electronic publication of scientific names, a case Knapp and others have made in journals including Nature[doi:10.1038/446261a]. Allowing electronic publication should make accessing information easier for scientists worldwide — especially those in developing countries who may not have access to fully stocked libraries. This, in turn, will aid conservation efforts, Knapp says.

Given the profile of this paper, "...the first time new plant names have been published in a purely electronic journal and still complied with ICBN rules", you'd think the participants would ensure the electronic aspects of the publication worked. Sadly, this is not the case.

The four names in question have apparently been deposited in IPNI with the following LSIDs:

Solanum aspersum: urn:lsid:ipni.org:names:77103633-1

Solanum luculentum: urn:lsid:ipni.org:names:77103634-1

Solanum sanchez-vegae: urn:lsid:ipni.org:names:77103635-1

Solanum sousae: urn:lsid:ipni.org:names:77103636-1

Today is May 6th. None of these names are returned by a search of IPNI, for example http://www.ipni.org/ipni/simplePlantNameSearch.do?find_wholeName= returns this:

Resolving the LSID returns this:


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:dc="http://purl.org/dc/elements/1.1/" 
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:tn="http://rs.tdwg.org/ontology/voc/TaxonName#"
xmlns:tm="http://rs.tdwg.org/ontology/voc/Team#"    
xmlns:tcom="http://rs.tdwg.org/ontology/voc/Common#"    
xmlns:p="http://rs.tdwg.org/ontology/voc/Person#"    
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#">
<tn:TaxonName rdf:about="urn:lsid:ipni.org:names:77103633-1">	
<tcom:versionedAs rdf:resource="urn:lsid:ipni.org:names:77103633-1:1.2"/>
<tcom:Deleted>Yes</tcom:Deleted>
</tn:TaxonName>  
</rdf:RDF>

Hmmm, so apparently this record has been "deleted"?
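A check like this is easy to script. The following is a minimal sketch, assuming the response looks like the RDF above (the sample is trimmed to the namespaces actually used); the function name `lsid_record_status` is hypothetical and not part of any IPNI or TDWG tool.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the RDF response shown above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "tn": "http://rs.tdwg.org/ontology/voc/TaxonName#",
    "tcom": "http://rs.tdwg.org/ontology/voc/Common#",
}

# Trimmed copy of the response, kept inline so the sketch runs offline.
SAMPLE_RDF = """<rdf:RDF
xmlns:tn="http://rs.tdwg.org/ontology/voc/TaxonName#"
xmlns:tcom="http://rs.tdwg.org/ontology/voc/Common#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<tn:TaxonName rdf:about="urn:lsid:ipni.org:names:77103633-1">
<tcom:versionedAs rdf:resource="urn:lsid:ipni.org:names:77103633-1:1.2"/>
<tcom:Deleted>Yes</tcom:Deleted>
</tn:TaxonName>
</rdf:RDF>"""


def lsid_record_status(rdf_text):
    """Return the LSID and whether the record is flagged as deleted."""
    root = ET.fromstring(rdf_text)
    name = root.find("tn:TaxonName", NS)
    if name is None:
        return None
    lsid = name.get("{%s}about" % NS["rdf"])
    deleted = name.findtext("tcom:Deleted", default="", namespaces=NS)
    return {"lsid": lsid, "deleted": deleted.strip().lower() == "yes"}


print(lsid_record_status(SAMPLE_RDF))
```

Treating any value other than "Yes" as "not deleted" is an assumption; the vocabulary may allow other values.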

The paper also states that:

The IPNI LSIDs (Life Science Identifiers) can be resolved and the associated information viewed through any standard web browser by appending the LSID contained in this publication to the prefix http://ipni.org/.

This sentence mirrors similar ones in other PLoS ONE papers saying we can resolve ZooBank LSIDs by appending the LSID to http://zoobank.org (e.g., see doi:10.1371/journal.pone.0001787).

Thing is, URLs such as http://ipni.org/urn:lsid:ipni.org:names:77103633-1 return a 404 from Kew (any IPNI LSID I've tried does this).

Update As per Alan Paton's comment below, the http://ipni.org prefix now works.
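The "append the LSID to a prefix" convention quoted from the paper amounts to string concatenation plus a sanity check on the LSID itself. A rough sketch, assuming the usual urn:lsid:authority:namespace:object[:revision] layout; `parse_lsid` and `http_proxy_url` are hypothetical names.

```python
def parse_lsid(lsid):
    """Split an LSID into its parts, raising ValueError if it is malformed."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError("not an LSID: %r" % lsid)
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return {
        "authority": authority,
        "namespace": namespace,
        "object": object_id,
        "revision": revision,
    }


def http_proxy_url(lsid, prefix="http://ipni.org/"):
    """Build the browser-resolvable URL by appending the LSID to a prefix."""
    parse_lsid(lsid)  # reject anything that is not an LSID
    return prefix.rstrip("/") + "/" + lsid


print(http_proxy_url("urn:lsid:ipni.org:names:77103633-1"))
```

Whether the resulting URL actually resolves is, of course, up to the server at the other end.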

So, to recap:

The names aren't in IPNI

The LSIDs state the record has been deleted

The LSIDs can't be resolved by the means stated in the paper

Now, I don't know what happened (perhaps IPNI wanted to hold off until the paper actually appeared before releasing the names), but the paper is out, the buzz in Nature is out, and IPNI doesn't have the resolver in place, let alone the names.

Given the milestone this paper represents, and the fuss over the publication of the name Darwinius, you'd expect the bioinformatics side of it to be, you know, actually working. In these circumstances, how on Earth do we make the case that the LSID and name databasing side of taxonomic publication is useful?

Sunday, April 18, 2010

Elsevier Grand Challenge paper out

At long last the peer-reviewed version of the paper "Enhanced display of scientific articles using extended metadata" (doi:10.1016/j.websem.2010.03.004), in which I describe my entry in the Elsevier Grand Challenge, has appeared in the journal Web Semantics: Science, Services and Agents on the World Wide Web. The pre-print version of this paper was online (hdl:10101/npre.2009.3173.1) for a year before the published version appeared (24 April 2009 versus 3 April 2010), and the Challenge entry itself went online in December 2008. Unfortunately the published version has an awful typo in the title (that was in neither the manuscript nor the proofs).

Given this typo, the time lag between doing the work, writing the manuscript, and seeing it published, and the fact that I've already been to meetings where my invitation has been based on the entry and the pre-print, I do wonder why on Earth I would bother with traditional publication (which is somewhat ironic, given the topic of the paper).

Tuesday, July 14, 2009

How to publish a journal RSS feed

This morning I posted this tweet:

Harvesting Nuytsia RSS http://science.dec.wa.gov.a... Non-trivial as links are not to individual articles #fail

My grumpiness (on this occasion; lots of things seem to make me grumpy lately) is that journal RSS feeds often leave a lot to be desired. As RSS feeds are a major source of biodiversity information (for a great example of their use see uBio's RSS, described in doi:10.1093/bioinformatics/btm109) it would be helpful if publishers did a few basic things. Some of these suggestions are in Lisa Rogers' RSS and Scholarly Journal Tables of Contents: the ticTOCs Project, and Good Practice Guidelines for Publishers, but some aren't.

In the spirit of being constructive, here are some dos and don'ts.

Do

1. Validate the RSS feed

Run your feed through Feed Validator to make sure it's valid. Your feed is an XML document; if it won't validate then aggregators may struggle with it. Testing it in your favourite web browser isn't enough, but if a browser fails to display it this may be a clue that something's wrong. For example, Safari won't display the ZooKeys RSS feed, and at the time of writing this feed is not valid.
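As a first line of defence before a full validator, a feed can at least be tested for XML well-formedness. This is a minimal sketch and much weaker than Feed Validator: it catches only parse errors, not violations of the RSS or Atom specifications. The function name `check_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(feed_text):
    """Return (True, root tag) if the feed parses as XML, else (False, error)."""
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, root.tag


good = '<rss version="2.0"><channel><title>Journal</title></channel></rss>'
# An unescaped ampersand is a classic way to break a feed.
bad = '<rss version="2.0"><channel><title>A & B</title></channel></rss>'

print(check_well_formed(good))
print(check_well_formed(bad))
```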

2. Make sure your feed is autodiscoverable

When I visit your web page my browser should tell me that there is a feed available (typically with an RSS icon in the location bar). If there's no such icon, then I have to look at your page to find the feed (if it exists). The Nuytsia page is an example of a non-discoverable feed. Making your feed autodiscoverable is easy: just add a link tag inside the head tag on the page. For example, something like this:


<link rel="alternate" type="application/rss+xml"
title="RSS Feed for Nuytsia"
href="http://science.dec.wa.gov.au/nuytsia/nuytsia.rss.xml" />
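To check whether a page advertises its feed, a harvester can look for exactly this kind of link tag. A minimal sketch using only the Python standard library; the class and function names are hypothetical, and accepting Atom as well as RSS is an assumption about what a harvester would want.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (title, href) pairs for autodiscoverable feeds."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        a = {key: (value or "") for key, value in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES:
            self.feeds.append((a.get("title", ""), a.get("href", "")))


def find_feeds(html_text):
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds


page = """<html><head>
<link rel="alternate" type="application/rss+xml"
title="RSS Feed for Nuytsia"
href="http://science.dec.wa.gov.au/nuytsia/nuytsia.rss.xml" />
</head><body></body></html>"""

print(find_feeds(page))
```

An empty list means the page has no autodiscoverable feed, which is the situation described above.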

3. Use standard identifiers as the links

If your journal has DOIs, use those in the links, not the URL of the article web page. The latter is likely to change (the DOI won't, unless you are being naughty), and given a DOI I can harvest the metadata via CrossRef.
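From the harvester's side, the first step is to decide whether an item link is a DOI at all. A rough sketch, assuming DOIs turn up in the forms used in these posts (doi:, info:doi/, or a resolver URL); the pattern is a heuristic, not the formal DOI syntax, and the function names are hypothetical.

```python
import re

# Optional prefix, then a "10.<registrant>/<suffix>" DOI. Heuristic only.
DOI_PATTERN = re.compile(
    r"^(?:doi:|info:doi/|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def bare_doi(link):
    """Return the bare DOI if the link is one, else None."""
    match = DOI_PATTERN.match(link.strip())
    return match.group(1) if match else None


def doi_url(doi):
    """Resolver URL for a bare DOI (the doi.org prefix is an assumption)."""
    return "https://doi.org/" + doi


for link in (
    "doi:10.1093/bioinformatics/btm109",
    "info:doi/10.1076/snfe.38.2.115.15923",
    "http://science.dec.wa.gov.au/nuytsia/",
):
    print(link, "->", bare_doi(link))
```

Links that come back as None are the ones that would need scraping or extra metadata in the feed.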

4. Each item link in the feed links to ONE article

This was the reason for my grumpy tweet. The journal Nuytsia has an RSS feed (great!), but the links are not to individual articles. Instead, they are database queries that may generate one or more results. For example, this link RYE, B.L., (2009). Reinstatement of the Western Australian genus Oxymyrrhine (Myrtaceae : Chamelauci... actually lists two papers, both authored by B. L. Rye. This breaks the underlying model where the feed lists individual articles.

5. Include lots of metadata in your feed

If you don't use DOIs, then include metadata about your article in your feed. That way, I don't need to scrape your web pages; all I need is already in the feed.

6. Make it possible to harvest metadata about your articles

If you don't use DOIs as your article identifier, or don't use DOIs as the item links in your RSS feed, then make it easy for me to get the bibliographic details from either the RSS feed, or from the web page. If you use RSS 1.0, then ideally you are using PRISM and I can get the metadata from that. If not, you can embed the metadata in the HTML page describing the article using Dublin Core and meta and link tags. For example, if you resolve this doi:10.1076/snfe.38.2.115.15923 and view the HTML source you will see this:


<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1" />
<meta http-equiv="Content-Language" content="en-gb" />
<link rel="shortcut icon" href="/mpp/favicon.ico" />
<meta name="verify-v1" content="xKhof/of+uTbjR1pAOMT0/eOFPxG8QxB8VTJ07qNY8w=" />
<meta name="DC.publisher" content="Taylor & Francis" />
<meta name="DC.identifier" content="info:doi/10.1076/snfe.38.2.115.15923" />
<meta name="description" content="In this study we determined the effects of topography on the distribution of ground-dwelling ants in a primary terra-firme forest near Manaus, in cent..." />
<meta name="authors" content="Heraldo L. Vasconcelos ,Antônio C. C. Macedo,José M. S. Vilhena" />
<meta name="DC.creator" content="Heraldo L. Vasconcelos" />
<meta name="DC.creator" content="Antônio C. C. Macedo" />
<meta name="DC.creator" content="José M. S. Vilhena" />

Not pretty, but it enables me to get the details I want.
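Pulling those details out takes only a few lines. A minimal sketch using the standard library HTML parser, assuming the metadata is in meta tags whose name starts with "DC." as in the sample above; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect Dublin Core <meta> values, keyed by element name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        content = a.get("content")
        if name.upper().startswith("DC.") and content is not None:
            # Repeated elements (e.g. several creators) accumulate in order.
            self.fields.setdefault(name[3:].lower(), []).append(content)


def dublin_core(html_text):
    extractor = DublinCoreExtractor()
    extractor.feed(html_text)
    return extractor.fields


sample = """
<meta name="DC.publisher" content="Taylor & Francis" />
<meta name="DC.identifier" content="info:doi/10.1076/snfe.38.2.115.15923" />
<meta name="DC.creator" content="Heraldo L. Vasconcelos" />
<meta name="DC.creator" content="Antônio C. C. Macedo" />
<meta name="DC.creator" content="José M. S. Vilhena" />
"""

print(dublin_core(sample))
```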

7. Support conditional HTTP GET

If you don't want feed readers and aggregators to hammer your service, support HTTP conditional GET (see here for details) so that feed readers only grab your feed if it has changed. Not many journal publishers do this; if they get overloaded by people grabbing RSS feeds they've only themselves to blame.
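For reference, this is roughly what the client's half of a conditional GET looks like. It is a sketch under the assumption that the server returns ETag and/or Last-Modified headers and answers 304 Not Modified when nothing has changed; the function names are hypothetical.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Headers that ask the server to reply only if the feed has changed."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Fetch the feed, or report that it is unchanged (HTTP 304)."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return {
                "changed": True,
                "body": response.read(),
                # Remember these validators for the next request.
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return {
                "changed": False,
                "body": None,
                "etag": etag,
                "last_modified": last_modified,
            }
        raise
```

A harvester would store the returned `etag` and `last_modified` values and pass them back on its next visit. A server that ignores the headers simply sends the whole feed every time.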

Don'ts

1. Sign up/log in

Don't ever ask me to sign up or log in to get the RSS feed (Cambridge University Press, I'm looking at you). If you think your content is so good/precious that I should sign up for it, you are sadly mistaken. Nature doesn't ask me to login, nor should you.

2. Break DOIs

Another major cause of grumpiness is the frequency with which DOIs break, especially for recently published articles (i.e., precisely those that will be encountered in RSS feeds). There is quite simply no excuse for this. If your workflow results in DOIs being put on web pages before they are registered with CrossRef, then you (or CrossRef) are incompetent.

Tuesday, August 12, 2008

Dinosaurs and the Cretaceous Terrestrial Revolution

Shameless plug. One of my former PhD students, Katie Davis, is second author on "Dinosaurs and the Cretaceous Terrestrial Revolution" (doi:10.1098/rspb.2008.0715), which came out recently in Proceedings of the Royal Society. The abstract:

The observed diversity of dinosaurs reached its highest peak during the mid- and Late Cretaceous, the 50 Myr that preceded their extinction, and yet this explosion of dinosaur diversity may be explained largely by sampling bias. It has long been debated whether dinosaurs were part of the Cretaceous Terrestrial Revolution (KTR), from 125–80 Myr ago, when flowering plants, herbivorous and social insects, squamates, birds and mammals all underwent a rapid expansion. Although an apparent explosion of dinosaur diversity occurred in the mid-Cretaceous, coinciding with the emergence of new groups (e.g. neoceratopsians, ankylosaurid ankylosaurs, hadrosaurids and pachycephalosaurs), results from the first quantitative study of diversification applied to a new super tree of dinosaurs show that this apparent burst in dinosaurian diversity in the last 18 Myr of the Cretaceous is a sampling artefact. Indeed, major diversification shifts occurred largely in the first one-third of the group’s history. Despite the appearance of new clades of medium to large herbivores and carnivores later in dinosaur history, these new originations do not correspond to significant diversification shifts. Instead, the overall geometry of the Cretaceous part of the dinosaur tree does not depart from the null hypothesis of an equal rates model of lineage branching. Furthermore, we conclude that dinosaurs did not experience a progressive decline at the end of the Cretaceous, nor was their evolution driven directly by the KTR.

Now, if we could just get the bird supertree paper out the door...

Wednesday, April 30, 2008

Paper published

Bit of a rarity these days. My paper on identifiers in biodiversity informatics, which I mentioned earlier when I deposited the preprint at Nature Precedings, has been published in Briefings in Bioinformatics (doi:10.1093/bib/bbn022).

Here's the abstract:

A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that a lot of information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers [such as Digital Object Identifiers (DOIs) and Life Science Identifiers (LSIDs)], and the implementation of services that link those identifiers.

Monday, February 18, 2008

LSID Tester, a tool for testing Life Science Identifier resolution services

My short note on the LSID Tester tool has been published in the Open Access journal Source Code for Biology and Medicine. The article has just come out so the DOI (doi:10.1186/1751-0473-3-2) isn't live yet; the direct link is http://www.scfbm.org/content/3/1/2/. Source code for the tester is available from Google Code.

Friday, May 18, 2007

TBMap paper out

My paper on mapping TreeBASE names to other databases is out as a provisional PDF on the BMC Bioinformatics web site (doi:10.1186/1471-2105-8-158 -- not working yet).

The abstract:

TreeBASE is currently the only available large-scale database of published organismal phylogenies. Its utility is hampered by a lack of taxonomic consistency, both within the database, and with names of organisms in external genomic, specimen, and taxonomic databases. The extent to which the phylogenetic knowledge in TreeBASE becomes integrated with these other sources is limited by this lack of consistency.
Taxonomic names in TreeBASE were mapped onto names in the external taxonomic databases IPNI, ITIS, NCBI, and uBio, and graph G of these mappings was constructed. Additional edges representing taxonomic synonymies were added to G, then all components of G were extracted. These components correspond to "name clusters", and group together names in TreeBASE that are inferred to refer to the same taxon. The mapping to NCBI enables hierarchical queries to be performed, which can improve TreeBASE information retrieval by an order of magnitude.
TBMap database provides a mapping of the bulk of the names in TreeBASE to names in external taxonomic databases, and a clustering of those mappings into sets of names that can be regarded as equivalent. This mapping enables queries and visualisations that cannot otherwise be constructed. A simple query interface to the mapping and names clusters is available at: http://linnaeus.zoology.gla.ac.uk/~rpage/tbmap
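The "name clusters" in the abstract are the connected components of the mapping graph G. The sketch below illustrates that idea with a small union-find; it is not the TBMap code, and the node labels are placeholders rather than real names from any of the databases.

```python
from collections import defaultdict


def name_clusters(edges):
    """Group nodes into connected components using union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Nodes are (database, name) pairs; edges are mappings or synonymies.
# All labels here are hypothetical placeholders.
edges = [
    (("TreeBASE", "name A"), ("NCBI", "name A")),
    (("TreeBASE", "name A variant"), ("NCBI", "name A")),
    (("TreeBASE", "name B"), ("IPNI", "name B")),
]

for cluster in name_clusters(edges):
    print(cluster)
```

Here the two TreeBASE spellings that map to the same NCBI name end up in one cluster, which is the sense in which names are "inferred to refer to the same taxon".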

The TBMap web site needs some work; it's really only intended to document the mapping. Once I've tweaked and updated the mapping, I hope to use it in my forthcoming all-singing, all-dancing phylogeny database...