Monday, December 13, 2010

How do I know if an article is Open Access?

One of my pet projects is to build a "Universal Article Reader" for the iPad (or similar mobile device), so that a reader can seamlessly move between articles from different publishers, follow up citations, and get more information on entities mentioned in those articles (e.g., species, molecules, localities, etc.). I've made various toys towards this, the latest being an HTML5 clone of Nature's iPhone app.

One impediment to this is knowing whether an article is Open Access, and if so, what representations are available (e.g., PDF, HTML, XML). Ideally, the "Universal Article Reader" would be able to look at the web page for an article, determine whether it can extract and redisplay the text (i.e., whether the article is Open Access) and, if so, whether it can, for example, grab the article in XML and reformat it.

Some journals are entirely Open Access, so for these journals the first problem (is it Open Access?) is trivial. But a large number of journals have a mixed publishing model: some articles are Open Access, some aren't. One helpful thing publishers could do is specify the access status of an article in a consistent manner. Here's a quick survey of how things stand at the moment.

Journal | Rights
PLoS One | Embedded RDF, e.g. <license rdf:resource="http://creativecommons.org/licenses/by/2.5/" />
Nature Communications | <meta name="access" content="Yes" /> for open, <meta name="access" content="No" /> for closed
Systematic Biology | <meta name="citation_access" content="all" /> for open, this tag missing if closed
BioOne | Nothing for article, Open Access icon next to open access articles in table of contents
BMC Evolutionary Biology | <meta name ="dc.rights" content="http://creativecommons.org/licenses/by/2.0/" />
Philosophical Transactions of the Royal Society | <meta name="citation_access" content="all" /> for open access
Microbial Ecology | No metadata (links and images in HTML)
Human Genomics and Proteomics | <meta name ="dc.rights" content="http://creativecommons.org/licenses/by/2.0/" />


A bit of a mess. Some publishers embed this information in <meta> tags (which is good), some (such as PLoS) embed RDF (good, if a little more hassle), and some leave us in the dark or give visual clues such as logos (which mean nothing to a computer). In some ways this parallels the variety of ways journals have implemented RSS feeds, which has led to some explicit Recommendations on RSS Feeds for Scholarly Publishers. Perhaps the time is right to develop equivalent recommendations for article metadata, so that apps to read the scientific literature can correctly determine whether they can display an article or not.
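To make the mess concrete, here is a minimal sketch of how a reader app might check for the machine-readable signals in the survey above. It uses only Python's standard library. The names `AccessSniffer` and `access_status` are my own inventions for illustration, and the rules are simply the patterns from the table. It assumes the RDF license element appears as ordinary markup in the page (not hidden inside a comment), and it treats any Creative Commons license URL as meaning "open", which is a simplification.

```python
from html.parser import HTMLParser
from typing import Optional

# Assumption: any Creative Commons license URL is treated as "open".
CC_PREFIX = "creativecommons.org/licenses/"


class AccessSniffer(HTMLParser):
    """Hypothetical helper: scans a page for the access signals surveyed above."""

    def __init__(self) -> None:
        super().__init__()
        self.status: Optional[bool] = None  # True=open, False=closed, None=unknown

    def _mark(self, is_open: bool) -> None:
        # An "open" signal always wins; "closed" is only recorded
        # if nothing has been seen so far.
        if is_open:
            self.status = True
        elif self.status is None:
            self.status = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta":
            name = a.get("name", "").strip().lower()
            content = a.get("content", "").strip()
            if name == "access":
                # Nature Communications style: content is "Yes" or "No"
                if content.lower() == "yes":
                    self._mark(True)
                elif content.lower() == "no":
                    self._mark(False)
            elif name == "citation_access" and content.lower() == "all":
                # Systematic Biology / Phil. Trans. style
                self._mark(True)
            elif name == "dc.rights" and CC_PREFIX in content:
                # BMC style: license URL in dc.rights
                self._mark(True)
        elif tag == "license" and CC_PREFIX in a.get("rdf:resource", ""):
            # PLoS style: embedded RDF license element
            self._mark(True)


def access_status(html: str) -> Optional[bool]:
    """Return True (open), False (explicitly closed), or None (no signal found)."""
    sniffer = AccessSniffer()
    sniffer.feed(html)
    sniffer.close()
    return sniffer.status
```

Note the three-valued result: for journals like BioOne or Microbial Ecology, which provide no metadata, the best this can do is return `None`, and the app would have to assume the article is closed. That is exactly the gap that consistent recommendations for article metadata would fill.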