Tuesday, August 31, 2010

ReaderMeter: what's in a name?

Dario Taraborelli has released ReaderMeter, an elegant app built on top of the Mendeley API. You enter an author's name and it summarises that author's readership in Mendeley. The app provides some summary statistics (mine are shown below), and if you click on the horizontal bar corresponding to a paper, you can see a visualisation of who is reading your paper, including a nice map.


As ever with author names, there is the problem of a person's name having more than one spelling. In Mendeley I'm known as Roderic D. M. Page, R. D. M. Page, Rod Page, Roderic Page, and doubtless some others. Searching ReaderMeter using different spellings of my name gives different results. There are various approaches to tackling this problem; I've touched on one approach earlier.
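To illustrate why spelling variants are awkward, here is a minimal sketch of one crude way to collapse them: reduce each name to "surname plus first initial". This is not necessarily the approach referred to above, and the function name author_key is made up for this example. It assumes names are written given-name first, and it will happily merge different people who share a surname and initial.

```python
import re

def author_key(name: str) -> str:
    """Reduce a personal name to 'surname first-initial', lower-cased."""
    # Treat periods as spaces: "R. D. M. Page" -> ["R", "D", "M", "Page"]
    parts = re.sub(r"\.", " ", name).split()
    if not parts:
        return ""
    surname = parts[-1].lower()
    # Only take an initial when there is something before the surname
    initial = parts[0][0].lower() if len(parts) > 1 else ""
    return f"{surname} {initial}".strip()

variants = ["Roderic D. M. Page", "R. D. M. Page", "Rod Page", "Roderic Page"]
keys = {author_key(v) for v in variants}
print(keys)  # all four variants share a single key
```

A key this coarse trades missed matches for false ones, which is why self-identified authorship is the more attractive route.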

However, there's a different way to tackle this problem in the context of apps like ReaderMeter, because if you're a Mendeley user you can assert that you are the author of a paper (these papers live in your "My Publications" collection). Using Mendeley's API, an app could retrieve this list of publications (providing the user gave it access), and we could compute readership statistics from the set of articles "known" to be authored (leaving aside the issue of people gaming the system by spuriously claiming authorship). In this way the app relies on the default behaviour of Mendeley users - uploading and self-identifying the articles they've written.

Implementing a feature like this poses two problems. The first is access to a user's data. Mendeley's API supports OAuth, so it could be done in such a way that only the account's user could authorise the app to access this list. The app could store the fact that the user has verified the list of publications. Think of it as a bit like Amazon's Real Name™ feature.

The other obstacle is Mendeley's API, which returns readership statistics for public documents (i.e., those in the central aggregation). At present, using the API there is no way to link the global id for a Mendeley reference (e.g., ae7dd6a0-6d09-11df-936c-0026b95e484c) with the local id (e.g., 3582682802) that reference has in a user's collection, unless we resort to trying to match articles by searching by identifiers or article titles. If the API exposed these links, apps like ReaderMeter could become even more powerful (and personalised).
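The fallback described above (matching by identifier, then by title) might look something like the sketch below. The record layout (dictionaries with id, doi and title keys) is an assumption made for illustration, not the shape of Mendeley's responses, and the title is a placeholder. Only the two ids are taken from the example above.

```python
import re

def normalise_title(title):
    """Lower-case a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def match_documents(local_docs, global_docs):
    """Map local id -> global id, trying the DOI first, then the title."""
    by_doi = {d["doi"].lower(): d["id"] for d in global_docs if d.get("doi")}
    by_title = {normalise_title(d["title"]): d["id"] for d in global_docs}
    links = {}
    for doc in local_docs:
        doi = (doc.get("doi") or "").lower()
        global_id = by_doi.get(doi) if doi else None
        if global_id is None:
            # No DOI match, so fall back to the (fuzzier) title comparison
            global_id = by_title.get(normalise_title(doc["title"]))
        if global_id is not None:
            links[doc["id"]] = global_id
    return links

# Hypothetical records: the user's copy lacks a DOI and is punctuated differently
local_docs = [{"id": "3582682802", "doi": None, "title": "An Example Article: Title"}]
global_docs = [{"id": "ae7dd6a0-6d09-11df-936c-0026b95e484c",
                "doi": "10.1000/example", "title": "An example article title"}]
links = match_documents(local_docs, global_docs)
print(links)
```

Title matching of this kind is fragile (subtitles, truncation and duplicates all cause trouble), which is the argument for having the API expose the link directly.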

Viewing scientific articles on the iPad: iBooks

Apple's iBooks app is an ePub and PDF reader, and one could write a lengthy article about its interface. However, in the context of these posts on visualising the scientific article there's one feature that has particularly struck me. When reading a book that cites other literature the citations are hyper-links: click on one and iBooks forwards you (via the page turning effect) to the reference in the book's bibliography. This can be a little jarring (one minute you're reading the page, next you're in the bibliography), but to help maintain context the reference is preceded by the snippet of text in which it is cited:


To make this concrete, here's an example from Clay Shirky's "Cognitive Surplus."


In the body of the text (left) the text "notes in his book The Success of Open Source" (which I've highlighted in blue) is a hyper-link. Click on it, and we see the source of the citation (right), together with the text that formed the hyper-link. This context helps remind you why you wanted to follow up the citation, and also provides the way back to the text: click on the context snippet and you're taken back to the original page.

Providing context for a citation is a nice feature, and there are various ways to do this. For example, the Elsevier Life Sciences Challenge entry by Wan et al. ("Supporting browsing-specific information needs: Introducing the Citation-Sensitive In-Browser Summariser", doi:10.1016/j.websem.2010.03.002, see also an earlier version on CiteSeer) takes a different approach. Rather than provide local context for a citation in an article (a la iBooks), Wan et al. provide context-sensitive summaries of the reference cited to help the reader judge whether it's worth her time to fetch the reference and read it.

Both of these approaches suggest that we could be a lot more creative about how we display and interact with citations when viewing an article.

Viewing scientific articles on the iPad: Mendeley

Previously I've looked at the Nature, PLoS, and Papers apps; now it's the turn of the Mendeley iPad app. As before, this isn't a review of the app as such. I'm more interested in documenting how the app interface works, with a view to discovering if there are consistent metaphors we can use for navigating bibliographic databases.

Perhaps the key difference between Mendeley and the other apps is that Mendeley is cloud-based, in that the bibliography exists on Mendeley's servers, as well as locally on your desktop, iPad, or iPhone. Hence, whereas the Nature and PLoS apps consume a web stream of documents, and Papers enables you to sync collections between desktop and iOS devices, Mendeley syncs to a central web server. At present this appears to be done over HTTPS. Mendeley recently released an API, which I've discussed at length. Mendeley's app doesn't use this API, which is a pity because if it did I suspect the API would be getting the love it needs from Mendeley's developers.

Like Papers, the Mendeley app uses a split view, where the left-hand panel is used for navigation.


You can drill down to lists of references, and display basic details about an article.

The Mendeley app is a PDF viewer, but whereas the PLoS app has page turning, and the Papers app scrolls pages from left to right, the Mendeley app displays PDF pages vertically (which is probably the more natural way to scroll through content on the iPad):


It's clearly early days for the Mendeley app, but it's worth noting two of its most obvious limitations. Firstly, it depends entirely on the user's existing Mendeley bibliography - you can't add to this using the app, it's simply a viewer. Compare this to Papers which can access a suite of search engines from which you can download new papers (albeit with some limitations, for example the Papers iPad app doesn't seem to support extracting metadata via XMP, unlike the desktop version). Secondly, despite Mendeley having as one of its goals being a
"research network that allows you to keep track of your colleagues' publications, conference participations, awards etc., and helps you discover people with research interests similar to yours"

the Mendeley app lacks any social features, apart from sharing by email(!). I think designing social interactions in bibliographic apps will be a challenge. For an example of what social reading can look like, check out Flipboard.

Viewing scientific articles on the iPad: Papers

Continuing the series of posts about reading scientific articles on the iPad, here are some quick notes on perhaps the most polished app I've seen, Papers for iPad. As with earlier posts on the Nature and PLoS apps, I'm not writing an in-depth review - rather I'm interested in the basic interface design.

Papers is available for the Mac, as well as the iPhone and iPad. Unlike social bibliographic apps such as Zotero and Mendeley, Papers lacks a web client. Instead, all your PDFs are held on your Mac, which can be wirelessly synced with Papers on the iPad or iPhone.

Navigation popover

Papers makes extensive use of the split view, in which the screen is split into two panes, the left-hand split becoming a popover when you hold the iPad in portrait orientation. Almost all of the functionality of the iPhone version is crammed into the left-hand split. The popover displays the main interface categories (library, help, collections that you've put PDFs into), collections of documents, metadata for individual papers (which you can edit), as well as search results from a wide range of databases:


Some of these features you encounter as you drill down, say from library to list of papers, to details about a document, others you can access by clicking on the tab bar at the bottom.

PDF display
Like the PLoS app, Papers displays PDFs. It doesn't use a page-turning effect, rather you swipe through the pages from left to right, with the current page indicated below in a page control (what Sencha Touch describe as a carousel control).


Given that the document being displayed is a PDF there is no interaction with the images or citations, but you can add highlights and annotations.

Papers is the first of the iPad apps I've discussed that isn't limited to a single publisher. If an article is online, or in your copy of Papers for the Mac, then you can view it in Papers for iPad. It is the app that I use on a day to day basis, although the PDF viewer can feel a little clunky. I think anyone designing an application reader should play with Papers for a while, if only to see the level of functionality that can be embedded in the basic iPad split view.

Saturday, August 28, 2010

On being open: Mendeley and open data versus open source

Paulo Nuin, not the biggest fan of Mendeley, wrote a blog post entitled Mendeley is going to be open source, in which he wrote:

After extensively researching some material online, analysing many blog posts and statements made by people linked to Mendeley, checking my sources, I reached the conclusion that soon Mendeley is going to be open source.

Among the essays Paulo read is Jason Hoyt's post on the Mendeley blog: Dear researcher, which side of history will you be on?. In response to a question about open sourcing the Mendeley client, Jason replied:
I get asked a lot about open sourcing Mendeley when I go to speaking events. I always state that we are open to the possibility, but then ask how many people know how to type a URL versus how many know how to program in C++? That’s why we went with the Open API first instead of open sourcing the desktop software. If you can type a URL, which is what the API is based upon, then you can build on top of Mendeley. You don’t need to know how to program.

Despite the fact that open sourcing the desktop client is the second most requested feature for Mendeley, I think Jason is right. I also think Paulo's campaign to make Mendeley open source is misguided. The client doesn't matter. OK, yes, it's probably the reason most people use Mendeley, but there are lots of competing clients (EndNote, Zotero, Papers, etc.), and there are several bibliographic data formats (RIS, EndNote XML, BibTeX) and essentially one document format (PDF) that they support, so individual users don't have to worry about locking their individual bibliographies into a proprietary format. Couple this with the existence of an API (albeit a pretty crap one), and whether an individual software client is closed or open source doesn't matter much.

Will the data be open?

However, what makes Mendeley different is the aggregation of bibliographic data (35 million references and counting).


I'd argue it's the fate of this aggregation that matters. In a comment on the Guardian's piece Mendeley 'most likely to change the world for the better', Jane Good wrote:
"World-changing potential"? This utopian fantasy stuff is a little much, no? After all, we're talking about a for-profit corporation using closed-source software to monitor private usage habits for monetary gain. And how exactly is this company meant to sustain its millions of dollars of annual burn on a few measly storage subscriptions? At some point the data will have to go up for sale to the highest bidder, plain and simple. The API, as it exists now, does not provide access to that data, and it probably never will, right, DrGinn[sic]?

Toning down the rhetoric, the question is "Mendeley, Scopus, Talis – will you be making your data Open?":

But how can a company create an income stream from Open Scientific content? That’s the a question for me for this decade. If we can solve it we can transform the world. If however the linked Open data are all going to be through paywalls, portals, query engines then we regress into the feudal information possession of the past. I hope the companies present in this session can help solve this. It won’t be easy but it has to be done. So I now ask Mendeley, Elsevier/Scopus, Talis: Are your data Openly available for re-use?

For me the question of whether the source code for the Mendeley desktop will be made open source is a red herring, and ultimately a distraction from the real question — will the data be open?

Friday, August 27, 2010

Navigating the Encyclopedia of Life tree on the desktop and the iPhone

This week seems to be API week. The Encyclopedia of Life API Beta Test has been out since August 12th. By comparison with the Mendeley API that I've spent rather too much time trying to get to grips with, the EOL API release seems rather understated.

However, I've spent the last couple of days playing with it in order to build a simple tree navigating widget, which you can view at http://iphylo.org/~rpage/eoltree/.

The widget resembles Aaron Thompson's Taxonomy (formerly called KPCOFGS) iPhone app in that it uses the iPhone table view to list all the taxa at a given level in a taxonomic tree. Clicking on a row in this table takes you to the descendants of the corresponding taxon, clicking "Back" takes you back up the tree. If you've reached a leaf node (typically a species) the widget displays a snippet of information about that taxon. It also resembles Javier de la Torre's taxonomic browser written in Flex.

Here's a screen shot of the widget running in a desktop web browser:


Here's the same widget in the iPhone web browser:

Using the API
The EOL API is pretty straightforward. I call the http://www.eol.org/api/docs/hierarchy_entries API to get the tree rooted at a given node, then populate each child of that node using http://www.eol.org/api/docs/pages. The result is a simple JSON file that I cache locally to speed up performance and avoid hitting the EOL servers for the same information. Because I'm locally caching the API calls I need a couple of PHP scripts to do this, but everything else is HTML and Javascript.
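The caching step can be sketched as follows. This is a Python illustration of the general idea, not the PHP scripts the widget uses: the function name cached_fetch, the one-file-per-URL layout and the example URL are all assumptions. The fetch function is passed in so the sketch runs without touching any server.

```python
import hashlib
import json
import os
import tempfile

def cached_fetch(url, fetch, cache_dir):
    """Return parsed JSON for url, calling fetch(url) only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # One cache file per URL, named after a hash of the URL
    name = hashlib.md5(url.encode("utf-8")).hexdigest() + ".json"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    data = fetch(url)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    return data

calls = []

def fake_fetch(url):
    """Stand-in for a real HTTP request; records each call it receives."""
    calls.append(url)
    return {"url": url, "children": []}

cache_dir = tempfile.mkdtemp()
first = cached_fetch("http://example.org/api/node/1", fake_fetch, cache_dir)
second = cached_fetch("http://example.org/api/node/1", fake_fetch, cache_dir)
print(len(calls))  # the second request is served from disk
```

A real cache would also need some way of expiring stale entries, which this sketch leaves out.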

iPhone and iPad
I've not really developed this for the iPhone. I've cobbled together some crude Javascript to simulate some iPhone-like effects, but if I was serious about the phone I'd look into one of the Javascript kits available for iPhone development. However, I did want something that was similar in size to the iPhone screen. The reason is I'm looking at adding taxonomic browsing to the geographic browser I described in the post Browsing a digital library using a map, so I wanted something easy to use but which didn't take up too much space. In the same way that the Pygmybrowse tree viewer I played with in 2006 was a solution to viewing a tree on a small screen, I think developing for the iPhone forces you to strip things down to the bare essentials.

I'm also keeping the iPad in mind. In portrait mode some apps display lists in a popover like this:


This popover takes up a similar amount of screen space to the entire iPhone screen, so if I was to have a web app (or native app) that had taxonomic navigation, I'd want it to be about the size of the iPhone.

Let me know what you think. Meantime I need to think about bolting this onto the map browser, and providing a combined taxonomic and geographic perspective on a set of documents.

Thursday, August 26, 2010

Mendeley API PHP client

Following on from my earlier post about the Mendeley API, I've bundled up my code for OAuth access to the Mendeley API for anyone who's interested in playing with the API using PHP. You can browse the code on Google Code, or grab a tarball here. You'll need a consumer key and a consumer secret from Mendeley for the demos to work, and if you're behind an HTTP proxy you'll have to tweak the code (this is explained in the ReadMe.txt file that comes with the code).

The code is pretty rough, and doesn't use all the Mendeley API calls, but I've other things to do, and it felt like a case of either bundle this up now, or it will get lost among a host of other projects. The Mendeley API still feels woefully under-developed. I'd be more interested in developing this client further if the API was powerful enough to do the kinds of things I'd like to do.

Tuesday, August 24, 2010

Browsing a digital library using a map

Every so often I revisit the idea of browsing a collection of documents (or specimens, or phylogenies) geographically. It's one thing to display a map of localities for a single document (as I did most recently for Zootaxa); it's quite another to browse a large collection.

Today I finally bit the bullet and put something together, which you can see at http://biostor.org/maps/. The website comprises a Google Map showing localities extracted from papers in BioStor, and a list of the papers that have one or more points visible on the map.


In building this I hit a few obstacles. The first is the number of localities involved. I've extracted several thousand point localities from articles in BioStor. Displaying all these on a Google Map is going to be tedious. Fortunately, there's a wonderful library called MarkerClusterer, part of the google-maps-utility-library-v3, that handles this problem. MarkerClusterer clusters markers together based on zoom level. If you zoom out the markers cluster together; as you zoom in these clusters start to resolve into their component points. Very, very cool.
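To give a feel for zoom-dependent clustering, here is a minimal grid-based sketch. It is not the algorithm the library mentioned above uses, just an illustration of the behaviour: points that fall in the same grid cell are merged, and the cell size halves with each zoom level. The function name and the base_cell_degrees parameter are invented for this example.

```python
from collections import defaultdict

def cluster_markers(points, zoom, base_cell_degrees=90.0):
    """Group (lat, lng) points into grid cells that halve in size per zoom level."""
    cell = base_cell_degrees / (2 ** zoom)
    cells = defaultdict(list)
    for lat, lng in points:
        # Points sharing a grid cell end up in the same cluster
        key = (int(lat // cell), int(lng // cell))
        cells[key].append((lat, lng))
    clusters = []
    for members in cells.values():
        mean_lat = sum(p[0] for p in members) / len(members)
        mean_lng = sum(p[1] for p in members) / len(members)
        clusters.append({"lat": mean_lat, "lng": mean_lng, "count": len(members)})
    return clusters

points = [(10.0, 10.0), (10.5, 10.5), (40.0, -70.0)]
zoomed_out = cluster_markers(points, zoom=0)  # the two nearby points merge
zoomed_in = cluster_markers(points, zoom=8)   # every point stands alone
print(len(zoomed_out), len(zoomed_in))
```

A fixed grid can split points that sit either side of a cell boundary, so distance-based approaches tend to look better on screen.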

The second challenge was to have the list of references update automatically as we move around or zoom in and out on the map. To do this I need to know the bounding box currently being displayed in the map; I can then query the MySQL database underlying BioStor for the localities within the bounding box, using MySQL's spatial extensions. The query is easy enough to implement using ajax, but the trick was knowing when to call it. Initially, listening for the bounds_changed event seemed a good idea. However, this event is fired as the map is being moved (i.e., if the user is panning or dragging the map a whole series of bounds_changed events are fired), whereas what I want is something that signals that the user has stopped moving the map, at which point I can query the database for articles that correspond to the region that the map is currently displaying. Turns out that the event I need to listen for is idle (see Issue 1371: map.bounds_changed event fires repeatedly when the map is moving), so I have a function that captures that event and loads the corresponding set of articles.

Another "gotcha" occurs when the region being viewed crosses longitude 180° (or -180°) (see diagram below from http://georss.org/Encodings).


In this case the polygon used to query MySQL would be incorrectly interpreted, so I create two polygons, each with 180° or -180° as one of the boundaries, and merge the articles with points in either of those two polygons.
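The split can be sketched like this. The function name is made up, and the polygons are written as WKT with x as longitude and y as latitude, which is an assumption about how the points are stored rather than something stated above. When the western edge of the viewport is numerically greater than the eastern edge, the view crosses the 180° line and two polygons are returned.

```python
def bbox_polygons(south, west, north, east):
    """Return one or two WKT polygons covering the map viewport."""
    def wkt(w, e):
        # Closed ring: the first and last vertices are the same point
        return (f"POLYGON(({w} {south}, {e} {south}, {e} {north}, "
                f"{w} {north}, {w} {south}))")
    if west <= east:
        return [wkt(west, east)]
    # The viewport crosses longitude 180: one polygon for each side of the line
    return [wkt(west, 180.0), wkt(-180.0, east)]

normal = bbox_polygons(-10.0, 100.0, 10.0, 120.0)
crossing = bbox_polygons(-10.0, 170.0, 10.0, -170.0)
print(len(normal), len(crossing))
```

Each polygon is then used in its own spatial query, and the resulting article lists are merged with duplicates removed.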

I've made a short video showing the map in action. Although I've implemented this for BioStor, the code is actually pretty generic, and could easily be adapted to other cases where we want to navigate through a set of objects geographically.

Viewing scientific articles on the iPad: the PLoS Reader

Continuing on from my previous post Viewing scientific articles on the iPad: towards a universal article reader, here are some brief notes on the PLoS iPad app that I've previously been critical of.

There are two key things to note about this app. The first is that it uses the page turning metaphor. The article is displayed as a PDF, a page at a time, and the user swipes the page to turn it over. Hence, the app is simulating paper on the iPad screen.


But perhaps more interesting is that, unlike the Nature app discussed earlier, the PLoS app doesn't use a custom API to retrieve articles. Instead the app uses RSS feeds from the PLoS site. PLoS provides journal-specific RSS feeds, as well as subject-specific feeds within journals (see, for example, the PLoS ONE home page). The PLoS Reader app takes these feeds and uses them to create a list of articles the reader can choose from.

A nice feature of the PLoS ATOM feeds is the provision of links to alternative formats for the article (unlike many journal RSS feeds, which provide just a DOI or a URL). For example, the feed item for the article "Transmission of Single HIV-1 Genomes and Dynamics of Early Immune Escape Revealed by Ultra-Deep Sequencing" doi:10.1371/journal.pone.0012303 contains links to the PDF and XML versions of the article:

<link rel="related"
title="(PDF) Transmission of Single HIV-1 Genomes and Dynamics of Early Immune Escape Revealed by Ultra-Deep Sequencing" />
<link rel="related"
title="(XML) Transmission of Single HIV-1 Genomes and Dynamics of Early Immune Escape Revealed by Ultra-Deep Sequencing" />

This makes the task of an article reader much easier. Rather than attempt to screen scrape the article web page, or rely on a rule for constructing the link to the desired file, the feed provides an explicit URL to the different available formats.
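As a rough sketch of how a reader app might use such links, the code below pulls the related links out of an Atom entry and keys them by the format label at the start of the title. The sample feed is invented: the example.org URLs and the entry layout are placeholders, not the contents of the PLoS feed.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up feed for illustration only
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>An example article</title>
    <link rel="related" href="http://example.org/article.pdf"
          title="(PDF) An example article" />
    <link rel="related" href="http://example.org/article.xml"
          title="(XML) An example article" />
  </entry>
</feed>"""

def related_links(feed_xml):
    """Return, for each entry, a dict mapping a format label to its URL."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.iter(ATOM + "entry"):
        formats = {}
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") != "related":
                continue
            title = link.get("title", "")
            # The label is the text inside the leading parentheses, e.g. "(PDF)"
            if title.startswith("(") and ")" in title:
                formats[title[1:title.index(")")]] = link.get("href")
        results.append(formats)
    return results

links = related_links(SAMPLE)
print(links)
```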

I've not seen this feature in other journal RSS feeds, although article web pages sometimes provide this information. BMC journals, for example, provide <link rel="alternate"> tags in the web page for each article, from which we can extract links to the XML and PDF versions, and some journals (BMC included) provide the Google Scholar metadata tag <meta name="citation_pdf_url"> to link to the PDF. Hence, a generic article reader will need to be able to extract metadata tags from article web pages as it seeks formats suitable to display.
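Extracting those tags from a web page needs nothing more than an HTML parser. Here is a minimal sketch using Python's standard library; the sample page, its example.org URLs and the type attribute value are invented for illustration.

```python
from html.parser import HTMLParser

class FormatLinkParser(HTMLParser):
    """Collect the citation_pdf_url meta tag and any rel="alternate" links."""

    def __init__(self):
        super().__init__()
        self.pdf_url = None
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "citation_pdf_url":
            self.pdf_url = attrs.get("content")
        elif tag == "link" and attrs.get("rel") == "alternate":
            self.alternates.append((attrs.get("type"), attrs.get("href")))

# Made-up article page for illustration only
PAGE = """<html><head>
<meta name="citation_pdf_url" content="http://example.org/article.pdf" />
<link rel="alternate" type="text/xml" href="http://example.org/article.xml" />
<link rel="stylesheet" href="style.css" />
</head><body></body></html>"""

parser = FormatLinkParser()
parser.feed(PAGE)
print(parser.pdf_url, parser.alternates)
```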

Monday, August 23, 2010

TreeView X now on Google Code

TreeView X, the open source version of TreeView, has been slowly suffering bit rot as C++ compilers and operating systems change. Every so often I'd tweak the code to build on some Linux version or other, but this isn't something I've a lot of time for. Moreover, because of the hassle of rebuilding binaries and source tar balls the updated versions weren't uploaded to the TreeView X web site. Charles Plessy has been doing a nice job of keeping a Debian package working, but my own code wouldn't build on newer versions of Linux.

Today I managed to get my code to build on Fedora Core 13 using gcc 4.4.4 and the latest stable version of wxWidgets (2.8.11). The program runs OK, although there are some display glitches, and printing seems broken. However, my days of standalone application development are over, so I thought I should make the code accessible to anyone else who may find it of use. The code is now hosted on Google Code.

Viewing scientific articles on the iPad: towards a universal article reader

There are a growing number of applications for viewing scientific articles coming out for the iPhone and iPad. I'm toying with extending the experiments described in an earlier post when I took the PLoS iPad app to task for being essentially a PDF page-turner, so I thought I should take a more detailed look at the currently available apps. In particular, I'm interested in how the apps solve some basic tasks, and whether there is a consistent "vocabulary" for interacting with an article. Put less pretentiously, do the apps display things such as lists of articles, citations, references, figures, and bibliographic data in similar ways, or does the user have to learn new rules for each app? I'm also interested in how the apps treat the article (e.g., as a monolithic PDF, as a document with pages, or as a web document where pagination has no meaning), and how they get their content (from a publisher, from the user's social network, from the user's personal library).

In this post I'm going to look at Nature.com's app. Future posts will explore other apps. I'm interested in what people have done so far, and how we could improve the reading experience. Long term I'm interested in whether there's scope for a "universal article reader" that can take diverse formats (including XML, PDF, and page images) and display them in a consistent and useful way. In the diagrams below I'm using touch gesture symbols from Graffletopia (see Touch Gesture Reference Guide).

Nature's app is limited to articles published by Nature, and displays the available articles as a list with thumbnails of a figure from the article. The app fetches this list using Nature's mobile API. Up until April 30th the fulltext of an article was free, but at present you are limited to getting abstracts. It's interesting that the list of articles isn't retrieved using an RSS feed, I presume because Nature wanted to use some simple authentication to avoid users downloading all their closed-access content for free.


Article display
Nature's app, unlike all the others I've seen so far, doesn't use PDFs. Instead it uses ePub. Unlike many ePub book readers (including Apple's own iBooks), the Nature app doesn't render the article as a series of pages, but as one continuous document that you scroll down by dragging (it's essentially a web page). You can't zoom the text, but the text size is fine for reading.


Citations in the body of the article are links. If you tap them the full citation slides in from the right, with a link to the publisher's website. If you tap the link the app opens the website within the app. This can be a little jarring as you move from a customised view of an article to a web page designed for a desktop. In the case of a Nature article, it would be more elegant if the app recognised that the cited reference was a Nature article and rendered it natively in the app. More generally the transition between app and website might be less jarring if journal publishers developed mobile versions of their websites.


The figures aren't displayed directly in the body of the article, but each mention of a figure in the body of the text is a link. Tapping the link causes the figure to slide up from the bottom of the screen. A button in the top right hand corner enables you to toggle between displaying the figure and its caption (shown as white text on a black background). You can use pinch and spread to zoom in and out of the figure, as well as save it to the photo library on your device.


I've started with the Nature app as I think it's the only one so far to seriously tackle the challenge of displaying an article on a mobile device. Instead of displaying PDFs it repackages the articles in ePub format and the result is much more interactive than a PDF.

I hope to explore other article viewing apps in later posts, but it's worth noting that we should also be looking at other apps for ideas. Personally I really like the Guardian's iPhone app, which I use as my main news reader. It has a nice gallery feature to display thumbnails of images (imagine a gallery of an article's figures), and uses tags effectively.

Monday, August 16, 2010

More on the Mendeley API

After playing with the public API for Mendeley over the weekend (see Social citations: using Mendeley API to measure citation readership) I've had a quick play with the user specific part of the API. This API enables apps to connect with a user's account, so you could imagine using it to personalise citation lists (as I mentioned in the previous post), or building apps to handle a user's reading list (to complement Mendeley's existing desktop and iPhone clients).

Once again, it's frustrating just how rough the API is. The documentation is incomplete and contains errors, and some of the API calls simply don't work (see this post). I know I'm sounding like a broken record, but this API really needs a test suite. The quickest way to annoy potential users of the API is to get them to find really obvious bugs for you.

With a test suite in mind, I've created a simple app that enables you to connect to your Mendeley account and perform a bunch of simple tasks. The hardest part of getting this working was getting my head around OAuth. Luckily, @abraham has written a PHP library to support OAuth access to Twitter's API, so I grabbed that and replaced Twitter-specific code with the equivalent code for Mendeley.
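For anyone else trying to get their head around OAuth, the fiddly part is usually the request signature. The sketch below shows the HMAC-SHA1 signing step from OAuth 1.0, on the assumption that this is the flavour in play (the post only says the Twitter library was adapted, so treat that as a guess). The URL, keys and secrets are placeholders, and a real client would also generate the nonce and timestamp and send the result in an Authorization header.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def pct(value):
    """Percent-encode everything except the RFC 3986 unreserved characters."""
    return quote(str(value), safe="-._~")

def sign_request(method, url, params, consumer_secret, token_secret=""):
    """Return the base64-encoded HMAC-SHA1 signature for a request."""
    # 1. Encode the parameters, sort them, and join them as key=value pairs
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # 2. Signature base string: METHOD & encoded URL & encoded parameters
    base_string = "&".join([method.upper(), pct(url), pct(param_string)])
    # 3. Signing key: consumer secret & token secret (empty before authorisation)
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Placeholder values for illustration only
params = {
    "oauth_consumer_key": "key",
    "oauth_nonce": "abc",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1282000000",
    "oauth_version": "1.0",
}
signature = sign_request("GET", "http://example.org/api/library", params,
                         "consumer-secret", "token-secret")
print(signature)
```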

You can try the app here: http://iphylo.org/~rpage/mendeley/moauth/.

The first time you go to the app it shows a button to connect to Mendeley. If you click on it you'll see something like this:
(if you're not already logged in to Mendeley it may ask you to log in — note that all of this happens on Mendeley's site, my app never knows your username or password details). If you're willing to try the app, allow it to connect to your account. You'll then see a bunch of API requests and results. All but one of the requests is simply displaying information. One request does try to add a test document (the one listed on the Mendeley developer's site), but at the moment this part of the API doesn't seem to work (nor does the call to get the list of papers that you've authored).

If and when Mendeley get the API working fully (and documented) there's a lot of scope here. But what I'd really like to see is Mendeley develop a test suite that runs through every API call and checks that the methods work as advertised.
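The kind of test suite meant here need not be elaborate. A minimal sketch: a list of named checks, each pairing an API call with a test of what comes back, run one after another so that a single broken call doesn't hide the rest. The three stand-in functions below are hypothetical and return canned data; they are not Mendeley API methods.

```python
def run_api_checks(checks):
    """Run (name, call, validate) triples and report the outcome of each."""
    report = {}
    for name, call, validate in checks:
        try:
            report[name] = "pass" if validate(call()) else "fail"
        except Exception as error:  # a broken call should not stop the run
            report[name] = f"error: {error}"
    return report

# Hypothetical stand-ins for real API calls
def get_profile():
    return {"name": "Test User"}

def get_authored():
    raise RuntimeError("HTTP 500")

def get_document():
    return {"title": "Example", "volume": None}

checks = [
    ("profile has a name", get_profile, lambda r: bool(r.get("name"))),
    ("authored list loads", get_authored, lambda r: isinstance(r, list)),
    ("document has a volume", get_document, lambda r: r.get("volume") is not None),
]
report = run_api_checks(checks)
for name, outcome in report.items():
    print(f"{name}: {outcome}")
```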

Saturday, August 14, 2010

Social citations: using Mendeley API to measure citation readership

Quick note on an app I threw together using the Mendeley API that I discussed in the previous post. This app is crude, and given that the Mendeley API is rate-limited and in flux it might not work for you.

The basic idea is to embellish the list of literature cited in an article with information that might help a reader decide whether a given citation is worth reading. One clue might be how many people on Mendeley are reading that article. So, my app takes an article, extracts the list of cited literature, and for each article with a PubMed identifier it asks Mendeley "how many readers does this article have?" For now the app is restricted to using articles from the BiomedCentral series as these have Open Access XML with literature cited lists that contain PubMed numbers (PLoS articles, for instance, don't have these, and for now I'm avoiding the overhead of finding identifiers for the articles). I'm using PubMed identifiers as the Document Details method in the Mendeley API doesn't handle DOIs at present.

The app is at http://iphylo.org/~rpage/mendeley/, and the default article I've chosen to demonstrate the app is Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta doi:10.1186/1471-2156-10-16, but you can enter the DOI of any BMC article to give it a try. Below is a screenshot of part of the list of literature cited by this paper, together with readership numbers:


The default article has 4 readers in Mendeley. The readership of the articles it cites varies, but one article stands out with 208 readers.

There are huge limitations with this app: it doesn't cache the Mendeley results (so repeated use will exceed the rate limits), it is limited to citations in PubMed (could add support for DOIs and title searches), and only BMC articles can be processed.
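Adding a cache would be straightforward. The sketch below annotates a list of citations with reader counts, asking the lookup function about each PubMed identifier at most once. The record layout, the identifiers and the canned counts are all invented for illustration, and the lookup function stands in for the real API call.

```python
def annotate_readership(citations, lookup, cache=None):
    """Add a 'readers' count to each citation that has a PubMed identifier."""
    cache = {} if cache is None else cache
    annotated = []
    for citation in citations:
        pmid = citation.get("pmid")
        readers = None
        if pmid:
            if pmid not in cache:
                # Only call the (rate-limited) API for identifiers not yet seen
                cache[pmid] = lookup(pmid)
            readers = cache[pmid]
        annotated.append({**citation, "readers": readers})
    return annotated

requests_made = []

def fake_lookup(pmid):
    """Stand-in for the API call; returns canned reader counts."""
    requests_made.append(pmid)
    return {"111": 4, "222": 208}.get(pmid, 0)

citations = [
    {"title": "First cited article", "pmid": "111"},
    {"title": "Second cited article", "pmid": "222"},
    {"title": "Cited twice", "pmid": "111"},
    {"title": "No PubMed identifier", "pmid": None},
]
shared_cache = {}
result = annotate_readership(citations, fake_lookup, shared_cache)
print([c["readers"] for c in result], len(requests_made))
```

Passing the same cache dictionary into later calls (or saving it to disk) would stop repeat visits to the same article from costing any further requests.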

What would be interesting is to extend this in other directions. For example, if the user had a Mendeley account, it would be nice to flag which articles the reader already had in their library (and perhaps have the ability to add those that weren't to the library). To personalise the citation readership display I'd need to add support for OAuth, which Mendeley uses to authorise access to user accounts.

If Mendeley were to provide more social features in their API then we could add flags indicating whether any of the user's contacts have any of these articles in their libraries (your decision to read a paper might be influenced by whether a contact of yours has read it -- think of it as resembling the Facebook "Like" button). Or we could display the readers themselves, so you could discover people with potentially similar interests to your own.

My twitter stream has been full of complaints about the Mendeley API — life on the bleeding edge is not always fun. But the API does have the potential to support some cool applications, once it gets the kinks ironed out.

Thursday, August 12, 2010

Mendeley API: we'll bring the awesome if you bring the documentation

Mendeley's API has been publicly launched at http://dev.mendeley.com/, accompanied by various announcements such as:
Mendeley's Research API is now open to the public. Developers, go forth and bring the awesome :) http://dev.mendeley.com/ (@subcide)

Finally saw the awesome Easter Egg that @subcide hid on the new dev.mendeley.com Developer Portal! Whoaaa! (@mendeley_com)

All good fun to be sure, but it's a pity more effort has been spent on Easter eggs than on documenting and testing the API. If you visit the API development site there's precious little in the way of documentation, and few examples. As well as making a developer's life harder, adding examples would have helped catch some bugs, such as the failure of the API calls to return details such as volume, issue, and page numbers for articles, and the inability to retrieve a document using a DOI (the '/' that a DOI contains breaks the API). These are fairly obvious things. If resources are limiting, perhaps the Mendeley API team should open up the development web site to others to help create documentation and examples. A wiki would be one way to do this.
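
To illustrate the DOI problem: the usual way to put a string containing '/' into a single URL path segment is to percent-encode it. The sketch below uses only Python's standard library; `encode_doi_for_path` is a hypothetical helper name, and I haven't established whether the Mendeley API would accept a DOI encoded this way.

```python
from urllib.parse import quote, unquote


def encode_doi_for_path(doi: str) -> str:
    """Percent-encode a DOI so it can occupy a single URL path segment."""
    # safe="" means '/' is encoded too (by default quote() leaves it alone).
    return quote(doi, safe="")


if __name__ == "__main__":
    doi = "10.1186/1471-2156-10-16"
    encoded = encode_doi_for_path(doi)
    print(encoded)                    # 10.1186%2F1471-2156-10-16
    assert unquote(encoded) == doi    # the encoding is reversible
```

Some web servers decode `%2F` before routing the request, so encoding on the client side alone may not be enough.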

Mendeley is a great idea, but on occasion the hype gets ahead of reality. The product has a lot of potential, but also has some significant problems. Using the search API you pretty quickly encounter its number one problem: duplicates. I get the sense that Mendeley is about three things:

  1. Managing personal bibliographies and generating citations (desktop client)

  2. Networking ("the Last.fm of research") (web site)

  3. Bibliographic data

Number 3 is, I suspect, the hardest problem to tackle, and it is where the ultimate value lies (think citation networks, audience data, iTunes-like business model for selling articles, etc.). I'd like Mendeley a lot more if I was confident that they had a good handle on the complexities of bibliographic data (and didn't drop pagination from API calls). Good places to start are "Are your citations clean?" (doi:10.1145/1323688.1323690) and "Learning metadata from the evidence in an on-line citation matching scheme" (doi:10.1145/1141753.1141817), both currently duplicated in Mendeley (try searching for Are your citations clean and Learning metadata from the evidence in an on-line citation matching scheme).
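
As a rough illustration of the simplest kind of duplicate detection, the sketch below groups titles that become identical once case, punctuation, and spacing are ignored. It is a toy, not how Mendeley (or the papers cited above) approach the problem, and both function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def normalise_title(title: str) -> str:
    """Lower-case a title and reduce it to its letters and digits."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def group_probable_duplicates(titles: Iterable[str]) -> List[List[str]]:
    """Return groups of titles that are identical after normalisation."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        groups[normalise_title(title)].append(title)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    results = [
        "Are your citations clean?",
        "Are Your Citations Clean",
        "Learning metadata from the evidence in an on-line citation matching scheme",
    ]
    print(group_probable_duplicates(results))
```

Exact matching on a normalised title misses misspellings and truncated titles, which is part of why real bibliographic deduplication is hard.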

Friday, August 06, 2010

Extracting semantic goodness from Zootaxa articles


I've just come back from a holiday in New Zealand, during which time I spent a morning chatting with Zhi-Qiang Zhang (@Zootaxa, editor of Zootaxa) and Stephen Thorpe (stho002, a major contributor to Wikispecies).

Fresh from playing with PLoS XML to explore ways of redisplaying articles (described in my commentary on the PLoS iPad app), I was extolling the virtues of the XML mark-up that underlies PLoS (and other Open Access journals, such as the BMC series). These publishers provide Open Access XML versions of their papers that are quite richly marked up: internal citations, links to figures, the bibliography, etc. are all clearly identified, although they don't have the semantic mark-up of TaxPub, used in some recent Zookeys papers.

Talking to Zhi-Qiang Zhang is always a useful reality check. Zootaxa describes itself as the
World's foremost journal in taxonomy; publisher of 15,421 new taxa in 141,518 pages by 7,385 authors worldwide since 2001

This is taxonomic publishing on a grand scale, averaging more than an article a day. Since 2004 Zootaxa has published 12.60% of the new taxa recorded in Zoological Record, an order of magnitude more than its nearest rival. The journal is being tightly run, and doesn't have cash to spare (it has nothing like the funding PLoS has, for example). Any change to the basic work flow (author submits Word file, this is imported into Adobe Framemaker, which creates the PDF files displayed on the Zootaxa web site) requires compelling justification. Furthermore, any change would have to scale. The level of work required to embellish articles using custom mark-up, such as TaxPub, just isn't feasible.

Zhi-Qiang waxed enthusiastic about Google Books' interface, where basic information such as keywords, geographic location, and references are extracted automatically. Google Books was one inspiration for the article display I use in BioStor, so I wondered how hard it would be to take some of the work I've been doing on BioStor and on adding mark-up to PLoS XML and apply it to Zootaxa PDFs. After some fussing with regular expressions, the bioGUID OpenURL resolver and uBio's FindIT taxonomic name tool, I have some scripts that automate extracting basic information from a Zootaxa PDF, such as the abstract, localities, taxonomic names, GenBank sequences, and the bibliography. You can see some examples at http://iphylo.org/~rpage/zootaxa/. It's all a bit crude, and isn't the same as being able to mark up the actual text (which could be done, but with rather more effort), but there's potential here to create nice interfaces to Zootaxa papers, as well as extract the data needed to do some interesting queries.
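
To give a flavour of the regular-expression approach, here is a minimal Python sketch that pulls candidate GenBank accession numbers out of text already extracted from a PDF. It is not one of the scripts mentioned above: the function name is hypothetical, the pattern assumes the two classic nucleotide accession shapes (one letter plus five digits, or two letters plus six digits), and the accessions in the example sentence are invented.

```python
import re
from typing import List

# Assumed accession shapes: e.g. "U12345" or "AF012345".
ACCESSION_PATTERN = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})\b")


def find_genbank_accessions(text: str) -> List[str]:
    """Return candidate accession numbers in order of first appearance."""
    seen = set()
    found = []
    for match in ACCESSION_PATTERN.findall(text):
        if match not in seen:        # report each accession only once
            seen.add(match)
            found.append(match)
    return found


if __name__ == "__main__":
    sentence = "Sequences were deposited in GenBank (EU123456, AF012345 and U12345)."
    print(find_genbank_accessions(sentence))
```

A pattern like this will also pick up things that merely look like accessions (specimen codes, for instance), so in practice the candidates would need checking against GenBank itself.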