iPhylo: Connotea

Roderic D. M. Page

Monday, April 20, 2009

Connotea tags

For fun I quickly programmed a little tool for bioGUID that makes use of Connotea's web API. When an article is displayed, the page loads a JavaScript script that calls a simple web service, which looks up the reference in Connotea and displays a tag cloud if the reference is found. Take, for example, the paper announcing ZooBank (doi:10.1038/437477a).

The reference has been bookmarked by 6 people, using 15 tags, some more popular than others. The tags and users are linked to Connotea.

This service can be accessed at http://bioguid.info/services/connotea.php?uri=<doi here>, for example http://bioguid.info/services/connotea.php?uri=doi:10.1038/437477a. By default it returns JSON (you can also set the name of the callback function by adding a &callback= parameter), but you can get HTML by adding &format=html. The HTML is also included in the JSON result, in case you want to display something quickly rather than roll your own.
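
As a rough illustration, here is how a client might assemble these URLs in Python. The base URL and the uri, callback and format parameters are the ones described above; the function name service_url and its argument names are my own and purely hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the service as given in the post.
BASE = "http://bioguid.info/services/connotea.php"

def service_url(doi, callback=None, fmt=None):
    """Build a request URL for the tag service (hypothetical helper)."""
    params = {"uri": doi}
    if callback:
        params["callback"] = callback  # name of the callback function for the JSON
    if fmt:
        params["format"] = fmt         # "html" asks for HTML instead of JSON
    # Keep ':' and '/' unescaped so the DOI reads as in the examples above.
    return BASE + "?" + urlencode(params, safe=":/")

print(service_url("doi:10.1038/437477a"))
print(service_url("doi:10.1038/437477a", fmt="html"))
```

The first call prints the example URL given above; the second appends &format=html.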

Basically the service takes the DOI you supply, converts it to an MD5 hash, then looks it up in Connotea. There were a few little gotchas, such as the fact that a Connotea user may have bookmarked either "doi:10.1038/43747" or the proxied version "http://dx.doi.org/10.1038/43747", and these have different MD5 hashes. My service tries both variations and merges the results.
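
The two-variant lookup can be sketched in Python along these lines. This is not the service's actual code: the function names are hypothetical, the Connotea request itself is left as a caller-supplied lookup function, and merging by summing tag counts is my assumption about what "merge" might mean.

```python
import hashlib

def uri_variants(doi):
    """Return the two forms under which a DOI may have been bookmarked."""
    bare = doi
    for prefix in ("doi:", "http://dx.doi.org/"):
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
            break
    return ["doi:" + bare, "http://dx.doi.org/" + bare]

def md5_hex(uri):
    """MD5 hash of the URI string as a hex digest."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest()

def merged_tags(doi, lookup):
    """Look up every variant and combine the tag counts.

    `lookup` is a hypothetical callable that takes an MD5 hex string and
    returns a dict of tag -> count (empty if nothing is bookmarked).
    Summing counts is an assumption, not necessarily what the service does.
    """
    totals = {}
    for uri in uri_variants(doi):
        for tag, count in lookup(md5_hex(uri)).items():
            totals[tag] = totals.get(tag, 0) + count
    return totals

# The two variants hash to different values, hence the need to try both.
for uri in uri_variants("doi:10.1038/437477a"):
    print(uri, md5_hex(uri))
```

If the same person had bookmarked both forms, a simple sum like this would count their tags twice, so a real implementation would probably want to merge per user.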

Thursday, January 31, 2008

How Shall I Integrate Thee? Let Me Count the Ways...

Leigh Dodds has a nice post, "How Shall I Integrate Thee? Let Me Count the Ways...", about different ways to integrate data:

The one where we share identifiers
The one where we're describing the same thing
The one where we're speaking different languages
The one where we're using different units
The one where we're speaking at different levels of abstraction

Apart from the suggestion that Leigh has been watching way too much Friends, there's much food for thought here. I suspect that "The one where we're describing the same thing" is the one I'll be making most use of.

In "Rethinking LSIDs versus HTTP URI" I argued that most applications will use HTTP URIs, which makes them accessible but not terribly useful as identifiers, because I think it is unlikely that people will reuse each other's HTTP URIs ("The one where we share identifiers"). A good example is Connotea, which has its own URI for each paper its users bookmark. I won't use these URIs as identifiers in my database (if only because a user who resolves them gets taken to Connotea's web site, not mine). However, I will store any PubMed and DOI identifiers, so that somebody aggregating information from Connotea (say, to retrieve user tags) and my database (say, to get links to sequences and specimens) can work out that the Connotea URI and my URI are talking about the same thing.
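
To make the idea concrete, here is a small Python sketch of how an aggregator might pair up URIs from two sources through the DOI or PubMed identifiers they have in common. The record layout (dicts with "uri", "doi" and "pmid" keys), the function name and the example.org URIs are all made up for illustration.

```python
def same_thing_pairs(records_a, records_b, keys=("doi", "pmid")):
    """Pair URIs from two sources whose records share a DOI or PubMed id."""
    # Index the second source by (identifier type, identifier value).
    index = {}
    for rec in records_b:
        for key in keys:
            if rec.get(key):
                index[(key, rec[key])] = rec["uri"]
    # Walk the first source and collect every URI pair with a shared identifier.
    pairs = set()
    for rec in records_a:
        for key in keys:
            match = index.get((key, rec.get(key)))
            if match:
                pairs.add((rec["uri"], match))
    return sorted(pairs)

# Hypothetical records: each source keeps its own URI but stores the DOI.
bookmarks = [{"uri": "http://example.org/bookmarks/1", "doi": "10.1038/437477a"}]
my_refs = [{"uri": "http://example.org/refs/42", "doi": "10.1038/437477a", "pmid": None}]
print(same_thing_pairs(bookmarks, my_refs))
```

Neither side has to adopt the other's URIs; the shared DOI does the work of saying the two records describe the same paper.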