- Nobody uses LSIDs except the biodiversity informatics crowd. Have we missed something?
- LSIDs don't play nice with the Linked Data/Semantic Web world, which is much bigger than us
- If we adopt HTTP URIs, will this send the wrong message to data providers (LSIDs imply a commitment to persistence, URLs don't)?
- The community has invested a lot in LSIDs; it's too late to change course now
I've been twittering (@rdmpage) about some of this, and Pierre Lindenbaum blogged about my earlier paper on testing LSIDs (doi:10.1186/1751-0473-3-2), so I decided to return to one of the original goals of my bioGUID project, namely providing a tool to resolve existing identifiers in a consistent way (see the now moribund bioGUID blog; I now blog about bioGUID here on iPhylo). One of the goals of bioGUID was to take an identifier and return RDF, and I also had an underlying triple store that was populated with this RDF. After a hardware crash I took the opportunity to rebuild bioGUID from scratch, focussing on OpenURL access to literature. Now, I'm looking at LSIDs again.
The standard response to the concern that the rest of the world has gone down the HTTP URI route is to say that we can stick an HTTP proxy on the front of the LSID (e.g., http://lsid.tdwg.org/urn:lsid:indexfungorum.org:names:21364) and play ball with the Linked Data crowd, who are rapidly linking diverse data sets together.
However, sticking an HTTP proxy on an LSID isn't enough. As outlined in the document Cool URIs for the Semantic Web, we need a way of distinguishing between an HTTP URI that identifies a real-world object or concept (such as a person or a car) and one that identifies a document describing that thing (put another way, if I put an HTTP URI for Angelina Jolie into a web browser, I expect to get a document describing her, not Ms Jolie herself). One solution (and the one that is gaining traction) is to use 303 redirects to make this explicit:
A client resolving a URI for a thing will get a 303 (See Other) status code, telling them that the URI identifies an object rather than a document, and pointing them to a document that describes it. They can get the appropriate representation via content negotiation (a web browser wants HTML, a linked data browser wants RDF).
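To make the 303 pattern concrete, here is a minimal Python sketch of the decision a server makes when asked for a thing's URI. It is only an illustration under stated assumptions: the `/data/` and `/page/` paths, the function names, and the choice to fall back to HTML are all hypothetical, and the Accept parsing handles only media types and `q` values.

```python
RDF_TYPES = {"application/rdf+xml"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a media type without an explicit q value counts as q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable", so drop it
            prefs.append((-q, position, media_type))
    # Highest q first; ties keep the order used in the header.
    return [media_type for _, _, media_type in sorted(prefs)]


def redirect_for_thing(thing_id, accept="*/*"):
    """Return (status, location) for a request on a thing's URI.

    The answer is always a 303: the thing itself cannot be sent over
    HTTP, only a document about it. The /data/ and /page/ layout is a
    hypothetical convention for this sketch.
    """
    for media_type in parse_accept(accept):
        if media_type in RDF_TYPES:
            return 303, f"/data/{thing_id}"
        if media_type in HTML_TYPES:
            return 303, f"/page/{thing_id}"
    # Nothing specific was requested (e.g. */*): assume HTML is wanted.
    return 303, f"/page/{thing_id}"
```

With this sketch, a linked data client sending `Accept: application/rdf+xml` is redirected to the RDF document, while a typical browser Accept header lands on the HTML page.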
So, in order to get LSIDs to play ball with Linked Data we need an HTTP proxy that supports 303 redirects (as Roger Hyam pointed out). I've implemented a simple one as part of bioGUID. If you append an LSID to http://bioguid.info/ you get an HTTP URI that passes the Vapour Linked Data validator tests. For example, http://bioguid.info/urn:lsid:indexfungorum.org:names:21364 resolves to a web page in a browser, but clients that ask for RDF will get RDF instead. You can see the steps involved in resolving this Cool URI here. Vapour provides a nice graphical overview of the process.
The TDWG LSID proxy doesn't validate, so this is something that should be addressed.
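For anyone wanting to check a proxy by hand, the core of the test can be roughly approximated in a few lines of Python. This is a sketch, not Vapour's actual test suite: `check_cool_uri` and `fetch_once` are names invented here, and the only things checked are that the thing URI answers with a 303, that the redirect target answers with a 200, and that the returned content type matches the single media type requested.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_once(uri, accept):
    """Issue one GET without following redirects; return (status, headers)."""
    parts = urlsplit(uri)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers={"Accept": accept})
        response = conn.getresponse()
        headers = {name.lower(): value for name, value in response.getheaders()}
        return response.status, headers
    finally:
        conn.close()


def check_cool_uri(uri, accept, fetch=fetch_once):
    """Return (ok, message) for a simplified 303 check.

    `accept` is a single media type such as "application/rdf+xml".
    `fetch` can be swapped for a stub so the logic runs offline.
    """
    status, headers = fetch(uri, accept)
    if status != 303:
        return False, f"expected 303 for the thing URI, got {status}"
    location = headers.get("location")
    if not location:
        return False, "303 response has no Location header"
    target = urljoin(uri, location)  # Location may be relative
    status, headers = fetch(target, accept)
    if status != 200:
        return False, f"expected 200 for {target}, got {status}"
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type != accept.lower():
        return False, f"asked for {accept}, got {content_type or 'no content type'}"
    return True, f"{uri} -> 303 -> {target} ({content_type})"
```

Calling `check_cool_uri("http://bioguid.info/urn:lsid:indexfungorum.org:names:21364", "application/rdf+xml")` would issue real requests against the proxy, and running it again with `"text/html"` would exercise the browser side of the content negotiation.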