Sunday, February 12, 2006

Rob McCool on Rethinking the Semantic Web

Having read Rob McCool's articles on Rethinking the Semantic Web (brought to my attention by Bob McMorris' comment on my earlier post on globally unique identifiers), I think he makes very interesting points, but they are not all relevant to whether biodiversity informatics adopts RDF.

In terms of whether the dream of the Semantic Web will happen, I suspect he is right - technologies such as tags and microformats will be a lot easier to adopt, and will make more effective use of existing tools. I'm not writing the Semantic Web off, but McCool's point about keeping things very simple is, I think, on the money.

Much of the work on RDF and the Semantic Web has been done in academia, and most examples concern things such as relationships between people and projects (typically computer science projects in, you guessed it, the Semantic Web). Within a small academic community there is often a narrow problem scope, a consistent vocabulary (or at least, it is tractable to develop either a vocabulary or a mapping between vocabularies), obvious identifiers, and experience with ontologies. My sense is that biodiversity informatics fits this model. If the goal is to integrate databases of taxonomic names, specimens, images, character data, DNA sequences, and publications, and make inferences based on this aggregation of information, then I feel the use of Semantic Web techniques will be quite tractable, indeed productive.
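
To make the idea of "aggregation" a little more concrete, here is a minimal sketch in plain Python rather than in any real RDF toolkit. It assumes three separate sources that each state facts as subject-predicate-object triples and share identifiers. Every identifier, predicate name, and record below is made up for illustration, and the "inference" is nothing more than following links across the merged data.

```python
# A toy illustration of RDF-style aggregation using plain tuples.
# All identifiers, predicates, and records are hypothetical.

# Three independent "databases", each a list of (subject, predicate, object).
taxon_db = [
    ("taxon:1", "hasName", "Pediculus humanus"),
]
specimen_db = [
    ("specimen:A", "identifiedAs", "taxon:1"),
]
sequence_db = [
    ("sequence:X", "derivedFrom", "specimen:A"),
]


def merge(*sources):
    """Pool triples from several sources into one set of statements."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def names_for_sequence(graph, sequence_id):
    """Follow sequence -> specimen -> taxon -> name across the merged data."""
    names = []
    for specimen in objects(graph, sequence_id, "derivedFrom"):
        for taxon in objects(graph, specimen, "identifiedAs"):
            names.extend(objects(graph, taxon, "hasName"))
    return names


graph = merge(taxon_db, specimen_db, sequence_db)
print(names_for_sequence(graph, "sequence:X"))
```

None of the three sources on its own can say which name goes with the sequence; the answer only falls out once the statements are pooled and the shared identifiers line up. That is why obvious, agreed identifiers matter so much here.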

In the same way, much of the scepticism about whether ontologies are actually useful in the real world (see Clay Shirky's brilliant Ontology is Overrated -- Categories, Links, and Tags, or listen to an MP3) is probably well founded. Again, I think the issue is one of scope. Biologists are used to ontologies; after all, what is taxonomic classification but a large ontology with well developed rules for its construction and maintenance?

That said, there are areas in our field where insistence on RDF, controlled vocabularies, and ontologies will probably be counterproductive. Ontologies for morphological characters will, I suspect, prove hard to sell. Even though we have a history of shared terminology (think of papers establishing consistent numbering schemes for setae on insect heads), these shared vocabularies tend to have limited applicability unless they are very general (matching setae on the head of a fly and a louse is tricky), and if they are general (e.g., "legs") they are very low level. There is also the thorny issue that many aspects of morphology are not homologous in evolutionary terms (in what sense are the wings of a fly and a bird both "wings"?). Leaving aside the conceptual issues, this is one area where I think people will balk if it becomes a pain to use ontologies. It's hard enough getting people to use scientific names (never mind remembering that species names such as Homo sapiens should be written in italics). I suspect this is one area (along with scientific literature) where tagging will be a compelling alternative. For an example of the power of tagging literature see Connotea.


McCool's articles are available here: