Wednesday, July 30, 2008

iSpecies gets automated tagging

Given that the clones are hot on my heels, I feel the need to add more bells and whistles to iSpecies. The first new feature is automated tagging, which uses Yahoo's Term Extraction API. I send it the titles of any papers found, together with the Wikipedia snippet, and Yahoo returns a list of keywords ("tags").
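
To make the flow concrete, here is a rough Python sketch of the idea, not the actual iSpecies code. The function names (build_context, extract_terms) and the sample titles are made up for illustration, and the extractor is a naive local word-frequency stand-in rather than a call to Yahoo's service, which presumably does something much smarter.

```python
import re
from collections import Counter

# Small stopword list for the stand-in extractor (an assumption, not Yahoo's).
STOPWORDS = {"the", "of", "in", "and", "a", "an", "on", "for", "to", "is", "from", "with"}


def build_context(titles, snippet):
    """Join paper titles and the Wikipedia snippet into one block of text."""
    parts = [t.strip() for t in titles if t.strip()]
    if snippet and snippet.strip():
        parts.append(snippet.strip())
    return ". ".join(parts)


def extract_terms(context, max_terms=10):
    """Hypothetical stand-in for a term extraction service.

    Ranks lowercase words by frequency, skipping stopwords and very short words.
    """
    words = re.findall(r"[a-z]+", context.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(max_terms)]


if __name__ == "__main__":
    # Made-up titles and snippet, purely for illustration.
    titles = ["Burrows of the mud crab", "Mud crab respiration"]
    snippet = "The mud crab lives in estuarine mud flats."
    print(extract_terms(build_context(titles, snippet), max_terms=5))
```

In a real setup, extract_terms would be replaced by a request to whatever term extraction service is available; the point is only that the titles and snippet go in as one context string and a short list of tags comes out.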


As an example, here are the tags for one of my favourite animals, Helice crassa.


mud crabs, mangrove, estuary, muddy sediments, mud crab, sea coasts, mud flats, sex ratios, habitat preferences, activity patterns, laboratory conditions, estuarine, gills, burrows, endemic, respiration, original article, morphology, ventilation, dana, biology


I think these give a nice sense of what we know about this crab.

I'm storing the tags for future analysis. I think there are some interesting ideas to explore, such as clustering the tags into meaningful groups. I'm also interested in how much we can learn about an organism based on these keywords. Can we automatically infer something about the ecology of the organism?

There is also scope for adding some semantics. Some of these tags are taxon names, and some refer to geographic places. Others are concepts, which could be linked to the relevant page in Wikipedia (Faviki is an example of this approach). At present the tags aren't clickable (i.e., you can't query by tag), but that would be a useful feature: one could retrieve all the taxa that were tagged with a given term, such as "estuarine". For now, the tags are a quick way to get a sense of what we know about a taxon.
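
Querying by tag boils down to an inverted index from tag to taxa. The following is a minimal Python sketch of that idea, assuming the stored tags are available as a simple mapping from taxon name to tag list. The function names and the second taxon ("Taxon A") are placeholders I've made up for illustration; this is not how iSpecies stores anything.

```python
from collections import defaultdict


def build_tag_index(taxon_tags):
    """Invert a {taxon: [tags]} mapping into {tag: {taxa}}.

    Tags are lowercased and stripped so that "Estuarine " and "estuarine" match.
    """
    index = defaultdict(set)
    for taxon, tags in taxon_tags.items():
        for tag in tags:
            index[tag.strip().lower()].add(taxon)
    return index


def taxa_for_tag(index, tag):
    """Return the taxa tagged with the given term, sorted alphabetically."""
    return sorted(index.get(tag.strip().lower(), set()))


if __name__ == "__main__":
    # "Taxon A" and its tags are placeholders, not real data.
    stored = {
        "Helice crassa": ["estuarine", "burrows", "mud flats"],
        "Taxon A": ["estuarine", "mangrove"],
    }
    index = build_tag_index(stored)
    print(taxa_for_tag(index, "estuarine"))
```

With something like this in place, making a tag clickable would just mean linking it to a page that lists the result of taxa_for_tag for that term.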