Wednesday, July 30, 2008

iSpecies gets automated tagging

Given that the clones are hot on my heels, I feel the need to add more bells and whistles to iSpecies. The first new feature is automated tagging, which uses Yahoo's Term Extraction API. I send the titles of any papers found, together with the Wikipedia snippet, and Yahoo returns keywords ("tags").
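For anyone curious what such a request might look like, here is a minimal Python sketch. It is not the code iSpecies runs. The endpoint and the parameter names (`appid`, `context`, `query`, `output`) are recalled from the Term Extraction V1 documentation and should be treated as assumptions, and `build_request`, the sample title, and the sample snippet are made up for illustration. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode

# Assumed endpoint for Yahoo's Term Extraction V1 service; check the docs before use.
ENDPOINT = "http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction"


def build_request(titles, snippet, taxon, appid="YOUR_APP_ID"):
    """Return (url, encoded POST body) for one term extraction call.

    The paper titles and the Wikipedia snippet are joined into a single
    block of text, which is sent as the 'context' to extract terms from.
    """
    context = " ".join(list(titles) + [snippet])
    params = {
        "appid": appid,      # application ID issued by Yahoo (placeholder here)
        "context": context,  # the text to analyse
        "query": taxon,      # optional hint about what the text is about
        "output": "json",
    }
    return ENDPOINT, urlencode(params).encode("utf-8")


# Illustrative inputs only; real titles would come from the literature search.
url, body = build_request(
    ["Activity patterns of a mud crab"],
    "Helice crassa is a crab.",
    "Helice crassa",
)
print(url)
print(body.decode("utf-8"))
```

Sending `body` as a POST to `url` (for example with `urllib.request.urlopen`) would then return the extracted terms, assuming the service behaves as documented.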


As an example, here are the tags for one of my favourite animals, Helice crassa.


mud crabs mangrove estuary muddy sediments mud crab sea coasts mud flats sex ratios habitat preferences activity patterns laboratory conditions estuarine gills burrows endemic respiration original article morphology ventilation dana biology


I think these give a nice sense of what we know about this crab.

I'm storing the tags for future analysis. I think there are some interesting ideas to explore, such as clustering the tags into meaningful groups. I'm also interested in how much we can learn about an organism based on these keywords. Can we automatically infer something about the ecology of the organism?
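As a very rough first cut at clustering, one could link any two tags that share a word and take the connected groups. The Python sketch below does this with a small union-find. The `crude_stem` helper is a stand-in that only strips a plural "s" and is not a real stemmer, and the sample tags are a handful from the Helice crassa example, with the multi-word boundaries assumed.

```python
from collections import defaultdict


def crude_stem(word):
    """Lowercase and strip a trailing plural 's'. A placeholder, not a real stemmer."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def group_tags(tags):
    """Group tags that are connected through at least one shared (stemmed) word."""
    parent = {tag: tag for tag in tags}

    def find(tag):
        # Follow parent links to the group's representative, compressing the path.
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    first_seen = {}  # stemmed word -> first tag that contained it
    for tag in tags:
        for word in tag.split():
            stem = crude_stem(word)
            if stem in first_seen:
                # Merge this tag's group with the group that already has the word.
                parent[find(tag)] = find(first_seen[stem])
            else:
                first_seen[stem] = tag

    groups = defaultdict(list)
    for tag in tags:
        groups[find(tag)].append(tag)
    return list(groups.values())


tags = ["mud crabs", "mud crab", "mud flats", "sex ratios", "burrows"]
for group in group_tags(tags):
    print(group)
```

Here the three "mud" tags end up in one group and the other two stay on their own. Sharing a word is a blunt criterion, so anything more meaningful would need a better similarity measure.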

There is also scope for adding some semantics. Some of these tags are taxon names, and some refer to geographic places. Some are concepts, which could be linked to the relevant page in Wikipedia (Faviki is an example of this approach). At present the tags aren't clickable (i.e., you can't query by tag), but that would be a useful feature. One could get taxa that were tagged with a given term, such as "estuarine". For now, it's a quick way to get a sense of what we know about a taxon.
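Querying by tag amounts to an inverted index from tag to taxa. A minimal Python sketch, assuming the stored tags are available as a mapping from taxon name to a list of tags: the function names, the data layout, and the second taxon are all hypothetical and say nothing about how iSpecies actually stores its tags.

```python
from collections import defaultdict


def build_tag_index(tags_by_taxon):
    """Invert {taxon: [tags]} into {tag: set of taxa}, matching tags case-insensitively."""
    index = defaultdict(set)
    for taxon, tags in tags_by_taxon.items():
        for tag in tags:
            index[tag.lower()].add(taxon)
    return index


def taxa_for_tag(index, tag):
    """Return the taxa tagged with the given term, in alphabetical order."""
    return sorted(index.get(tag.lower(), set()))


# Illustrative data; "Taxon B" is a made-up placeholder.
stored = {
    "Helice crassa": ["estuarine", "mud crab", "burrows"],
    "Taxon B": ["estuarine"],
}
index = build_tag_index(stored)
print(taxa_for_tag(index, "estuarine"))
print(taxa_for_tag(index, "burrows"))
```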

7 comments:

  1. Dear Rod
    Yes, you are raising a very interesting research and search feature for the near future, I would say! It is also useful for teaching about certain species using their tags. I am afraid we will use fewer and fewer sentences in the written language of the future.

  2. That's a good idea, but would, I suggest, work better if the tags were links.

    For example, the tag "endemic" could link to, say:

    http://ispecies.org/tags/endemic

    which would be a page with a list of all iSpecies entries tagged as
    "endemic".

    If you don't want to implement that, you could make the links point to, say:

    http://en.wikipedia.org/wiki/Endemic

    or:

    http://www.technorati.com/tag/endemic

    Finally, you could then implement the "rel-tag" microformat:

    http://microformats.org/wiki/rel-tag

    by adding rel="tag" to each link, thus:

    [a href="http://ispecies.org/tags/endemic" rel="tag"]endemic[/a]

  3. Andy,

    Yes, there's a lot more I could do with tags, but this is something I knocked together in the early hours of the morning when I couldn't sleep.

    I'll add clickable links at some point, once I decide how to display the search results. As stated in the last paragraph, I'll also need to refine the tags (e.g., using stemming) to group tags such as "parasite" and "parasitic", and it would be handy to map them onto external URIs, especially in Wikipedia and, say, Geonames.

    My priority right now is to explore adding ecological links (such as host-parasite associations), adding geography from sources outside GBIF, and handling scientific names properly. This all requires some serious data mining.

    Thanks for the suggestions. When the tags are clickable I'll certainly make the URLs nice and tidy.

  4. Dear Rod,
    Since the clones should keep pace with the developments of iSpecies, I feel compelled to add this feature to e-Species too... (grim)

  5. Hi Rod,

    I have to say that iSpecies is getting more and more interesting, and for a while it has been the first place I visit to find out about a taxon. I think this tagging is cool stuff. It might be interesting to use these tags to do a simple summarization of the documents. You could, for example, pull out the 2 or 3 sentences from the files that contain most of the tags.

