Saturday, May 06, 2006
A triple store for ants is all very well, but it contains just the information available when the triple store was created. What about updating it? What about doing this automatically? Here are some ideas:
Connotea provides semantically rich RSS feeds. We could subscribe to a feed for a tag (such as Formicidae) and extract recent posts. We could either use HTTP conditional GET, so that the feed is only downloaded when it has changed, or parse the Connotea feed and use XPath to extract references more recent than a given date. Connotea makes extensive use of RDF in its RSS feeds, so it's easy to dump this into the triple store.
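Here is a rough sketch of both approaches using only the Python standard library. It assumes the feed is RSS 1.0 (RDF-based), with `rss:item` elements directly under `rdf:RDF` and a Dublin Core `dc:date` on each item. The function names are made up for illustration; this is not Connotea's own API. ElementTree's limited XPath support is enough for the date filter.

```python
import urllib.request
import urllib.error
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespaces assumed for an RSS 1.0 (RDF) feed; Dublin Core supplies dc:date.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def conditional_get(url, etag=None, last_modified=None):
    """Fetch url; return (body, etag, last_modified). body is None on 304."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(req) as resp:
            return (resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: nothing new since the last poll
            return None, etag, last_modified
        raise


def parse_w3c_date(text):
    """Parse a W3C-DTF date such as 2006-05-06T10:00:00Z into an aware datetime."""
    dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC if no offset given
    return dt


def items_newer_than(feed_xml, since):
    """Return (link, title) for every rss:item whose dc:date is after `since`."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.findall("rss:item", NS):
        date_el = item.find("dc:date", NS)
        if date_el is None or not date_el.text:
            continue  # undated items are skipped
        if parse_w3c_date(date_el.text.strip()) > since:
            results.append((
                item.findtext("rss:link", default="", namespaces=NS),
                item.findtext("rss:title", default="", namespaces=NS),
            ))
    return results
```

In practice you would store the `ETag`/`Last-Modified` values between polls and call `items_newer_than` only when `conditional_get` returns a body. Because the feed is already RDF, the whole document could also be loaded into the triple store unchanged.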
uBio's taxonomically intelligent RSS feed reader could be used to monitor publications on ants (e.g., Formicidae). uBio uses RSS 2.0, which doesn't include RDF (see the Wikipedia entry for RSS). One option would be to parse the RSS and see what we can extract from the links (e.g., whether they contain DOIs, are Ingenta feeds, etc.). If there are DOIs we could use CrossRef's OpenURL lookup. Alternatively, we could use the Connotea Web API: we'd upload the URLs, let Connotea see what it can do with them, and then make use of its RSS feed. This also makes the information available to everybody for tagging.
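The "parse the RSS and look for DOIs" step could look roughly like the Python below. The DOI regular expression is a loose approximation and will miss or over-match some real DOIs. The OpenURL parameters (`pid`, `id=doi:…`, `noredirect`) reflect my understanding of CrossRef's OpenURL service, so check them against CrossRef's documentation before relying on them. The code only builds the lookup URL and does not call CrossRef.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
# An approximation only; it stops at whitespace, quotes and URL delimiters.
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>&?#]+')


def links_from_rss2(feed_xml):
    """Return the <link> of every channel/item in a (namespace-free) RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("link", default="").strip()
            for item in root.findall("./channel/item")]


def dois_from_links(links):
    """Extract a DOI from each link that contains one (e.g. dx.doi.org URLs)."""
    found = []
    for link in links:
        match = DOI_RE.search(link)
        if match:
            found.append(match.group(0))
    return found


def crossref_openurl(doi, pid):
    """Build a CrossRef OpenURL lookup URL; `pid` is the account id/e-mail CrossRef expects."""
    query = urlencode({"pid": pid, "id": "doi:" + doi, "noredirect": "true"})
    return "https://www.crossref.org/openurl/?" + query
```

Links that yield no DOI (such as Ingenta pages) are the natural candidates to hand to Connotea instead, since it may know how to resolve them.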
We could also track new sequences in GenBank (to do).
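One possible starting point for the GenBank idea, not something implemented here, is NCBI's E-utilities `esearch` service. It can restrict a search to records from the last N days via `reldate` and `datetype`. The sketch below only builds the query URL. The helper name and the choice of `pdat` (publication date) and the `[Organism]` field are my assumptions.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def recent_genbank_query(taxon, days, retmax=100):
    """Hypothetical helper: esearch URL for nucleotide records of `taxon` from the last `days` days."""
    params = {
        "db": "nucleotide",
        "term": f"{taxon}[Organism]",
        "reldate": days,       # only records from the last `days` days...
        "datetype": "pdat",    # ...counted by publication date
        "retmax": retmax,      # cap on the number of IDs returned
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)
```

The IDs returned by such a search could then be fetched (e.g. with `efetch`) and turned into triples, much as with the literature feeds.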