Friday, July 10, 2009

NCBI RDF

Following on from the last post, I've now set up a trivial NCBI RDF service at bioguid.info/taxonomy/ (based on the ISSN resolver I released yesterday and announced on the Bibliographic Ontology Specification Group).

If you visit it in a web browser it's nothing special. However, if you choose to display XML you'll see some simple RDF. I've mapped some NCBI fields to corresponding terms in http://rs.tdwg.org/ontology/voc/TaxonConcept# (including the deprecated rankString term, which really shouldn't be deprecated, IMHO). I've also extracted what LSIDs I can from any linkouts. For example, a name that appears in Index Fungorum will have the corresponding LSID, likewise for IPNI. URLs are simply listed as rdfs:seeAlso.
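As a rough illustration of the linkout-to-LSID step, here is a minimal Python sketch. It is not the code behind the service. The function name index_fungorum_lsid is made up for this example, and the URL pattern and LSID form are inferred only from the Index Fungorum linkout and tc:hasName value in the record for taxon 101855. IPNI would need its own rule, which isn't attempted here.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def index_fungorum_lsid(linkout_url: str) -> Optional[str]:
    """Return an Index Fungorum LSID for a linkout URL, or None if it isn't one.

    Assumption: the record number in the RecordId query parameter is the
    same number that ends the LSID, as in the taxon 101855 example.
    """
    parsed = urlparse(linkout_url)
    if parsed.hostname not in ("www.indexfungorum.org", "indexfungorum.org"):
        return None
    record_ids = parse_qs(parsed.query).get("RecordId", [])
    if not record_ids or not record_ids[0].isdigit():
        return None
    return "urn:lsid:indexfungorum.org:names:" + record_ids[0]


if __name__ == "__main__":
    url = "http://www.indexfungorum.org/Names/namesrecord.asp?RecordId=105488"
    print(index_fungorum_lsid(url))
```

Any linkout that doesn't match simply returns None, which corresponds to listing it as a plain rdfs:seeAlso.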

Here's the RDF for NCBI taxon 101855 (you can grab this from http://bioguid.info/taxonomy/101855):


<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:tcommon="http://rs.tdwg.org/ontology/voc/Common#"
xmlns:tc="http://rs.tdwg.org/ontology/voc/TaxonConcept#">
<tc:TaxonConcept rdf:about="taxonomy:101855 ">
<dcterms:title>Lulworthia uniseptata</dcterms:title>
<dcterms:created>1999-08-16</dcterms:created>
<dcterms:modified>2005-01-19</dcterms:modified>
<dcterms:issued>1999-09-14</dcterms:issued>
<tc:nameString>Lulworthia uniseptata</tc:nameString><tc:rankString>species</tc:rankString>
<tcommon:taxonomicPlacementFormal>cellular organisms, Eukaryota, Fungi/Metazoa group, Fungi, Dikarya, Ascomycota, Pezizomycotina, Sordariomycetes, Sordariomycetes incertae sedis, Lulworthiales, Lulworthiaceae, Lulworthia</tcommon:taxonomicPlacementFormal>
<tc:hasName rdf:resource="urn:lsid:indexfungorum.org:names:105488"/>
<rdfs:seeAlso rdf:resource="http://www.marinespecies.org/aphia.php?p=taxdetails&amp;id=100407"/>
<rdfs:seeAlso rdf:resource="http://www.mycobank.org/MycoTaxo.aspx?Link=T&amp;Rec=105488"/>
<rdfs:seeAlso rdf:resource="http://www.indexfungorum.org/Names/namesrecord.asp?RecordId=105488"/>
<rdfs:seeAlso rdf:resource="http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&amp;search_value=194551"/>
<rdfs:seeAlso rdf:resource="http://www.mycobank.org/MycoTaxo.aspx?Link=T&amp;Rec=341143"/>
</tc:TaxonConcept>
</rdf:RDF>


Note the tc:hasName link to urn:lsid:indexfungorum.org:names:105488.
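If you want to pull those links out programmatically, something like the following Python sketch should do, using only the standard library. It works on a trimmed copy of the record above, with ampersands in URLs written as &amp;amp; so the XML parses. The function name summarize and the shape of the returned dictionary are just choices made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the RDF document itself.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
TC = "http://rs.tdwg.org/ontology/voc/TaxonConcept#"

# Trimmed copy of the record for taxon 101855 (ampersands escaped).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:tc="http://rs.tdwg.org/ontology/voc/TaxonConcept#">
<tc:TaxonConcept rdf:about="taxonomy:101855 ">
<tc:nameString>Lulworthia uniseptata</tc:nameString>
<tc:rankString>species</tc:rankString>
<tc:hasName rdf:resource="urn:lsid:indexfungorum.org:names:105488"/>
<rdfs:seeAlso rdf:resource="http://www.mycobank.org/MycoTaxo.aspx?Link=T&amp;Rec=105488"/>
<rdfs:seeAlso rdf:resource="http://www.indexfungorum.org/Names/namesrecord.asp?RecordId=105488"/>
</tc:TaxonConcept>
</rdf:RDF>"""


def summarize(rdf_xml: str) -> dict:
    """Collect the name, LSIDs and seeAlso URLs from one TaxonConcept record."""
    root = ET.fromstring(rdf_xml)
    concept = root.find(f"{{{TC}}}TaxonConcept")
    if concept is None:
        raise ValueError("no tc:TaxonConcept element found")
    resource = f"{{{RDF}}}resource"
    return {
        # The rdf:about value carries a trailing space, so strip it.
        "about": concept.get(f"{{{RDF}}}about", "").strip(),
        "name": concept.findtext(f"{{{TC}}}nameString"),
        "lsids": [e.get(resource) for e in concept.findall(f"{{{TC}}}hasName")],
        "see_also": [e.get(resource) for e in concept.findall(f"{{{RDFS}}}seeAlso")],
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, value)
```

A proper RDF library would be the better tool for anything beyond a single record. This just shows that the tc:hasName and rdfs:seeAlso values are easy to get at.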

All a bit crude. The NCBI lookup is live (i.e., it's not served from a local copy of the database). I'll look at fixing this at some point, as well as caching the linkout lookups, although one advantage of the live query is that you can get the three dates (created, modified, and published). But for now it's a starting point for playing with SPARQL queries across NCBI taxonomy, Index Fungorum, and IPNI using a common vocabulary.