Saturday, January 14, 2017

Displaying taxonomic classifications from Wikidata using d3js and SPARQL

Sahelanthropus tchadensis TM 266 01 060 1 Following on from previous posts The Semantic Web made fun: d3sparql and The Biodiversity Heritage Library meets Wikidata via Wikispecies: adding author identifiers to BioStor I've put together an example query that can be used to extract a taxonomic classification from Wikidata. The query is inspired by the http://biohackathon.org/d3sparql/ example, and uses the wikidata property P171 ("parent taxon") which is subproperty of rdfs:subClassOf (the property used in the d3sparql example which queries the Uniprot taxonomy).

The following SPARQL query generates a list of nodes in the tree representing the classification of Hominini (humans, chimps, and their extinct relatives):

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?root_name ?parent_name ?child_name WHERE
{
 VALUES ?root_name {"Hominini"}
 ?root wdt:P225 ?root_name .
 ?child wdt:P171+ ?root .
 ?child wdt:P171 ?parent .
 ?child wdt:P225 ?child_name .
 ?parent wdt:P225 ?parent_name .
}

Using https://query.wikidata.org/sparql as the endpoint, in http://biohackathon.org/d3sparql/ this generates the following diagram:

Screenshot 2017 01 14 11 41 55

There are some obvious issues with this classification, such as genera that lack descendant species (e.g., Cyphanthropus). Indeed, we could imagine developing SPARQL queries to flag up such errors (see A use case for RDF in taxonomy). But the availability and accessibility of Wikidata and its SPARQL interface makes it a great playground to explore the utility of SPARQL for exploring taxonomic data.