Friday, August 28, 2009

Visualising the Wikipedia classification of mammals

As part of my on-going experiments with Wikipedia as a repository of taxonomic information, I've extracted mammal pages from Wikipedia. There's a lot to be done with these, but the first thing I wanted to ask was whether the Wikipedia pages would form a tree (i.e., had the authors of these pages managed to ensure the pages formed a single, coherent taxonomic classification). The answer, as shown in the graph below, is no.
[Image: m.jpg]


The graph contains 7750 nodes, each one representing a Wikipedia page with a Taxobox containing the class Mammalia. A node is connected to the node corresponding to its parent in the mammalian classification.

If the pages formed a single classification, the graph would have just one component. Instead, it contains 841 distinct components, many of which you can see at the bottom. If you want to explore the graph, I've made an image map here using the wonderful graph editor yEd. You'll need to use the browser's scroll bars to move around the graph. If you click on a node you'll be taken to the corresponding Wikipedia page.
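For anyone wanting to repeat the connectedness test without a graph editor, here is a minimal sketch in Python. It is not the code I used to produce the graph; it assumes the Taxoboxes have already been parsed into a dictionary (here called `parent_of`, a hypothetical name) mapping each page title to the title of its parent taxon, and it counts components with a simple union-find. The taxon names "Orphanus" and "MissingFamily" are invented for illustration.

```python
def count_components(parent_of):
    """Count connected components in a page -> parent graph.

    parent_of maps each page title to its parent's title, or None.
    Only titles that are keys of parent_of are nodes; a parent with
    no page of its own (a redlink) contributes no edge.
    """
    root = {}

    def find(x):
        # Find the representative of x's component, with path halving.
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for child, parent in parent_of.items():
        find(child)  # register the page as a node
        if parent is not None and parent in parent_of:
            # Merge the child's component with its parent's.
            ra, rb = find(child), find(parent)
            if ra != rb:
                root[ra] = rb

    return len({find(x) for x in root})


# Toy data (assumed, not taken from Wikipedia).
pages = {
    "Mammalia": None,
    "Carnivora": "Mammalia",
    "Felidae": "Carnivora",
    "Panthera": "Felidae",
    "Orphanus": "MissingFamily",   # parent has no page
    "Orphanus minor": "Orphanus",
}

print(count_components(pages))
```

A coherent classification would give a count of 1; in this toy example the two invented pages are cut off from Mammalia, so they form a second component.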

Note: The graph has been laid out using yEd's organic layout command, so it won't look tree-like. The diagram is intended only as a test of connectedness.

Some of these components may be due to errors in my parser, but many are due to inconsistencies in Wikipedia. Typical problems are Taxoboxes containing taxa for which there is no page in Wikipedia (these are visible as redlinks), or monotypic taxa where the pages for the genus and species are the same.
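Both kinds of problem can be flagged automatically once the page-to-parent links are in hand. The sketch below is illustrative only: it assumes a dictionary (hypothetically named `parent_of`) from page title to parent title, treats a parent with no entry of its own as a redlink, and assumes a monotypic genus whose species shares its page shows up as a page listed as its own parent. The example data are made up for the demonstration.

```python
def find_problems(parent_of):
    """Return (missing_parents, self_parents) for a page -> parent mapping."""
    # Parents named in a Taxobox that have no page of their own (redlinks).
    missing = sorted(
        {p for p in parent_of.values() if p is not None and p not in parent_of}
    )
    # Pages whose parent resolves to the page itself, as assumed here for
    # monotypic taxa where genus and species share a single page.
    self_parents = sorted(c for c, p in parent_of.items() if c == p)
    return missing, self_parents


# Toy data (assumed, not taken from Wikipedia).
example = {
    "Mammalia": None,
    "Carnivora": "Mammalia",
    "Orphanus": "MissingFamily",   # parent has no page
    "Solitarius": "Solitarius",    # page is its own parent
}

missing, self_parents = find_problems(example)
print("No page for:", missing)
print("Own parent:", self_parents)
```

A list like this would give an editor a place to start, rather than having to spot the stray components by eye.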

Of course, the joy of Wikipedia is that these problems can be easily fixed, but the trick is discovering the problems in the first place. There is a distinct lack of tools to enable Wikipedia editors to view the entire classification of interest and identify areas that need fixing (something Roger Hyam alluded to in his comment on an earlier posting). It would, of course, be great to be able to edit the graph shown above and have those changes automatically transmitted to Wikipedia.