Friday, September 04, 2009

Visualising edit history of a Wikipedia page

Quick post (really should be doing something else). Reading Jeff Atwood's post "Mixing Oil and Water: Authorship in a Wiki World" led me to IBM's wonderful history flow tool for visualising the edit history of a Wikipedia page.

Imagine a scenario where three people will make contributions to a Wiki page at different points in time. Each person edits the page and then saves their changes to what becomes the latest version of that page.


History Flow connects text that has been kept the same between consecutive versions. Pieces of text that do not have correspondence in the next (or previous) version are not connected and the user sees a resulting "gap" in the visualization; this happens for deletions and insertions. (animated GIF from Jeff Atwood's post).
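To make the "connecting" step concrete, here is a minimal sketch in Python (not the PHP used for the image in this post, and not necessarily the matching method the IBM tool uses). It assumes each revision is a list of lines and uses the standard library's difflib to find runs of lines that survive from one revision to the next. The function name match_lines and the sample lines are made up for illustration.

```python
from difflib import SequenceMatcher


def match_lines(old_lines, new_lines):
    """Return (old_index, new_index, length) for each run of unchanged lines.

    Lines that fall outside every run were deleted (old side) or inserted
    (new side); these are the "gaps" in the visualisation.
    """
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    # get_matching_blocks() always ends with a zero-length sentinel; skip it.
    return [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size > 0]


# Illustrative page source, not real Wikipedia revisions.
old = ["Afrotheria is a clade.", "It includes elephants.", "See also: Xenarthra."]
new = ["Afrotheria is a clade.", "{{Taxobox}}", "See also: Xenarthra."]

print(match_lines(old, new))  # [(0, 0, 1), (2, 2, 1)]
```

The first and last lines are connected across the two versions; the middle line has no counterpart on either side, so it would show up as a gap.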

There's a nice paper describing history flow (doi:10.1145/985692.985765, free PDF here). Inspired by this I decided to try and implement history flow in PHP and SVG. Here's a preliminary result:


This is the edit history for the Afrotheria page. Click on the image above, or here, to see the SVG image (you need a decent web browser for this; IE users will need an SVG plugin).

The SVG image is clickable. The columns represent revisions; click on one to go to that revision. The columns are evenly spaced (i.e., the gaps don't correspond to time). The bands between revisions trace individual blocks of text (in this case lines in the Wikipedia page source). If you click on a band you get taken to that Wikipedia user's page.
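The layout described above could be sketched roughly as follows. This is Python rather than the PHP behind the image, and it is only an approximation under stated assumptions: each revision is a dict with hypothetical keys "url" and "lines", the pixel constants are arbitrary, and the example.org URLs are placeholders. For brevity it draws every band in one colour and leaves out the per-user colouring and the band links to user pages.

```python
from difflib import SequenceMatcher
from xml.sax.saxutils import quoteattr

COLUMN_WIDTH = 10  # width of each revision column, in pixels (arbitrary)
COLUMN_GAP = 60    # constant spacing between columns, not proportional to time
LINE_HEIGHT = 4    # vertical pixels per line of page source (arbitrary)


def history_flow_svg(revisions):
    """Build an SVG string: one clickable column per revision, plus bands
    joining runs of lines that are unchanged between consecutive revisions."""
    parts = []
    for i, rev in enumerate(revisions):
        x = i * (COLUMN_WIDTH + COLUMN_GAP)
        height = len(rev["lines"]) * LINE_HEIGHT
        # Column for this revision, wrapped in a link to the revision itself.
        parts.append(
            f'<a xlink:href={quoteattr(rev["url"])}>'
            f'<rect x="{x}" y="0" width="{COLUMN_WIDTH}" height="{height}"/></a>'
        )
        if i == 0:
            continue
        left = x - COLUMN_GAP  # right-hand edge of the previous column
        prev_lines = revisions[i - 1]["lines"]
        matcher = SequenceMatcher(a=prev_lines, b=rev["lines"], autojunk=False)
        for m in matcher.get_matching_blocks():
            if m.size == 0:
                continue  # skip the zero-length sentinel block
            y_old, y_new = m.a * LINE_HEIGHT, m.b * LINE_HEIGHT
            h = m.size * LINE_HEIGHT
            points = f"{left},{y_old} {x},{y_new} {x},{y_new + h} {left},{y_old + h}"
            parts.append(f'<polygon points="{points}" fill="steelblue" opacity="0.5"/>')
    width = max(len(revisions) * (COLUMN_WIDTH + COLUMN_GAP) - COLUMN_GAP, 0)
    height = max((len(r["lines"]) for r in revisions), default=0) * LINE_HEIGHT
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" '
        f'width="{width}" height="{height}">' + "".join(parts) + "</svg>"
    )


# Placeholder revisions for illustration only.
example = [
    {"url": "https://example.org/rev/1", "lines": ["intro", "body", "refs"]},
    {"url": "https://example.org/rev/2", "lines": ["intro", "taxobox", "refs"]},
]
svg = history_flow_svg(example)
```

Attributing each band to a user would need one more piece of bookkeeping: remembering, for every line, which revision first introduced it, and carrying that label forward through the matched runs.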

This was all done in a rush, but it gives an idea of what can be done. A history flow carries all sorts of information about how an article has developed over time, including major changes (such as the introduction of Taxoboxes), and it makes the content of a page traceable, in the sense that you can see who contributed what.