Tuesday, December 09, 2008

Table lens view of data matrix

Among the many weaknesses of my challenge demo is the way it simply dumps out a list of sequences (see the comments on the demo). I decided to take a look at the table lens after reading BiblioViz: a system for visualizing bibliography information. See also Rao and Card's 1994 paper (doi:10.1145/191666.191776; there is a free PDF on Ramana Rao's web site) and DateLens (another product of the University of Maryland's Human-Computer Interaction Lab, which also gave us treemaps). I've hacked together some crude JavaScript and CSS, taking some suggestions on Stack Overflow as a starting point (it seems to work in Safari and Firefox, but not in IE6).

The idea is to display a table in a fixed space. As you mouse over a cell, the contents of that cell and the relevant row and column labels become visible. This lets you get an overview of the full table while still being able to see individual items.
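To make the "fixed space" idea concrete, here is a minimal sketch of the sizing arithmetic a table lens needs. It is not the code behind my demo: the function name `lensSizes` and its parameters are made up for illustration, and it assumes the hovered row (or column) gets a fixed number of pixels while the remaining rows share what is left equally.

```javascript
// Hypothetical helper (not from the demo): split a fixed pixel budget
// across `count` rows or columns. The hovered one gets `focusSize`
// pixels; the others share the remainder equally.
function lensSizes(count, total, focusIndex, focusSize) {
  if (count <= 0) return [];

  // Nothing hovered (or an out-of-range index): every row gets an equal share.
  if (focusIndex === null || focusIndex < 0 || focusIndex >= count) {
    return new Array(count).fill(total / count);
  }

  // A single row simply takes all the space.
  if (count === 1) return [total];

  // Never let the focused row exceed the available space.
  const focus = Math.min(focusSize, total);
  const rest = (total - focus) / (count - 1);

  return Array.from({ length: count }, (_, i) => (i === focusIndex ? focus : rest));
}

// Five rows in 100 pixels, with row 2 hovered and expanded to 40 pixels.
console.log(lensSizes(5, 100, 2, 40)); // [ 15, 15, 40, 15, 15 ]
```

Running the same function once for rows and once for columns gives the height and width of every cell, so the table as a whole never grows beyond its box no matter which cell is under the mouse.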


It's easier to show than explain. For example, take a look at The amphibian tree of life, or watch this short screencast:





There are some things to fix. Firstly, I group all sequences by NCBI taxon and gene "features". If there's more than one sequence for the same gene and taxon, I just show one of them (an obvious solution is to add a popup menu when a cell holds more than one sequence). Secondly, the gene "names" are extracted from GenBank feature tables, and will include synonyms and duplicates (for example, a sequence may have a gene feature "RAG-1" and a CDS feature "recombination activating protein 1"). I've stored all of these because not every sequence is consistently labelled, so excluding one class of feature may lose all labels from a sequence. At some point it would be useful to cluster gene names (a task for another day).
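The grouping step can be sketched roughly as follows. Again this is an illustration rather than the demo's actual code: the record shape (`accession`, `taxon`, `gene`), the function names `buildMatrix` and `cellFor`, and the sample values are all assumptions. The point is that each cell keeps every sequence for its taxon and gene, so the display can show the first one now and still have the full list available for a popup later.

```javascript
// Hypothetical record shape: { accession, taxon, gene }.
// Groups sequences into a taxon-by-gene matrix, keeping every
// sequence that falls into the same cell.
function buildMatrix(sequences) {
  const cells = new Map(); // "[taxon, gene]" key -> array of accessions
  const taxa = new Set();  // row labels, in first-seen order
  const genes = new Set(); // column labels, in first-seen order

  for (const seq of sequences) {
    taxa.add(seq.taxon);
    genes.add(seq.gene);
    const key = JSON.stringify([seq.taxon, seq.gene]);
    if (!cells.has(key)) cells.set(key, []);
    cells.get(key).push(seq.accession);
  }

  return { taxa: [...taxa], genes: [...genes], cells };
}

// Returns the sequence to display in a cell, plus the full list so a
// popup menu could offer the alternatives.
function cellFor(matrix, taxon, gene) {
  const all = matrix.cells.get(JSON.stringify([taxon, gene])) || [];
  return { shown: all.length > 0 ? all[0] : null, count: all.length, all };
}

// Placeholder data: the accessions and taxon names are invented.
const example = buildMatrix([
  { accession: "ACC1", taxon: "taxon A", gene: "RAG-1" },
  { accession: "ACC2", taxon: "taxon A", gene: "RAG-1" },
  { accession: "ACC3", taxon: "taxon B", gene: "RAG-1" },
]);
console.log(cellFor(example, "taxon A", "RAG-1").count); // 2
```

Note that gene names are matched exactly here, so "RAG-1" and "recombination activating protein 1" would still end up in separate columns until the names are clustered.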