Saturday, December 02, 2006

Folksonomies - why philosophy is a bad thing

The November 2006 issue of D-Lib magazine contains an article by Elaine Peterson entitled "Beneath the Metadata: Some Philosophical Problems with Folksonomy" (doi:10.1045/november2006-peterson). She writes:

The choice to use folksonomy for organizing information on the Internet is not a simple, straightforward decision, but one with important underlying philosophical issues. Although folksonomy advocates are beginning to correct some linguistic and cultural variations when applying tags, inconsistencies within the folksonomic classification scheme will always persist...Most information seekers want the most relevant hits when keying in a search query. Folksonomy is a scheme based on philosophical relativism, and therefore it will always include the failings of relativism. A traditional classification scheme will consistently provide better results to information seekers.


This article is one of the most irritating things I've read in a while, and as much as I like philosophy, it reinforces my prejudice that invoking philosophy is almost always a bad idea. Casting the discussion about folksonomy versus classification as a clash between "Aristotelian categories" and "philosophical relativism" just substitutes name-calling for analysis, and the paper makes unsubstantiated claims such as "A traditional classification scheme based on Aristotelian categories yields search results that are more exact", and "A traditional classification scheme will consistently provide better results to information seekers." Er, how do we know this? Do we have data to support this? And, um, what classification scheme does Google use, exactly?

Now, I'm a fan of classifications, and would argue that biological taxonomy has one of the largest, most elaborate classifications that is actively used, complete with detailed rules governing its maintenance. Indeed, much of this iPhylo blog is about a project to add classification to a database (TreeBASE) that eschews classification (to its detriment). However, classification is problematic: there are competing classifications, and within biological taxonomy there is much discussion about how names relate to classifications (see earlier posts More on names (and frogs) and Synonomy and kinds of name). Despite being armed with one of the best developed classifications around, biologists also use informal names to refer to groups, partly because our knowledge of the real world changes, and hence our classifications change (but often lag behind the latest research).

Classifications can also constrain the kinds of questions we can ask. For example, NCBI's classification of animals lacks the Ecdysozoa, a group whose existence is controversial but which, I guess, most zoologists would accept. Despite this broad acceptance, NCBI prevents users from asking questions such as "how many sequences have been obtained from members of the Ecdysozoa?" To see this, try typing "Ecdysozoa" as a search term in the NCBI's Taxonomy Browser. If you want to ask this question, you need to construct a complex query that specifies every group belonging to the Ecdysozoa. This problem motivated a paper Gabriel Valiente and I wrote (doi:10.1186/1471-2105-6-208) that suggested using edit scripts to modify trees so that users can generate their preferred classification using the NCBI tree as a starting point. The other motivation was that the NCBI tree is continually growing as the NCBI database grows.
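To make the edit-script idea a little more concrete, here is a minimal Python sketch, not the method from the paper. It assumes a tiny, hypothetical taxonomy stored as a child-to-parent map (it is NOT NCBI's actual hierarchy; it simply puts Arthropoda and Nematoda in different branches so that no single node corresponds to Ecdysozoa). The operation names `add` and `move` and the helpers `apply_edit_script` and `descendants` are illustrative inventions. Applying a short script yields a user's preferred tree, in which "all members of Ecdysozoa" becomes a single subtree query rather than a hand-built list of groups.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent). NOT the real NCBI tree.
TOY_TAXONOMY = {
    "Coelomata": "Bilateria",
    "Pseudocoelomata": "Bilateria",
    "Protostomia": "Coelomata",
    "Arthropoda": "Protostomia",
    "Annelida": "Protostomia",
    "Mollusca": "Protostomia",
    "Nematoda": "Pseudocoelomata",
}


def apply_edit_script(parents, script):
    """Return a new child->parent map after applying (hypothetical) edits.

    ("add", name, parent)   creates a new node under parent
    ("move", name, parent)  reattaches an existing node and its subtree
    The input map is copied, not modified. Cycles are not checked.
    """
    edited = dict(parents)
    for op, name, parent in script:
        if op == "add" and name in edited:
            raise ValueError(f"{name} already exists")
        if op == "move" and name not in edited:
            raise KeyError(f"{name} is not in the tree")
        if op not in ("add", "move"):
            raise ValueError(f"unknown operation {op!r}")
        edited[name] = parent
    return edited


def descendants(parents, name):
    """Every node below `name` (not including `name` itself)."""
    children = defaultdict(list)
    for child, parent in parents.items():
        children[parent].append(child)
    found, stack = set(), [name]
    while stack:
        for child in children[stack.pop()]:
            found.add(child)
            stack.append(child)
    return found


if __name__ == "__main__":
    script = [
        ("add", "Ecdysozoa", "Bilateria"),
        ("move", "Arthropoda", "Ecdysozoa"),
        ("move", "Nematoda", "Ecdysozoa"),
    ]
    my_tree = apply_edit_script(TOY_TAXONOMY, script)
    print(sorted(descendants(TOY_TAXONOMY, "Ecdysozoa")))  # []
    print(sorted(descendants(my_tree, "Ecdysozoa")))  # ['Arthropoda', 'Nematoda']
```

Storing only the script, rather than a whole copy of the tree, is what would let a user re-apply their preferred grouping as the underlying tree keeps growing.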

Given these issues, the flexibility of folksonomies may offer some advantages. Indeed, I think the notion of "tagging" may prove a useful way to think about taxonomic names. Guy and Tonkin's article "Folksonomies: Tidying up tags?" (doi:10.1045/january2006-guy) offers a rather more sensible perspective:

We agree with the premise that tags are no replacement for formal systems, but we see this as being the core quality that makes folksonomy tagging so useful.

Seems like a case of "the genius of AND".

6 comments:

Cassandra said...

Interesting stuff indeed. Curious what the future of the tag will be and what odd classifications we'll see. Keep it up!

David Marjanović said...

Indeed, much of this iPhylo blog is about a project to add classification to a database (TreeBASE) that eschews classification (to its detriment).

Classification? Or just nomenclature, which would IMHO be enough?

Rod Page said...

But, how would you answer a question such as "find me all studies in TreeBASE that contain birds"? Nomenclature isn't enough.

David Marjanović said...

But, how would you answer a question such as "find me all studies in TreeBASE that contain birds"?

This was discussed at some length at the 2nd PhyloCode meeting last summer.

- What a bird is depends on the definition of "bird" (which will be fixed by the PhyloCode) and the phylogeny. You can't just tag every OTU as being a bird or not and then simply search for that tag. OK, it would probably work for neontologists who ask "find me all TreeBASE studies that contain extant birds", because among the living it's always obvious and never in dispute whether anything is a bird or not, but that's the exception, not the rule.
- Thus, the question can only be answered unambiguously if the specifiers (anchors) of the definition happen to be in the phylogeny.
- Not being an informatician, I don't remember much of the discussion; however, supertree methods will be required if the anchors are not included in the phylogeny in question.
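The point about specifiers can be illustrated with a minimal Python sketch of resolving a node-based definition (the least inclusive clade containing a set of specifiers, one of the definition types the PhyloCode allows) against a single tree. Everything here is hypothetical: the two toy trees, the choice of Struthio and Passer as specifiers (NOT the actual PhyloCode definition of any bird name), and the helper names. When every specifier is present the clade is found via the most recent common ancestor; when one is missing the sketch simply returns `None`, which is the case where, as noted above, something like a supertree would be needed.

```python
def ancestors(parents, node):
    """Path from `node` up to the root, starting with `node` itself."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    return path


def mrca(parents, a, b):
    """Most recent common ancestor of nodes a and b (or None)."""
    seen = set(ancestors(parents, a))
    for node in ancestors(parents, b):
        if node in seen:
            return node
    return None


def tips(parents):
    """Leaves are nodes that never appear as a parent."""
    return set(parents) - set(parents.values())


def clade_members(parents, specifiers):
    """Tips in the least inclusive clade containing all specifiers,
    or None if any specifier is absent from this tree."""
    leaves = tips(parents)
    if not set(specifiers) <= leaves:
        return None  # cannot be resolved from this tree alone
    anchor = specifiers[0]
    for s in specifiers[1:]:
        anchor = mrca(parents, anchor, s)
    return {t for t in leaves if anchor in ancestors(parents, t)}


# Hypothetical trees (child -> parent); n*, m* are unnamed internal nodes.
# TREE_A: ((Struthio,(Gallus,Passer)),(Alligator,Crocodylus))
TREE_A = {"Struthio": "n2", "n3": "n2", "Gallus": "n3", "Passer": "n3",
          "n2": "root", "n4": "root", "Alligator": "n4", "Crocodylus": "n4"}
# TREE_B: ((Gallus,Passer),Alligator), which lacks the Struthio specifier
TREE_B = {"Gallus": "m1", "Passer": "m1", "m1": "root", "Alligator": "root"}

if __name__ == "__main__":
    spec = ["Struthio", "Passer"]
    print(sorted(clade_members(TREE_A, spec)))  # ['Gallus', 'Passer', 'Struthio']
    print(clade_members(TREE_B, spec))  # None
```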

Rod Page said...

"...however, supertree methods will be required if the anchors are not included in the phylogeny in question."

This is exactly my point: nomenclature is not enough unless, as you point out, the specifiers (anchors) of the definition happen to be in the phylogeny.

I suggest that for most trees in a database like TreeBASE this will not be the case.

David Marjanović said...

OK. Nomenclature is not enough if there's no tree to which it can be applied. So, you need nomenclature applied to a phylogeny. What I'm really trying to say is that classifications are even more volatile than phylogenies; people can agree on the phylogeny and still disagree on the classification that is to be derived from it.