Note to self about a possible project. This PLoS ONE paper:
Tibély, G., Pollner, P., Vicsek, T., & Palla, G. (2013, December 31). Extracting Tag Hierarchies. (P. Csermely, Ed.)PLoS ONE. Public Library of Science (PLoS). http://doi.org/10.1371/journal.pone.0084133describes a method for inferring a hierarchy from a set of tags (and cites related work that is of interest). I've grabbed the code and data from http://hiertags-beta.elte.hu/home/ and put it on GitHub.
Possible projectUse Tibély et al. method (or others) on taxonomic names extracted from BHL text (or other) and see if we can reconstruct taxonomic classifications. ow do classifications compare to those in databases? Can we enhance existing databases using this technique (e.g., extract classifications from literature for groups pporly represented in existing databases)? Could be part of larger study of what we can learn from co-occurrence of taxonomic names, e.g. Automatically extracting possible taxonomic synonyms from the literature.
Note to anyone reading this: if this project sounds interesting, by all means feel free to do it. These are just notes about things that I think would be fun/interesting/useful to do.