Friday, June 06, 2014

Finding citations using full text search

Note to self on citation matching.

Looking for the paper "Fishes of the Marshall and Marianas islands. Vol. I. Families from Asymmetrontidae through Siganidae", I Googled it, adding "biostor" as a search term to see if I'd already added it to BioStor. The Google search:

https://www.google.co.uk/?gws_rd=ssl#q=Fishes+of+the+Marshall+and+Marianas+islands.+Vol.+I.+Families+from+Asymmetrontidae+through+Siganidae+biostor

found several hits in BioStor:

What is interesting is that these hits are to the full text of references that cite the article I'm after, not to the article itself. I'm sure many people have had this experience: you search for an obscure article and keep finding papers that cite it, rather than the paper itself. But this suggests another strategy for building the citation graph for an article. If you have a decent corpus of full-text articles, search for the article (using, say, title, journal, and pagination) in the text of those articles and store the hits. Those are the references that cite the article (OK, not all of them, but some). This may be a more attractive way of building the citation graph than parsing the citations in each article and trying to locate them. Indeed, it could be extended to help mark up those citations: imagine grabbing blocks of text from near the end of an article, searching for them in a database of citations, and using close matches to flag the corresponding block as a citation.
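To make the idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how Google or BioStor actually does it: the corpus is a plain dictionary of made-up article IDs and text snippets, the matching is a simple word-trigram overlap that I picked because it tolerates a little OCR noise, and the function names and the 0.6 threshold are arbitrary choices of mine.

```python
import re


def normalize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(words, n=3):
    """Return the set of overlapping n-word sequences in a word list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def citation_score(query, full_text, n=3):
    """Fraction of the query's word n-grams that also occur in full_text."""
    query_shingles = shingles(normalize(query), n)
    if not query_shingles:
        return 0.0
    text_shingles = shingles(normalize(full_text), n)
    return len(query_shingles & text_shingles) / len(query_shingles)


def find_citing_articles(query, corpus, threshold=0.6):
    """Return (article_id, score) pairs scoring at least threshold, best first."""
    hits = []
    for article_id, full_text in corpus.items():
        score = citation_score(query, full_text)
        if score >= threshold:
            hits.append((article_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# The article being searched for (title only; journal and pages could be appended).
query = ("Fishes of the Marshall and Marianas islands. Vol. I. "
         "Families from Asymmetrontidae through Siganidae")

# Hypothetical corpus: the IDs and snippets are invented for illustration.
corpus = {
    "article-1": "References. Fishes of the Marshall and Marianas islands. "
                 "Vol. I. Families from Asymmetrontidae through Siganidae.",
    # Same citation with an OCR-style variation ("Vol. 1" instead of "Vol. I").
    "article-2": "Literature cited: Fishes of the Marshall and Marianas Islands. "
                 "Vol. 1. Families from Asymmetrontidae through Siganidae.",
    "article-3": "A note on the freshwater snails of Madagascar.",
}

hits = find_citing_articles(query, corpus)
for article_id, score in hits:
    print(article_id, round(score, 2))
```

The stored hits are the edges of the citation graph: each matching article cites the query article. The same score could in principle be run in the other direction, comparing blocks of text from the end of an article against a database of known citations to flag which blocks are citations, though a real corpus would need an index rather than a linear scan.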

Need to think about this a little more...

Update



The paper is:

Polepeddi, L., Agrawal, A., & Choudhary, A. (n.d.). Poll: A Citation Text Based System for Identifying High-Impact Contributions of an Article. 2011 IEEE 11th International Conference on Data Mining Workshops. IEEE. doi:10.1109/icdmw.2011.136