<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-16081779</id><updated>2012-01-30T23:46:20.208Z</updated><category term='merging'/><category term='clustering'/><category term='extraction'/><category term='linked data. Zitgist'/><category term='ViBRANT'/><category term='Fedora'/><category term='OpenURL'/><category term='Zemanta'/><category term='Creative Commons'/><category term='Semantic Web'/><category term='SVG'/><category term='MarkMail'/><category term='tag tree'/><category term='Wine'/><category term='Wikispecies'/><category term='specimens'/><category term='ranking'/><category term='Cladistics'/><category term='Apple'/><category term='ants'/><category term='scratchpads'/><category term='parasites'/><category term='bryozoa'/><category term='OTU'/><category term='Uniprot'/><category term='dark taxa'/><category term='Open source'/><category term='Freebase'/><category term='PMID'/><category term='long tail'/><category term='Zotero'/><category term='career suicide'/><category term='Social media'/><category term='license'/><category term='Mac OS X'/><category term='MOBOT'/><category term='Apache'/><category term='hack4knowledge'/><category term='rant'/><category term='vocabulary'/><category term='symbiome'/><category term='power law'/><category term='spelling correction'/><category term='Nuytsia'/><category term='lazy load'/><category term='ZooKeys'/><category term='visualization'/><category term='cospeciation'/><category term='names'/><category term='RDF'/><category term='CSS'/><category term='Catalogue of Life'/><category term='talk'/><category term='stratigraphy'/><category term='DNA barcoding'/><category 
term='Flipboard'/><category term='data cleaning'/><category term='systematics'/><category term='Mediawiki'/><category term='MPE'/><category term='mailing list'/><category term='nomenclators'/><category term='test suite'/><category term='squid'/><category term='interview'/><category term='touch screen'/><category term='patent'/><category term='amber'/><category term='Firefox'/><category term='iTunes'/><category term='OpenRef'/><category term='taxonomic concept'/><category term='mobile tagging'/><category term='design'/><category term='Internet Explorer'/><category term='millipedes'/><category term='Google books'/><category term='supertree'/><category term='stained glass'/><category term='FishBase'/><category term='ruby'/><category term='Vista'/><category term='Microsoft'/><category term='podcast'/><category term='list'/><category term='false positive'/><category term='bibliometrics'/><category term='displacement'/><category term='demo'/><category term='specimen codes'/><category term='logo'/><category term='Clay Shirky'/><category term='Steve Jobs'/><category term='Atlas of Living Australia'/><category term='scraping'/><category term='Peter Norvig'/><category term='Ideator'/><category term='ATOM'/><category term='Carmen Electra'/><category term='&quot;data wars&quot;'/><category term='services'/><category term='mammals'/><category term='CGI'/><category term='genus'/><category term='code'/><category term='timemap'/><category term='impact factor'/><category term='Japanese'/><category term='TreeBASE'/><category term='teaching'/><category term='touch'/><category term='uBio'/><category term='Mac OSX'/><category term='business model'/><category term='hack'/><category term='preprint'/><category term='&quot;sea level&quot;'/><category term='navigation'/><category term='extensions'/><category term='DjVu'/><category term='Kew'/><category term='speaking'/><category term='Semantic Mediawiki'/><category term='Fungi'/><category term='shape files'/><category term='Index 
Fungorum'/><category term='reCAPTCHA'/><category term='PLoS'/><category term='ArcGIS'/><category term='parasite'/><category term='imagination'/><category term='TaxPub'/><category term='vizbi'/><category term='number of species'/><category term='PHP'/><category term='PRISM'/><category term='NLM DTD'/><category term='Pando'/><category term='ReaderMeter'/><category term='frogs'/><category term='TreeView'/><category term='twitter'/><category term='Proxy'/><category term='evolutionary biology'/><category term='Flickr'/><category term='mod_rewrite'/><category term='Angelina Jolie'/><category term='index'/><category term='Entomologica Scandinavica'/><category term='replication'/><category term='BBC'/><category term='GIS'/><category term='Species-ID'/><category term='plans'/><category term='UTF8'/><category term='GeoRSS'/><category term='data mining'/><category term='encoding'/><category term='E O Wilson'/><category term='Insect Systematics and Evolution'/><category term='Google Docs'/><category term='domain names'/><category term='predictions'/><category term='art'/><category term='grant'/><category term='presentation'/><category term='library'/><category term='stackoverflow'/><category term='disaambiguation'/><category term='NHM'/><category term='USIN'/><category term='georeferencing'/><category term='Mendeley'/><category term='tiles'/><category term='CERN'/><category term='family'/><category term='iphylo'/><category term='OAI'/><category term='Wiley'/><category term='citation'/><category term='hOCR'/><category term='swine flu'/><category term='GBIF'/><category term='RTFM'/><category term='bibliographic coupling'/><category term='Mesquite'/><category term='OCLC'/><category term='digital library'/><category term='digitising'/><category term='H1N1'/><category term='jQuery'/><category term='Darwin Core riplet'/><category term='iCal'/><category term='UTM grid reference'/><category term='Metacafe'/><category term='XML'/><category term='history flow'/><category 
term='Strumigenys'/><category term='regular expression'/><category term='cyberscience'/><category term='TinyURL'/><category term='Bibliography of Life'/><category term='URL shortening'/><category term='PygmyBrowse'/><category term='Papers'/><category term='CouchDB'/><category term='microcitations'/><category term='Linked data'/><category term='Wellcome'/><category term='Handles'/><category term='CiteBank'/><category term='Drupal'/><category term='integration'/><category term='OpenHandle'/><category term='Tree of Life'/><category term='Bio2RDF'/><category term='software'/><category term='conversation'/><category term='conservation status'/><category term='Pyramica'/><category term='memcached'/><category term='release'/><category term='TDWG'/><category term='bit.ly'/><category term='AntWeb'/><category term='iBook'/><category term='articles'/><category term='Blackwell'/><category term='visulaisation'/><category term='Google Maps'/><category term='users'/><category term='published'/><category term='LSID'/><category term='ISSN'/><category term='javascript'/><category term='to do'/><category term='timeline'/><category term='DOI trees'/><category term='map'/><category term='text mining'/><category term='Talk Science'/><category term='crazy'/><category term='press'/><category term='museum'/><category term='help'/><category term='twittervision'/><category term='longest common substring'/><category term='citation mutation'/><category term='metacrap'/><category term='Google Earth'/><category term='Perceptive Pixel'/><category term='metrics'/><category term='trees'/><category term='browser'/><category term='Wallace'/><category term='Web Hooks'/><category term='BLAST'/><category term='phylogeny'/><category term='digitisation'/><category term='background'/><category term='matching'/><category term='Wired'/><category term='Yahoo'/><category term='WoRMS'/><category term='crash'/><category term='vision'/><category term='data quality'/><category term='XMP'/><category 
term='NDE'/><category term='social citation'/><category term='iBooks'/><category term='WebDAV'/><category term='lucene'/><category term='quantum treemap'/><category term='2010'/><category term='Dublin Core'/><category term='BHL-Europe'/><category term='API'/><category term='MIT'/><category term='PLoS Hubs'/><category term='TaxonRank'/><category term='SOAP'/><category term='Gallica'/><category term='Connotea'/><category term='tags'/><category term='Sherborn'/><category term='search'/><category term='aggregation'/><category term='slideshare'/><category term='Handle'/><category term='zoomify'/><category term='maps'/><category term='iPad'/><category term='&quot;Guy Kawasaki&quot;'/><category term='failure'/><category term='Solr'/><category term='metadata'/><category term='ION'/><category term='PhyLoTA'/><category term='RAxML'/><category term='biogeography'/><category term='KML'/><category term='paywall'/><category term='books'/><category term='collaboration'/><category term='taxonomists'/><category term='Google Spreadsheets'/><category term='IUCN'/><category term='NSF'/><category term='fonts'/><category term='wow'/><category term='Æ'/><category term='Windows'/><category term='NCBI'/><category term='Evolution2010'/><category term='bioinformatics'/><category term='Wikisource'/><category term='Australian Systematic Botany'/><category term='duplicates'/><category term='classification'/><category term='Linkout'/><category term='taxonomic name'/><category term='Tasmania'/><category term='git'/><category term='ebio09'/><category term='species'/><category term='DiGIR'/><category term='reliability'/><category term='spider'/><category term='video'/><category term='HomeBrew'/><category term='close to the bone'/><category term='New Category'/><category term='mashup'/><category term='fossil'/><category term='pagerank'/><category term='SICI'/><category term='OCR'/><category term='EvolDir'/><category term='e-Biosphere'/><category term='Plant List'/><category term='Darwin'/><category 
term='table'/><category term='prize'/><category term='Nature'/><category term='we feel fine'/><category term='snakes'/><category term='host'/><category term='workshop'/><category term='TBMap'/><category term='PDF'/><category term='Google Code'/><category term='success'/><category term='bibliographies'/><category term='transitive reduction'/><category term='Mammal Species of the World'/><category term='Perl'/><category term='joy'/><category term='forking data'/><category term='Challenge'/><category term='hyperbolic tree'/><category term='computers'/><category term='&quot;Social Graph API&quot;'/><category term='exhaustion'/><category term='&quot;rock pools&quot;'/><category term='WorldCat'/><category term='output'/><category term='annotation'/><category term='iPhone'/><category term='GrandChallenge'/><category term='DBpedia'/><category term='bioguid'/><category term='clusterfuck'/><category term='Taxobox'/><category term='Wordle'/><category term='Broad Institute'/><category term='error'/><category term='tree'/><category term='GeoCouch'/><category term='Xanadu'/><category term='JSTOR'/><category term='biodiversity informatics'/><category term='space tree'/><category term='postphylogenetics'/><category term='thesis'/><category term='Open Science'/><category term='OAuth'/><category term='citation context'/><category term='Encylcopedia of Life'/><category term='BioStar'/><category term='github'/><category term='ngram'/><category term='&quot;table lens&quot;'/><category term='URI'/><category term='identifiers'/><category term='Genbank'/><category term='interface'/><category term='BNCOD2008'/><category term='visualisation'/><category term='Systematic Biology'/><category term='Wikipedia'/><category term='article 2.0'/><category term='TAXACOM'/><category term='user interface'/><category term='Stephen Colbert'/><category term='c-squares'/><category term='canvas'/><category term='&quot;author names&quot;'/><category term='JSON'/><category term='markup'/><category 
term='Facebook'/><category term='identifier'/><category term='Nature Precedings'/><category term='ICZN'/><category term='citation matching'/><category term='citation needed'/><category term='Nomenclator Zoologicus'/><category term='Top 10'/><category term='taxonomic intelligence'/><category term='data preservation'/><category term='TAPIR'/><category term='Edinburgh'/><category term='&quot;web service&quot;'/><category term='Google'/><category term='PaperID'/><category term='MSW'/><category term='publishing'/><category term='dechronization'/><category term='literature'/><category term='The Plant List'/><category term='tvwidget'/><category term='identity'/><category term='Linux'/><category term='sucks'/><category term='data entry'/><category term='BMC Bioinformatics'/><category term='AVATOL'/><category term='md5'/><category term='BIOONE'/><category term='Europe'/><category term='pPod'/><category term='Science Commons'/><category term='iEvoBio'/><category term='Open Acess'/><category term='DSpace'/><category term='Chromis'/><category term='Scripting life'/><category term='phyloinformatics'/><category term='SKOS'/><category term='author names'/><category term='triple store'/><category term='topological sorting'/><category term='Open access'/><category term='COinS'/><category term='rewrite'/><category term='EOL'/><category term='RSS'/><category term='Science 2.0'/><category term='DeepDyve'/><category term='treemap'/><category term='Mac'/><category term='jellyfish'/><category term='microformat'/><category term='Cool URIs'/><category term='sparklines'/><category term='Quora'/><category term='duplication'/><category term='blogs'/><category term='taxonomy'/><category term='future'/><category term='contest'/><category term='TV'/><category term='ZooBank'/><category term='Open Calais'/><category term='MySQL'/><category term='XSLT'/><category term='Google Scholar'/><category term='ePub'/><category term='CVS'/><category term='Vince Smith'/><category term='algorithm'/><category 
term='cloud'/><category term='Parallels'/><category term='JSONP'/><category term='filesystem'/><category term='AgeNames'/><category term='Webdot'/><category term='linking'/><category term='PLoS Currents'/><category term='iSpiders'/><category term='data coupling'/><category term='geography'/><category term='authorship'/><category term='Australian Faunal Directory'/><category term='Enhydris punctata'/><category term='version control'/><category term='panbiogeography'/><category term='Cooliris'/><category term='indirection'/><category term='NESCent'/><category term='JACC'/><category term='screencast'/><category term='SPARQL'/><category term='wiki'/><category term='ideology'/><category term='AMNH'/><category term='Atypon'/><category term='ITIS'/><category term='BMC'/><category term='AppleScript'/><category term='Poly9'/><category term='open data'/><category term='TreeView X'/><category term='dot'/><category term='conference'/><category term='Taylor and Francis'/><category term='Dryad'/><category term='string'/><category term='C++'/><category term='IPNI'/><category term='zoom'/><category term='n-grams'/><category term='jQueryMobile'/><category term='Auckland'/><category term='Venter'/><category term='modelling'/><category term='iSpecies'/><category term='EAV'/><category term='phylowidget'/><category term='DOI'/><category term='Android'/><category term='Charles Sherbon'/><category term='science'/><category term='database'/><category term='HTTP URI'/><category term='dinosaurs'/><category term='&quot;word for the day&quot;'/><category term='red lionfish'/><category term='ajax'/><category term='Phthiraptera'/><category term='R-tree'/><category term='Photosynth'/><category term='programming'/><category term='post-taxonomic'/><category term='tutorial'/><category term='BioStor'/><category term='GUIDs'/><category term='Graphviz'/><category term='mapping'/><category term='OBIS'/><category term='book'/><category term='BHL'/><category term='Gregg&apos;s paradox'/><category 
term='PhyloWS'/><category term='CiNii'/><category term='deep zoom'/><category term='Sun'/><category term='British Library'/><category term='3D'/><category term='matrix'/><category term='icon'/><category term='CrossRef'/><category term='Gene Wiki'/><category term='publication'/><category term='Begonia'/><category term='Life and Literature'/><category term='collections'/><category term='tagging'/><category term='fail'/><category term='PubMed Central'/><category term='data'/><category term='Elsevier'/><category term='Zootaxa'/><category term='distribution'/><category term='HS_ALIAS'/><title type='text'>iPhylo</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://iphylo.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default?start-index=101&amp;max-results=100'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>474</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-16081779.post-5901535091373784967</id><published>2012-01-30T17:19:00.001Z</published><updated>2012-01-30T17:19:52.990Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='phylogeny'/><category scheme='http://www.blogger.com/atom/ns#' term='SVG'/><category 
scheme='http://www.blogger.com/atom/ns#' term='github'/><category scheme='http://www.blogger.com/atom/ns#' term='ajax'/><category scheme='http://www.blogger.com/atom/ns#' term='BLAST'/><category scheme='http://www.blogger.com/atom/ns#' term='phyloinformatics'/><title type='text'>BLAST a sequence and get a tree</title><content type='html'>For this week's sessions of my &lt;a href="http://iphylo.org/~rpage/phyloinformatics/"&gt;phyloinformatics course&lt;/a&gt; I'm developing some phylogeny tools. The first is a simple AJAX-based BLAST tool. I've always wanted a quick way to see a GenBank sequence in its phylogenetic context, so I've built a simple tool that takes a GenBank accession number or GI number, submits a BLAST job, retrieves the sequences, aligns them using CLUSTALW, builds a quick and dirty neighbour-joining tree using PAUP*, then displays the tree using SVG (if your browser doesn't support this you won't see the tree). One use for this is to quickly get a sense of whether an unnamed ("dark") taxon is related to sequences that have been identified.&lt;br /&gt;&lt;br /&gt;Nothing fancy, but it was a chance to display the whole process in the browser without opening new windows or refreshing the page. Here's an example for the GenBank sequence &lt;a href="http://www.ncbi.nlm.nih.gov/nucleotide/FJ559186"&gt;FJ559186&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;iframe src="http://player.vimeo.com/video/35895870?title=0&amp;amp;byline=0&amp;amp;portrait=0" width="398" height="244" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;br /&gt;For the technically minded, the calls to BLAST and the alignment and tree construction tools all use AJAX, and there's a simple JavaScript timer to count down the seconds that the NCBI BLAST web service estimates the BLAST job will take, before we poll NCBI to see if the job has in fact finished. 
The code is in &lt;a href="https://github.com/rdmpage/phyloinformatics"&gt;GitHub&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-5901535091373784967?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5901535091373784967'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5901535091373784967'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2012/01/blast-sequence-and-get-tree.html' title='BLAST a sequence and get a tree'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2340342747346374591</id><published>2012-01-26T12:30:00.001Z</published><updated>2012-01-26T12:43:43.567Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='data mining'/><category scheme='http://www.blogger.com/atom/ns#' term='specimen codes'/><category scheme='http://www.blogger.com/atom/ns#' term='museum'/><category scheme='http://www.blogger.com/atom/ns#' term='Darwin Core riplet'/><title type='text'>Extracting museum specimen codes from text</title><content type='html'>Quick note about a tool I've cobbled together as part of the &lt;a href="http://iphylo.org/~rpage/phyloinformatics/"&gt;phyloinformatics course&lt;/a&gt;, which addresses a long-standing need I and others have to extract specimen codes from text. I've had this code kicking around for a while (as part of various never-finished data mining projects), but never got around to releasing it until now. 
It is very crude (basically a bunch of regular expressions), and there's a lot which could be done to improve it (not least starting with a complete list of museum specimen codes, rather than just those I've come across in, say, &lt;i&gt;Zootaxa&lt;/i&gt; and &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;You can try the tool at &lt;a href="http://iphylo.org/~rpage/phyloinformatics/services/specimenparser.php"&gt;http://iphylo.org/~rpage/phyloinformatics/services/specimenparser.php&lt;/a&gt;. Paste in some text and it will try to extract museum codes. The tool tries to handle ranges of specimens (e.g., MHNSM 1808-09), and some of the more common specimen numbering schemes.&lt;br /&gt;&lt;br /&gt;Comments welcome. If you are looking for a source of text, papers in &lt;i&gt;ZooKeys&lt;/i&gt; or &lt;i&gt;Zootaxa&lt;/i&gt; are a good place to start (especially papers on vertebrates, where specimen numbers are often used). BioStor is also a good source: if you're looking at a paper in BioStor, click on the "Text" link to get the OCR text for an article and paste that into the tool's form. 
For example, the text for &lt;a href="http://biostor.org/reference/97426"&gt;Systematics of the Bufo coccifer complex (Anura: Bufonidae) of Mesoamerica&lt;/a&gt; is available at &lt;a href="http://biostor.org/reference/97426.text"&gt;http://biostor.org/reference/97426.text&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The extraction tool can also be called as a web service using POST to get back the results in JSON.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2340342747346374591?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2340342747346374591'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2340342747346374591'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2012/01/extracting-museum-specimen-codes-from.html' title='Extracting museum specimen codes from text'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-4245769225469318038</id><published>2012-01-23T12:41:00.001Z</published><updated>2012-01-23T12:41:13.260Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='teaching'/><category scheme='http://www.blogger.com/atom/ns#' term='github'/><category scheme='http://www.blogger.com/atom/ns#' term='phyloinformatics'/><title type='text'>Open course on phyloinformatics</title><content type='html'>As part of a postgraduate course here at the &lt;a href="http://www.gla.ac.uk/"&gt;University of Glasgow&lt;/a&gt; I'm teaching five sessions on "phyloinformatics", which I've decided to define 
broadly enough to encompass most of biodiversity informatics.&lt;br /&gt;&lt;br /&gt;Given that this module is being developed on the fly, and will make use of lots of little "toys" I've developed and discussed on this blog, I've decided to put the course notes online, along with the interactive demos and the source code. So, if you want to follow along for the next couple of weeks, here are the links:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://iphylo.org/~rpage/phyloinformatics/"&gt;Course home page&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://iphylo.org/~rpage/phyloinformatics/course/"&gt;Course notes and exercises&lt;/a&gt; (currently just the introductory session)&lt;/li&gt;&lt;li&gt;&lt;a href="https://github.com/rdmpage/phyloinformatics"&gt;Source code on GitHub&lt;/a&gt; (including code for my &lt;a href="http://iphylo.blogspot.com/2012/01/eol-ipad-web-app-using-jquerymobile.html"&gt;EOL iPad webapp&lt;/a&gt;)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;Each course page supports comments (see the bottom of the page), so feel free to add comments, or suggestions. The notes are at a crude stage, and will be developed over the duration of the course (2 weeks). I'm also endeavouring to get all the source code for the demonstration apps into GitHub. None of these demos is polished, but they will hopefully provide some ideas for taking them further. 
There will be iSpecies-like mashups, iPad webapps, classification visualisations, TreeBASE search tools, geophylogenies and other phylogeny viewers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-4245769225469318038?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4245769225469318038'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4245769225469318038'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2012/01/open-course-on-phyloinformatics.html' title='Open course on phyloinformatics'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-1754743869325390809</id><published>2012-01-19T16:35:00.001Z</published><updated>2012-01-19T16:35:17.036Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='jQueryMobile'/><category scheme='http://www.blogger.com/atom/ns#' term='iPad'/><category scheme='http://www.blogger.com/atom/ns#' term='API'/><category scheme='http://www.blogger.com/atom/ns#' term='EOL'/><title type='text'>EOL iPad web app using jQueryMobile</title><content type='html'>As part of a course on "phyloinformatics" that I'm about to teach I've been making some visualisations of classifications. Here's one I've put together using &lt;a href="http://jquerymobile.com/"&gt;jQuery Mobile&lt;/a&gt; and the Encyclopedia of Life &lt;a href="http://eol.org/api"&gt;API&lt;/a&gt;. It's pretty limited, but is a simple way to explore EOL using three different classifications. 
You can view this live at &lt;a href="http://iphylo.org/~rpage/phyloinformatics/eoliphone/"&gt;http://iphylo.org/~rpage/phyloinformatics/eoliphone/&lt;/a&gt; (looks best on an iPad or iPhone). Once I've tidied it up I'll put the code online. In the meantime, here's a quick demo:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;iframe src="http://player.vimeo.com/video/35321521?title=0&amp;amp;byline=0&amp;amp;portrait=0&amp;amp;autoplay=0" width="398" height="587" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-1754743869325390809?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/1754743869325390809'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/1754743869325390809'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2012/01/eol-ipad-web-app-using-jquerymobile.html' title='EOL iPad web app using jQueryMobile'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-5628455436846736390</id><published>2012-01-18T13:22:00.001Z</published><updated>2012-01-18T13:22:03.004Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='TAXACOM'/><category scheme='http://www.blogger.com/atom/ns#' term='specimens'/><category scheme='http://www.blogger.com/atom/ns#' term='identifiers'/><category scheme='http://www.blogger.com/atom/ns#' term='collections'/><category scheme='http://www.blogger.com/atom/ns#' 
term='citation'/><title type='text'>Yet another reason why we need specimen identifiers, now!</title><content type='html'>This &lt;a href="http://markmail.org/message/opv2we7fkmro2nen"&gt;message&lt;/a&gt; appeared on the TAXACOM mailing list:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;It is getting more and more necessary for taxonomists to demonstrate&lt;br /&gt;that they are useful and used. This does not only apply to the&lt;br /&gt;individual scientists, but also to institutions with taxonomic&lt;br /&gt;collections, such as museums and herbaria. &lt;br /&gt;&lt;br /&gt;In an attempt to live up to that increasing demand for documentation,&lt;br /&gt;the leadership of the Natural History Museum of Denmark has issued an&lt;br /&gt;order to its curatorial staff - The staff members are requested to&lt;br /&gt;document which publications from 2011, written entirely by external&lt;br /&gt;scientists, that in one way or another are based on material in the&lt;br /&gt;collections of the Museum. &lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Given that most specimens lack resolvable digital identifiers (a theme I've harped on about before, most recently in the context of &lt;a href="http://iphylo.blogspot.com/2011/12/dna-barcoding-darwin-core-triplet-and.html"&gt;DNA barcoding&lt;/a&gt;), answering this kind of query ends up being a case of searching publications for text strings that contain the acronym of the collection. The sender of the message, &lt;a href="http://www.nathimus.ku.dk/bot/vip/friis.htm"&gt;Ib Friis&lt;/a&gt;, is alarmed at this prospect:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;In publications, material from our herbarium at "C" is normally referred&lt;br /&gt;to in text strings of one of the following forms: "(C)", "(C, ", ", C,"&lt;br /&gt;or " C)". 
But a search in for example Google Scholar or other search&lt;br /&gt;engines  result in overflow of thousands and thousands of hits, even&lt;br /&gt;when these text strings are combined with other relevant words such as&lt;br /&gt;"botany", "plants", etc.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;In an earlier paper "Biodiversity informatics: the challenge of linking data and the role of shared identifiers" (&lt;a href="http://dx.doi.org/10.1093/bib/bbn022"&gt;http://dx.doi.org/10.1093/bib/bbn022&lt;/a&gt;) (free preprint available here: &lt;a href="http://hdl.handle.net/10101/npre.2008.1760.1"&gt;hdl:10101/npre.2008.1760.1&lt;/a&gt;) I argued that having resolvable identifiers for specimens could enable measures of "citation" to be computed for specimens (and data derived from those specimens). Just as we have citation counts for articles and impact factors for journals, we could have equivalent measures for specimens and collections. These measures may keep administrators happy, but for scientists I think the real benefits will be the ability to trace the provenance of some data, and the fate of data they themselves have collected or published.&lt;br /&gt;&lt;br /&gt;For things such as publications it is trivial to track their usage. For example, to find the number of times the article "Biodiversity informatics: the challenge of linking data and the role of shared identifiers" has been cited, I simply enter the DOI into Google Scholar, e.g. &lt;a href="http://scholar.google.co.uk/scholar?q=10.1093/bib/bbn022"&gt;http://scholar.google.co.uk/scholar?q=10.1093/bib/bbn022&lt;/a&gt;. Imagine being able to do the same for specimens.&lt;br /&gt;&lt;br /&gt;For this to happen, museum specimens need digital identifiers. 
If museums are serious about quantifying the impact of their collections, they should make assigning digital identifiers a priority.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-5628455436846736390?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5628455436846736390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5628455436846736390'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2012/01/yet-another-reason-why-we-need-specimen.html' title='Yet another reason why we need specimen identifiers, now!'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-618105247816636021</id><published>2012-01-17T11:01:00.001Z</published><updated>2012-01-17T11:01:03.640Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><category scheme='http://www.blogger.com/atom/ns#' term='CiteBank'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><title type='text'>Mendeley as CiteBank: some ideas</title><content type='html'>Here are some quick notes on how BHL could use Mendeley as a "CiteBank".&lt;br /&gt;&lt;br /&gt;&lt;b&gt;As a repository of bibliographic data&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;If the goal is to assemble a "bibliography of life" then there are various ways this could be done.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Taxon-specific bibliographies&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Create groups that are taxon-specific (or find existing groups in Mendeley). 
For example, I've created groups for amphibians (&lt;a href="http://www.mendeley.com/groups/795441/amphibian-species-of-the-world/" target="_new"&gt;Amphibian Species of the World&lt;/a&gt;) and reptiles (&lt;a href="http://www.mendeley.com/groups/725961/tigr-jcvi-reptile-database/" target="_new"&gt;TIGR/JCVI Reptile Database&lt;/a&gt;) based on the Amphibian Species of the World and TIGR/JCVI Reptile Database, respectively. Taxon-specific groups are probably going to be attractive to users, but the quality of bibliographic metadata can be variable. However, a bibliography for a specific taxonomic group that is populated with links to BHL content would be very useful.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Journal-specific bibliographies&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;This is where I've spent most of my efforts. I've created around 300 groups for various journals (see list below, or go directly to &lt;a href="http://dl.dropbox.com/u/639486/groups.html"&gt;http://dl.dropbox.com/u/639486/groups.html&lt;/a&gt;). In some cases I've managed to populate these with the complete set of articles published in that journal, typically harvested from the journal's own web site. Typically the metadata from journal sites is high quality, although one has to be wary of &lt;a href="http://iphylo.blogspot.com/2011/09/orwellian-metadata-making-journals.html"&gt;Orwellian metadata&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;iframe src="http://dl.dropbox.com/u/639486/groups.html" width="400" height="300"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;br /&gt;I use these groups in two ways. The first is as a source of metadata for extracting articles from BHL using &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt;. 
If you have article-level metadata, finding articles in BHL becomes easier and can be automated, so that thousands can be added in a few minutes.&lt;br /&gt;&lt;br /&gt;The second is for the &lt;a href="http://iphylo.blogspot.com/2011/11/mapping-names-to-literature-closing-in.html"&gt;taxon-literature mapping&lt;/a&gt; project, where one strategy is to use approximate string matching to find equivalent citations in Mendeley and the ION database. Ultimately I'd like to link to the Mendeley citations as they tend to be higher quality than those in the original ION database.&lt;br /&gt;&lt;br /&gt;BHL could create Mendeley groups for journals it has scanned, and populate those.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;As an article-level index to BHL&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Perhaps the most direct way BHL could use Mendeley is as follows:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Create a BHL account.&lt;/li&gt;&lt;li&gt;For each BHL title create a Mendeley group (the name would be the BHL TitleID).&lt;/li&gt;&lt;li&gt;For each item in that title create a folder in the corresponding group (the folder name would be the ItemID).&lt;/li&gt;&lt;li&gt;Within each folder list the articles, book chapters or other component parts. If these aren't available yet, encourage people to add them. Some of these could be pre-populated with content from BioStor.&lt;/li&gt;&lt;li&gt;Harvest the contents of these groups to provide an article-level index to BHL (the lack of which for me is the single biggest impediment to using BHL). &lt;a href="http://iphylo.blogspot.com/2011/11/recently-ive-been-thinking-about-best.html"&gt;Previously I've suggested a way to easily add article data to BHL&lt;/a&gt;; Mendeley title/item groups and folders might be a way to facilitate this process.&lt;/li&gt;&lt;/ol&gt;&lt;b&gt;PDF storage&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Although Mendeley offers PDF storage, this is one feature I'd be less inclined to use. 
Mendeley's rules for sharing PDFs and making them publicly available are too restrictive (they often don't know whether a PDF can, in fact, be shared). Plus you want tools to visualise, index, and archive PDFs. In effect a big file store with added features. I have some ideas on how this can be implemented (and have a rough working version to support &lt;a href="http://iphylo.org/~rpage/itaxon"&gt;http://iphylo.org/~rpage/itaxon&lt;/a&gt;). Alternatively, one could use Internet Archive services.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;As I've often argued, given the success of tools like Mendeley it seems pointless for anyone to try and build yet another online bibliographic database. The trick is to figure out how to leverage what Mendeley provides to support what the taxonomic (and broader biodiversity) community needs.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-618105247816636021?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/618105247816636021'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/618105247816636021'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2012/01/mendeley-as-citebank-some-ideas.html' title='Mendeley as CiteBank: some ideas'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-6199426103259984277</id><published>2012-01-10T14:58:00.001Z</published><updated>2012-01-10T15:04:18.394Z</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL-Europe'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomic name'/><title type='text'>Journals I'd like BHL to scan</title><content type='html'>I've recently updated my &lt;a href="http://iphylo.blogspot.com/2011/11/mapping-names-to-literature-closing-in.html"&gt;database of links between animal taxonomic names and literature identifiers&lt;/a&gt;, which now has over 280,000 names linked to some form of identifier (127,000 of these being DOIs). You can see the current version here:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/itaxon/"&gt;http://iphylo.org/~rpage/itaxon/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As an experiment I've added a feature to list the number of names for each journal. Based on &lt;a href="http://iphylo.org/~rpage/itaxon/?journals"&gt;this list&lt;/a&gt; (limited to journals that I've found an ISSN for) here are some journals I'd like to see digitised by the &lt;a href="http://biodiversitylibrary.org"&gt;Biodiversity Heritage Library (BHL)&lt;/a&gt;. Note that by digitised I mean beyond the 1923 cutoff applied to many journals. This will mean negotiating with the journal publishers, but in a number of cases these are scientific societies or institutions, some associated with BHL. Given that major partners in BHL have made post-1923 content available, it would be nice to extend this to other key taxonomic journals.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Revue Suisse de Zoologie&lt;/i&gt;&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;&lt;i&gt;Revue Suisse de Zoologie&lt;/i&gt; has published nearly 10,000 taxonomic names but has essentially zero digital presence, which is extraordinary. 
Another Swiss journal, &lt;i&gt;Entomologica Basiliensia&lt;/i&gt;, is also an obvious candidate.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Revue de Zoologie et de Botanique Africaines&lt;/i&gt;&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;&lt;i&gt;Revue de Zoologie et de Botanique Africaines&lt;/i&gt; has published over 5,000 names, and given the interest in providing information resources for Africa (e.g., &lt;a href="http://www.mendeley.com/groups/1681811/bhl-africa/"&gt;http://www.mendeley.com/groups/1681811/bhl-africa/&lt;/a&gt;) this seems an obvious journal to scan completely.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Bulletin of the British Museum (Natural History) journals and books&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The Natural History Museum [formerly British Museum (Natural History)] is a member of BHL so I'd expect it to have better coverage of its own publications in BHL. There are gaps in journals such as &lt;i&gt;Bulletin of the British Museum (Natural History) Entomology&lt;/i&gt;, which means there is a significant chunk of research published by Museum staff that simply doesn't exist digitally. At one point The Natural History Museum renamed the journals and moved them to Cambridge University Press, resulting in further gaps in digitisation. It's interesting that museums that haven't changed the title of their publications (such as the American Museum of Natural History and the Australian Museum) have better digital coverage than the NHM, which has flirted with various title changes in the last few decades. 
The Museum also published a series of monographs in the 20th century, many of which aren't in BHL.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;i&gt;Memoirs of the Queensland Museum&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The &lt;i&gt;Memoirs of the Queensland Museum&lt;/i&gt; is an important journal (&gt; 3,000 names) but has only early issues scanned in BHL and recent issues as PDFs on the Museum web site (vulnerable to link rot when the site gets redesigned, as I've discovered to my cost).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Russian journals&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Russian journals contain large numbers of taxonomic descriptions, but their digital presence is patchy. Springer has started to publish translations online (e.g., &lt;a href="http://dx.doi.org/10.1134/S0013873810050155"&gt;http://dx.doi.org/10.1134/S0013873810050155&lt;/a&gt; in &lt;i&gt;Entomological Review&lt;/i&gt;, which is a translation of an article in &lt;i&gt;Zoologicheskii Zhurnal&lt;/i&gt;), but much of the Russian literature seems unavailable in digital form. BHL has spread from its US-UK origins to BHL-Europe, BHL-China, and BHL-Australia; maybe it's time for BHL-Russia?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There are huge holes in the availability of taxonomic literature (where I equate "availability" with being digitised and online, free or otherwise). But on the other hand I've been pleasantly surprised by just how much taxonomic literature is online. It looks quite feasible to link at least 300,000 animal names to digital publications.&lt;br /&gt;&lt;br /&gt;The journals I've highlighted are just a few obvious candidates for scanning. 
I suspect that as one goes down the list of taxonomic journals the rate of return will decline, to the point where scanning entire journals will be less efficient than scanning targeted articles.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-6199426103259984277?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6199426103259984277'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6199426103259984277'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2012/01/journals-i-like-bhl-to-scan.html' title='Journals I&amp;#39;d like BHL to scan'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-5805266637238859298</id><published>2011-12-19T20:33:00.001Z</published><updated>2012-01-03T11:03:04.059Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='NLM DTD'/><category scheme='http://www.blogger.com/atom/ns#' term='ZooKeys'/><category scheme='http://www.blogger.com/atom/ns#' term='PLoS'/><category scheme='http://www.blogger.com/atom/ns#' term='iPad'/><category scheme='http://www.blogger.com/atom/ns#' term='XML'/><category scheme='http://www.blogger.com/atom/ns#' term='XSLT'/><title type='text'>Towards an interactive taxonomic article: displaying an article from ZooKeys</title><content type='html'>One of the things I keep revisiting is the way we display scientific articles. 
Apart from Nature's excellent &lt;a href="http://iphylo.blogspot.com/2010/08/viewing-scientific-articles-on-ipad.html"&gt;iPhone&lt;/a&gt; and iPad apps, most efforts to re-imagine how we display articles are little more than glorified PDF viewers (e.g., &lt;a href="http://iphylo.blogspot.com/2010/08/viewing-scientific-articles-on-ipad_24.html"&gt;the PLoS iPad app&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Part of the challenge is that if we make the article more interactive we immediately confront the problem of how to link to other content. For example, we may have a lovingly crafted ePub view (e.g., Nature's apps), but what happens when the user clicks on a citation to another paper? If the paper is published by the same journal, then potentially it could be viewed using the same viewer, but if not then we are at the mercy of the other publisher. They will have their own ideas of how to display articles, so the simplest fallback is to display the cited article in a web browser view. The problem with this is that it breaks the user experience - the other publisher is unlikely to follow the same conventions for displaying an article and its links. If we are lucky the cited article might be published in an Open Access journal that provides, say, XML based on the NLM DTD standard. Knowing whether an article is &lt;a href="http://iphylo.blogspot.com/2010/12/how-do-i-know-if-article-is-open-access.html"&gt;Open Access or not is not straightforward&lt;/a&gt;, and different journals have their own unique interpretation of the NLM standard.&lt;br /&gt;&lt;br /&gt;Then there is the issue of other kinds of content, such as taxonomic names, specimens, DNA sequences, geographic localities, etc. 
We lack &lt;a href="http://iphylo.blogspot.com/2010/04/time-for-some-decent-service.html"&gt;decent services&lt;/a&gt; for many of these objects; as a result, efforts like &lt;a href="http://hubs.plos.org/web/biodiversity/"&gt;PLoS Biodiversity Hub&lt;/a&gt; end up being underwhelming collections of reformatted journal articles, rather than innovative integrations of biodiversity knowledge.&lt;br /&gt;&lt;br /&gt;With these issues in mind I've started playing with &lt;i&gt;ZooKeys&lt;/i&gt; XML, initially looking at ways to display the article beyond the conventional format. Ultimately I'd like to embed the article in a broader web of citations and data. &lt;i&gt;ZooKeys&lt;/i&gt; articles are available in PDF, HTML, and XML. The HTML has links to taxon pages, maps, etc., which is nice, but I personally find this a little jarring because it interrupts the reading experience. The &lt;i&gt;ZooKeys&lt;/i&gt; web site also surrounds the article with all the paraphernalia of a publisher's web site:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-JZnFYtcUgh4/Tu-fhL7FRgI/AAAAAAAABGw/cVGuFOK4wos/zookeys.png?imgmax=800" alt="Zookeys" title="zookeys.png" border="0" width="450" height="298" /&gt;&lt;br /&gt;As a first experiment, I've taken the XML for the article &lt;b&gt;At the lower size limit for tetrapods, two new species of the miniaturized frog genus Paedophryne (Anura, Microhylidae)&lt;/b&gt;&lt;a href="http://dx.doi.org/10.3897/zookeys.154.1963"&gt; http://dx.doi.org/10.3897/zookeys.154.1963&lt;/a&gt; and used an XSLT style sheet to reformat the article. I've borrowed some ideas from Nature's apps, such as the font for the title, displaying the abstract in bold, and showing all the figures in the article as thumbnails near the top. I've also added some basic interactivity, which you can see in the video below. 
Instead of figures being in one place in the article, wherever a figure is mentioned (e.g., "Fig. 1") you can click on the reference and the figure appears. If the article displays a point locality using latitude and longitude, instead of launching a separate browser window with a Google map, click on the locality and the map appears. The idea is that the flow of reading isn't interrupted: figures, maps, and citations all appear in the text.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;iframe src="http://player.vimeo.com/video/33921915?title=0&amp;amp;byline=0&amp;amp;portrait=0&amp;amp;autoplay=0" width="398" height="587" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;br /&gt;This demo (which you can see live at &lt;a href="http://iphylo.org/~rpage/zookeys"&gt;http://iphylo.org/~rpage/zookeys&lt;/a&gt;) is limited, but most of its functionality comes from simply reformatting XML using XSLT. There's a little bit of jQuery for animation, and I ended up having to write a PHP script to convert verbatim latitude and longitude coordinates to the decimal coordinates expected by Google Maps, but it's all very lightweight. It wouldn't take much to add some JSON queries to make the taxon names clickable (e.g., showing a summary of a taxon from EOL). 
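&lt;br /&gt;&lt;br /&gt;As an illustration of the coordinate conversion step mentioned above, here is a minimal JavaScript sketch. It is not the PHP script used in the demo. It assumes verbatim coordinates written as degrees, optional minutes, optional seconds, and a hemisphere letter; the function name dmsToDecimal and the accepted format are assumptions made for this example, and real articles may well use other notations.

```javascript
// Hypothetical helper: convert a verbatim coordinate such as 10°30'S
// to signed decimal degrees (south and west are negative).
// Returns null if the string does not match the assumed format.
function dmsToDecimal(verbatim) {
  var pattern = /^\s*(\d+(?:\.\d+)?)\s*°\s*(?:(\d+(?:\.\d+)?)\s*['′]\s*)?(?:(\d+(?:\.\d+)?)\s*["″]\s*)?([NSEW])\s*$/i;
  var m = pattern.exec(verbatim);
  if (m === null) {
    return null;
  }
  var degrees = parseFloat(m[1]);
  // Minutes and seconds are optional; treat a missing part as zero.
  var minutes = m[2] ? parseFloat(m[2]) : 0;
  var seconds = m[3] ? parseFloat(m[3]) : 0;
  var value = degrees + minutes / 60 + seconds / 3600;
  var hemisphere = m[4].toUpperCase();
  return (hemisphere === 'S' || hemisphere === 'W') ? -value : value;
}

console.log(dmsToDecimal("10°30'S"));
```

Latitude and longitude would be converted separately and the resulting pair handed to the map.&lt;br /&gt;&lt;br /&gt;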
Because &lt;i&gt;ZooKeys&lt;/i&gt; uses the NLM DTD for its XML, some of this code could also be applied to other journals, such as PLoS, so we could start to grow a library of linked, interactive taxonomic articles.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-5805266637238859298?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5805266637238859298'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5805266637238859298'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/12/towards-interactive-taxonomic-article.html' title='Towards an interactive taxonomic article: displaying an article from ZooKeys'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-JZnFYtcUgh4/Tu-fhL7FRgI/AAAAAAAABGw/cVGuFOK4wos/s72-c/zookeys.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-8281560032801319335</id><published>2011-12-12T13:06:00.002Z</published><updated>2011-12-15T09:25:11.034Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Australian Faunal Directory'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='list'/><category scheme='http://www.blogger.com/atom/ns#' term='CouchDB'/><title type='text'>Exporting data from Australian Faunal Directory on CouchDB</title><content type='html'>Quick note to self about exporting data from my &lt;a 
href="http://iphylo.org/~rpage/afd/"&gt;Australian Faunal Directory on CouchDB&lt;/a&gt; project. To export data from a CouchDB view you can use a list function (see &lt;a href="http://wiki.apache.org/couchdb/Formatting_with_Show_and_List"&gt;Formatting with Show and List&lt;/a&gt;). Following the example on the &lt;a href="http://blog.kanapeside.com/csv-export-via-couchdb-list-function"&gt;Kanapes IDE&lt;/a&gt; blog, I created the following list function:&lt;br /&gt;&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;{&lt;br /&gt;"_id": "_design/publication",&lt;br /&gt;"_rev": "14-467dee8248e97d874f1141411f536848",&lt;br /&gt;"language": "javascript",&lt;br /&gt;"lists": {&lt;br /&gt;"tsv": "function(head,req) {&lt;br /&gt;var row;&lt;br /&gt;start({&lt;br /&gt;'headers': {&lt;br /&gt;'Content-Type': 'text/tsv'&lt;br /&gt;}&lt;br /&gt;});&lt;br /&gt;while(row = getRow()) {&lt;br /&gt;send(row.value + '\\t' + row.key + '\\n');&lt;br /&gt;}}"&lt;br /&gt;},&lt;br /&gt;"views": {&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I can use this function with the view below, which lists Australian Faunal Directory publications by UUID ("value"), indexed by DOI ("key").&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/-wsNeSZucCJI/TuX8WtDuW3I/AAAAAAAABGg/ulfCL-PjDuY/couch.png?imgmax=800" alt="Couch" border="0" width="400" /&gt;&lt;br /&gt;&lt;br /&gt;I can get the tab-delimited dump from &lt;a href="http://localhost:5984/afd/_design/publication/_list/tsv/doi"&gt;http://localhost:5984/afd/_design/publication/_list/tsv/doi&lt;/a&gt;. 
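&lt;br /&gt;&lt;br /&gt;Once downloaded, the dump is easy to consume. The following is a minimal sketch, assuming each line has the shape produced by the list function above (the UUID, a tab, then the DOI); the function name parseDump and the sample identifiers are made up for illustration.

```javascript
// Hypothetical helper: parse the tab-delimited dump, where each line is
// "UUID TAB DOI" (row.value first, then row.key), into a DOI-to-UUID lookup.
function parseDump(text) {
  var lookup = {};
  text.split('\n').forEach(function (line) {
    var parts = line.split('\t');
    // Skip blank or malformed lines (for example the empty string after the last newline).
    if (parts.length === 2) {
      lookup[parts[1]] = parts[0];
    }
  });
  return lookup;
}

// Made-up sample data, for illustration only.
var sample = 'uuid-1\t10.1000/example.1\nuuid-2\t10.1000/example.2\n';
console.log(parseDump(sample)['10.1000/example.2']);
```
&lt;br /&gt;&lt;br /&gt;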
Note that instead of, say, &lt;code&gt;/afd/_design/publication/_view/doi&lt;/code&gt; to get the view, we use &lt;code&gt;/afd/_design/publication/&lt;b&gt;_list/tsv/&lt;/b&gt;doi&lt;/code&gt; to get the tab-delimited dump.&lt;br /&gt;&lt;br /&gt;I've created files listing &lt;a href="http://iphylo.org/~rpage/afd/downloads/doi.txt"&gt;DOIs&lt;/a&gt; and &lt;a href="http://iphylo.org/~rpage/afd/downloads/biostor.txt"&gt;BioStor&lt;/a&gt; ids for publications in the Australian Faunal Directory. I'll play with lists a bit more, especially as I would like to extract the mapping from the &lt;a href="http://iphylo.org/~rpage/afd/"&gt;Australian Faunal Directory on CouchDB&lt;/a&gt; project and add it to the &lt;a href="http://iphylo.org/~rpage/itaxon/"&gt;iTaxon&lt;/a&gt; project.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-8281560032801319335?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8281560032801319335'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8281560032801319335'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/12/exporting-data-from-australian-faunal.html' title='Exporting data from Australian Faunal Directory on CouchDB'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-wsNeSZucCJI/TuX8WtDuW3I/AAAAAAAABGg/ulfCL-PjDuY/s72-c/couch.png?imgmax=800' height='72' 
width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2055505828008508188</id><published>2011-12-11T15:10:00.001Z</published><updated>2011-12-11T15:10:06.787Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='specimens'/><category scheme='http://www.blogger.com/atom/ns#' term='identifiers'/><category scheme='http://www.blogger.com/atom/ns#' term='DOI'/><category scheme='http://www.blogger.com/atom/ns#' term='DNA barcoding'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenURL'/><category scheme='http://www.blogger.com/atom/ns#' term='GBIF'/><category scheme='http://www.blogger.com/atom/ns#' term='Darwin Core riplet'/><title type='text'>DNA Barcoding, the Darwin Core Triplet, and failing to learn from past mistakes</title><content type='html'>&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/-PgSI5SCfIeU/TuTHyjE__NI/AAAAAAAABGM/GziDc0euhkg/banner05.jpg?imgmax=800" alt="Banner05" title="banner05.jpg" border="0" width="100%"  /&gt;&lt;br /&gt;Given various discussions about identifiers, dark taxa, and DNA barcoding that have been swirling around the last few weeks, there's one notion that is starting to bug me more and more. It's the "Darwin Core triplet", which creates identifiers for voucher specimens in the form &amp;lt;institution-code&amp;gt;:&amp;lt;OPTIONAL collection-code&amp;gt;:&amp;lt;specimen-id&amp;gt;. 
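&lt;br /&gt;&lt;br /&gt;Splitting such a string into its parts is simple, which is much of its appeal. A minimal sketch follows, assuming the colon is the only separator and that a missing collection code shows up either as a two-part string or as an empty middle part; the function name parseTriplet and the field names are assumptions made for this example.

```javascript
// Hypothetical helper: split a Darwin Core triplet into its parts.
// The collection code is optional, so two-part identifiers are accepted too.
function parseTriplet(identifier) {
  var parts = identifier.split(':');
  if (parts.length === 3) {
    return {
      institutionCode: parts[0],
      collectionCode: parts[1] === '' ? null : parts[1],
      specimenId: parts[2]
    };
  }
  if (parts.length === 2) {
    return { institutionCode: parts[0], collectionCode: null, specimenId: parts[1] };
  }
  // Anything else is not a triplet in the assumed form.
  return null;
}

console.log(parseTriplet('MVZ:Herp:246033'));
```

Note that parsing the string says nothing about where the specimen record lives; it only recovers the metadata packed into the identifier.&lt;br /&gt;&lt;br /&gt;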
For example, &lt;br /&gt;&lt;br /&gt;MVZ:Herp:246033&lt;br /&gt;&lt;br /&gt;is the identifier for specimen &lt;b&gt;246033&lt;/b&gt; in the &lt;b&gt;Herp&lt;/b&gt;etology collection of the &lt;b&gt;M&lt;/b&gt;useum of &lt;b&gt;V&lt;/b&gt;ertebrate &lt;b&gt;Z&lt;/b&gt;oology (see &lt;a href="http://arctos.database.museum/guid/MVZ:Herp:246033"&gt;http://arctos.database.museum/guid/MVZ:Herp:246033&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;On the face of it this seems a perfectly reasonable idea, and goes some way towards addressing the problem of linking GenBank sequences to vouchers (see, for example, &lt;a href="http://dx.doi.org/10.1016/j.ympev.2009.04.016"&gt;http://dx.doi.org/10.1016/j.ympev.2009.04.016&lt;/a&gt;, preprint at &lt;a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2739410/"&gt;PubMed Central&lt;/a&gt;). But I'd argue that this is a hack, and one which potentially will create the same sort of mess that citation linking was in before the widespread use of DOIs. In other words, it's a fudge to postpone adopting what we really need, namely persistent resolvable identifiers for specimens.&lt;br /&gt;&lt;br /&gt;In many ways the Darwin Core triplet is analogous to an article citation of the form &amp;lt;journal&amp;gt;, &amp;lt;volume&amp;gt;:&amp;lt;starting page&amp;gt;. In order to go from this "triplet" to the digital version of the article we've ended up with OpenURL resolvers, which are basically web services that take this triple and (hopefully) return a link. In practice building OpenURL resolvers gets tricky, not least because you have to deal with ambiguities in the &amp;lt;journal&amp;gt; field. Journal names are often abbreviated, and there are various ways those abbreviations can be constructed. This leads to lists of standard abbreviations of journals and/or tools to map these to standard identifiers for journals, such as ISSNs.&lt;br /&gt;&lt;br /&gt;This should sound familiar to anybody dealing with specimens. 
Databases such as the &lt;a href="http://www.biorepositories.org/"&gt;Registry of Biological Repositories&lt;/a&gt; and the &lt;a href="http://www.biodiversitycollectionsindex.org/"&gt;Biodiversity Collections Index&lt;/a&gt; have been created to provide standardised lists of collection abbreviations (such as MVZ = Museum of Vertebrate Zoology). Indeed, one could easily argue that what we need is an &lt;a href="http://iphylo.blogspot.com/2008/10/openurl-for-specimens.html"&gt;OpenURL for specimens&lt;/a&gt; (and I've done exactly that).&lt;br /&gt;&lt;br /&gt;As much as there are advantages to OpenURL (nicely articulated in Eric Hellman's post &lt;a href="http://go-to-hellman.blogspot.com/2010/04/when-shall-we-link.html"&gt;When shall we link?&lt;/a&gt;), ultimately this will end in tears. Linking mechanisms that depend on metadata (such as museum acronyms and specimen codes, or journal names) are prone to break as the metadata changes. In the case of journals, publishers can rename entire back catalogues and change the corresponding metadata (see &lt;a href="http://iphylo.blogspot.com/2011/09/orwellian-metadata-making-journals.html"&gt;Orwellian metadata: making journals disappear&lt;/a&gt;), journals can be renamed, merged, or moved to new publishers. In the same way, museums can be rebranded, specimens moved to new institutions, etc. By using a metadata-based identifier we are storing up a world of hurt for someone in the future. Why don't we look at the publishing industry and learn from them? By having unique, resolvable, widely adopted identifiers (in this case DOIs) scientific publishers have created an infrastructure we now take for granted. I can read a paper online, and follow the citations by clicking on the DOIs. 
It's seamless and by and large it works.&lt;br /&gt;&lt;br /&gt;One could argue that a big advantage of the Darwin Core triplet is that it can identify a specimen even if it doesn't have a web presence (which is another way of saying that maybe it doesn't have a web presence now, but it might in the future). But for me this is the crux of the matter. Why don't these specimens have a web presence? Why is it the case that biodiversity informatics has failed to tackle this? It seems crazy that in the context of digital data (DNA sequences) and digital databases (GenBank) we are constructing unresolvable text strings as identifiers.&lt;br /&gt;&lt;br /&gt;But, of course, much of the specimen data we care about is online, in the form of aggregated records hosted by GBIF. It would be technically trivial for GBIF to assign a decent identifier to these (for example, a DOI) and we could complete the link between sequence and specimen. There are ways this could be done such that these identifiers could be passed on to the home institutions if and when they have the infrastructure to do it (see &lt;a href="http://iphylo.blogspot.com/2009/04/gbif-and-handles-admitting-that.html"&gt;GBIF and Handles: admitting that "distributed" begets "centralized"&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;But for now, we seem determined to postpone having resolvable identifiers for specimens. 
The Darwin Core triplet may seem a pragmatic solution to the lack of specimen identifiers, but it seems to me it's simply postponing the day we actually get serious about this problem.&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2055505828008508188?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2055505828008508188'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2055505828008508188'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/12/dna-barcoding-darwin-core-triplet-and.html' title='DNA Barcoding, the Darwin Core Triplet, and failing to learn from past mistakes'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-PgSI5SCfIeU/TuTHyjE__NI/AAAAAAAABGM/GziDc0euhkg/s72-c/banner05.jpg?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-4750439326317100939</id><published>2011-12-06T13:33:00.001Z</published><updated>2011-12-06T13:33:06.645Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='metrics'/><category scheme='http://www.blogger.com/atom/ns#' term='Pando'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><title type='text'>Google doesn't like BioStor anymore</title><content type='html'>According to Google Analytics &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; has experienced a big drop in traffic since the start of 
October:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/-9yvNwHg0W3I/Tt4Zj_t87JI/AAAAAAAABFQ/VYDHK7RJvY4/panda.png?imgmax=800" alt="Panda" border="0" width="420" height="134" /&gt;&lt;br /&gt;&lt;br /&gt;At one point I was getting something like 4500 visits a week; now it's just over a thousand a week. I'm guessing this is due to &lt;a href="http://www.guardian.co.uk/technology/2011/dec/05/google-panda-update-endangered-species"&gt;Google's 'Panda' update&lt;/a&gt;. I suspect part of the problem is that in terms of text content BioStor is actually pretty thin. For each article there is some metadata and a few links, so it probably looks a little like a link farm. The bulk of the content is in the page images, which, of course, Google can't read.&lt;br /&gt;&lt;br /&gt;I'd be interested to know of any other sites in the field that have been affected in the same way (or, indeed, sites which have seen no change in their traffic since October).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-4750439326317100939?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4750439326317100939'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4750439326317100939'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/12/google-doesn-like-biostor-anymore.html' title='Google doesn&amp;#39;t like BioStor anymore'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh6.ggpht.com/-9yvNwHg0W3I/Tt4Zj_t87JI/AAAAAAAABFQ/VYDHK7RJvY4/s72-c/panda.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-8479138915542251215</id><published>2011-12-05T16:15:00.001Z</published><updated>2011-12-05T16:15:55.180Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><title type='text'>These are my species -  finding the taxonomic names I published using Mendeley</title><content type='html'>The latest addition to my mapping of taxonomic names to the literature (&lt;a href="http://iphylo.org/~rpage/itaxon/"&gt;http://iphylo.org/~rpage/itaxon/&lt;/a&gt;) is the ability for authors with Mendeley accounts to find the names they've published. This is an extension of the "I wrote that" tool I &lt;a href="http://iphylo.blogspot.com/2011/06/i-wrote-that-asserting-authorship-using.html"&gt;developed earlier&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Let's say I want to show the names that a given author has published. I could search by that author's name, but that raises all sorts of issues (see my earlier posts &lt;a href="http://iphylo.blogspot.com/2010/08/readermeter-what-in-name.html"&gt;ReaderMeter: what's in a name?&lt;/a&gt; and &lt;a href="http://iphylo.blogspot.com/2009/01/equivalent-author-names.html"&gt;Equivalent author names&lt;/a&gt;), especially for this database where I have incomplete citations and in many cases lack author names beyond surname.&lt;br /&gt;&lt;br /&gt;Another way to tackle the problem is if I have a list of publications for an author, then all I need to do is match that list to the publications in my taxonomic database. If both lists have identifiers for the publications, such as DOIs, then the task is trivial. But, where do I get these lists? 
&lt;br /&gt;&lt;br /&gt;An obvious source is &lt;a href="http://www.mendeley.com"&gt;Mendeley&lt;/a&gt;, where people are building lists of their own publications (as well as other publications that they are interested in). For example, my publications are listed at &lt;a href="http://www.mendeley.com/profiles/roderic-page/"&gt;http://www.mendeley.com/profiles/roderic-page/&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;But I don't want to have to get these lists myself, I'd much rather that a Mendeley user could go to my taxonomic database, say "I have this Mendeley account, show me the names I've published". One reason I'd like to do this is that if I want people to engage with this project it would be nice to be able to offer an immediate reward, in this case, a place where you can show your contribution to the task of cataloguing life on this planet.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Finding my taxonomic names&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;If you have a Mendeley account here's what you do:&lt;br /&gt;&lt;br /&gt;Go to &lt;a href="http://iphylo.org/~rpage/itaxon/"&gt;http://iphylo.org/~rpage/itaxon/&lt;/a&gt;. 
At the top right you will see a "Sign in using Mendeley" link.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/-ihvKfVh8hno/TtzuIgfI2CI/AAAAAAAABEc/6deCRW7EXQ8/m1.png?imgmax=800" alt="M1" border="0" width="256" height="118" /&gt;&lt;br /&gt;Click this and you will be taken to Mendeley where you will be asked if you'd like to allow &lt;a href="http://iphylo.org/~rpage/itaxon/"&gt;http://iphylo.org/~rpage/itaxon/&lt;/a&gt; to connect to your account (if you're already logged in to Mendeley then you'll see an &lt;b&gt;Accept&lt;/b&gt; button, otherwise Mendeley will ask you to log in).&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-YSqscP5n1mM/TtzuJu-eE2I/AAAAAAAABEk/JPC9aAXQwlQ/m2.png?imgmax=800" alt="M2" border="0" width="475" height="195" /&gt;&lt;br /&gt;If you click on &lt;b&gt;Accept&lt;/b&gt; then you will be taken back to my site and you should now see your profile name and picture on the top right:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-89ieXi7nP40/TtzuKneCfXI/AAAAAAAABEo/osYUEzlAQ9E/m3.png?imgmax=800" alt="M3" border="0" width="261" height="146" /&gt;&lt;br /&gt;&lt;br /&gt;If you click on the &lt;b&gt;Profile&lt;/b&gt; link then my site will talk to Mendeley, get a list of your papers, and look for them in my database. If it finds a paper, it outputs the taxonomic names published in that paper. 
For example, here is my profile:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/-mH43oq-rizg/TtzuNxAyPFI/AAAAAAAABFE/5tJobPlGUZA/m4.png?imgmax=800" alt="M4" border="0" width="400" height="393" /&gt;&lt;br /&gt;&lt;br /&gt;Listed are the species of bird lice in the genus &lt;i&gt;Dennyus&lt;/i&gt; described in a paper on which I was a coauthor (&lt;a href="http://dx.doi.org/10.1046/j.1365-3113.1996.d01-13.x"&gt;http://dx.doi.org/10.1046/j.1365-3113.1996.d01-13.x&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;This list is incomplete: earlier papers of mine on crab and isopod taxonomy aren't listed because they lack identifiers. This is something I need to work on, but for now this seems like a simple way to enable someone to go to the &lt;a href="http://iphylo.org/~rpage/itaxon/"&gt;mapping between taxonomic names and literature&lt;/a&gt; and find the names they've authored.&lt;br /&gt;&lt;br /&gt;If you have a Mendeley account, and your list of publications in Mendeley includes papers describing new animal species, go to &lt;a href="http://iphylo.org/~rpage/itaxon/"&gt;http://iphylo.org/~rpage/itaxon/&lt;/a&gt; and try it out.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-8479138915542251215?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8479138915542251215'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8479138915542251215'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/12/these-are-my-species-finding-taxonomic.html' title='These are my species -  finding the taxonomic names I published using Mendeley'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-ihvKfVh8hno/TtzuIgfI2CI/AAAAAAAABEc/6deCRW7EXQ8/s72-c/m1.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-6984892698505250039</id><published>2011-11-29T14:37:00.001Z</published><updated>2011-11-29T14:37:17.851Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='PDF'/><category scheme='http://www.blogger.com/atom/ns#' term='PMID'/><category scheme='http://www.blogger.com/atom/ns#' term='ION'/><category scheme='http://www.blogger.com/atom/ns#' term='identifiers'/><category scheme='http://www.blogger.com/atom/ns#' term='DOI'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='mapping'/><category scheme='http://www.blogger.com/atom/ns#' term='literature'/><category scheme='http://www.blogger.com/atom/ns#' term='Handles'/><title type='text'>Mapping names to literature: closing in on 250,000 names</title><content type='html'>Following on from my earlier post &lt;a href="http://iphylo.blogspot.com/2011/10/linking-taxonomic-names-to-literature.html"&gt;Linking taxonomic names to literature: beyond digitised 5×3 index cards&lt;/a&gt; I've been slowly updating my latest toy:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/itaxon"&gt;http://iphylo.org/~rpage/itaxon&lt;/a&gt;&lt;a href="http://iphylo.org/~rpage/itaxon/?search=Alpheus+wickstenae"&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh4.ggpht.com/-Vr3nyvG0OAM/TtTuGW8S35I/AAAAAAAABEQ/b_FseR8gArU/alpheus.png?imgmax=800" alt="Alpheus" title="alpheus.png" border="0" width="400" height="318" 
/&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This site displays a database mapping over 200,000 animal names to the primary literature, using a mix of identifiers (DOIs, Handles, PubMed, URLs) as well as links to freely available PDFs where they are available. Lots still to do as about a third of the 1.5 million names in the database have citations that my code hasn't been able to parse. There are also lots of gaps that need to be filled in, for example missing DOIs or PubMed identifiers, and a lot of the earlier names are linked by "microcitations" to names, and I'll need to handle those (using code from my earlier project &lt;a href="http://iphylo.blogspot.com/2011/03/nomenclator-zoologicus-meets.html"&gt;Nomenclator Zoologicus meets Biodiversity Heritage Library: linking names directly to literature&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;The mapping itself is stored in a database that I'm constantly editing, so this is far from production quality, but I've found it eye-opening just how much literature is available. There is a lot of scope for generating customised lists of papers, for example, primary taxonomic sources for taxa currently on the &lt;a href="http://www.iucnredlist.org/"&gt;IUCN Red List&lt;/a&gt;, or those taxa which have sequences in GenBank (building on the mapping of &lt;a href="http://iphylo.org/linkout"&gt;NCBI taxa onto Wikipedia&lt;/a&gt;). Given that a lot of the relevant literature is in BHL, or available as PDFs, we could do some data mining, such as extracting geographical coordinates, taxonomic names, and citations. 
And if linked data is your thing, the 110,000 DOIs and nearly 9,000 CiNii URLs all serve RDF (albeit not without a &lt;a href="http://iphylo.blogspot.com/2011/09/linked-data-that-isn-failings-of-rdf.html"&gt;few problems&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;I've set a "goal" of having 250,000 names mapped to the primary literature, at which point the database interface will get some much-needed attention, but for now have a look for your favourite animal and see if its original description has been digitised.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-6984892698505250039?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6984892698505250039'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6984892698505250039'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/11/mapping-names-to-literature-closing-in.html' title='Mapping names to literature: closing in on 250,000 names'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-Vr3nyvG0OAM/TtTuGW8S35I/AAAAAAAABEQ/b_FseR8gArU/s72-c/alpheus.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2307915492641510639</id><published>2011-11-29T09:41:00.001Z</published><updated>2011-12-01T07:55:28.952Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Bibliography of Life'/><category scheme='http://www.blogger.com/atom/ns#' term='ION'/><category 
scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='Wikispecies'/><category scheme='http://www.blogger.com/atom/ns#' term='ViBRANT'/><title type='text'>Towards the bibliography of life</title><content type='html'>David King et al.'s paper "Towards the bibliography of life" &lt;a href="http://dx.doi.org/10.3897/zookeys.150.2167"&gt;http://dx.doi.org/10.3897/zookeys.150.2167&lt;/a&gt; has just appeared in a special issue of &lt;i&gt;ZooKeys&lt;/i&gt;. I've written a &lt;a href="http://iphylo.blogspot.com/2010/12/first-thoughts-on-citebank-and-bhl.html"&gt;number&lt;/a&gt; of &lt;a href="http://iphylo.blogspot.com/2010/10/mendeley-bhl-and-of-life.html"&gt;posts&lt;/a&gt; on this topic, so I've a few comments.&lt;br /&gt;&lt;br /&gt;King et al. survey some of the issues, but don't really tackle the big issue of how we're going to build this. If we define the "bibliography of life" somewhat narrowly as the list of all papers that have published a scientific name (or a new combination, such as moving a species from one genus to another), then this is a large, but measurable undertaking. 
According to ION's &lt;a href="http://www.organismnames.com/metrics.htm"&gt;metrics page&lt;/a&gt;, these are the numbers involved (for animals and protozoa):&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;Total New Names&lt;/td&gt;&lt;td align="right"&gt;1,510,402&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Total New Genera / Subgenera&lt;/td&gt;&lt;td align="right"&gt;215,242&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Total New Species / Subspecies&lt;/td&gt;&lt;td align="right"&gt;1,192,366&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Total Other New Names&lt;/td&gt;&lt;td align="right"&gt;102,794&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Total New Combinations&lt;/td&gt;&lt;td align="right"&gt;241,296&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Total New Synonyms&lt;/td&gt;&lt;td align="right"&gt;260,544&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;Even in the worst-case scenario of one name per publication (&lt;a href="http://www.organismnames.com/metrics.htm?page=tsp"&gt;clearly not the case&lt;/a&gt;) this is a big, but not insurmountable, task.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Publications not taxa&lt;/b&gt;&lt;br /&gt;Part of the challenge is figuring out the best way to tackle the problem. In the past, most efforts at building taxonomic bibliographies have focussed on specific taxa, which is natural, since the bibliographies are being built by taxonomists and they specialise in particular groups. But I'd argue that this is not the most efficient way to tackle the problem. Because the taxonomic literature is so widely dispersed, after the obvious "low hanging fruit" have been collected, considerable effort must be spent tracking down the harder to find citations. There are few economies of scale in this approach. In contrast, if we focus on publications at, say, the level of journal, then we can build a bibliography much more quickly. 
Once we've found the source, say, for one article, often we could use that information to harvest many articles from the same source (e.g., write scripts to harvest from a digital repository such as a DSpace server, or a digital library such as Gallica). But if we are focussed on a particular taxon, we will ignore the other articles in that journal ("what do I care about fish, &lt;a href="http://www.youtube.com/watch?v=CMNry4PE93Y"&gt;I like turtles&lt;/a&gt;"). &lt;br /&gt;&lt;br /&gt;Put another way, if we imagine a taxa × publication matrix, then we can either go after rows (i.e., a bibliography for a specific taxonomic group), or columns (a list of articles in a specific journal). The article-based approach will be faster, albeit at the cost of finding articles that aren't necessarily relevant to taxonomy. This is why I'm spending what feels like far too much time &lt;a href="http://iphylo.blogspot.com/2010/10/mendeley-bhl-and-of-life.html"&gt;harvesting article lists and uploading these to Mendeley&lt;/a&gt;. It is also one reason BHL has been so successful. They've simply gone after scanning the literature wholesale, rather than focussing on particular taxonomic groups.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-r2Yr_agFMCY/TtSorEQcFOI/AAAAAAAABD8/s3THyqPgQ6U/taxapublicationmatrix.png?imgmax=800" alt="Taxapublicationmatrix" title="taxapublicationmatrix.png" border="0" width="404" height="362" /&gt;&lt;img src="http://lh5.ggpht.com/-L_RsG4MUpbQ/TtSosDu4iEI/AAAAAAAABEA/CEuGYhlOHLc/Wikispecies-logo-en.png?imgmax=800" alt="Wikispecies logo en" title="Wikispecies-logo-en.png" border="0" width="125" height="160" style="float:right;" /&gt;&lt;b&gt;Crowd sourcing and Wikispecies&lt;/b&gt;&lt;br /&gt;Crowd sourcing often strikes me as a euphemism for "we can't be bothered doing the tedious stuff, let's get the public to do it for us (plus it will look like we're engaged with the public)." 
I'm not denying it can work, but I suspect it's not a magic bullet. Perhaps the best crowd sourcing is not to try and bring the crowd to a project, but go where the crowd has already gathered. In this case, an obvious crowd is the &lt;a href="http://species.wikimedia.org/"&gt;Wikispecies&lt;/a&gt; community. Working with the ION database for &lt;a href="http://iphylo.blogspot.com/2011/10/sherborn-presentation-on-open-taxonomy.html"&gt;my Sherborn presentation&lt;/a&gt;, it's clear that the quality of bibliographic data in ION is variable, and rather poor for older references. In contrast, the reference lists on Wikispecies can be very good (e.g., &lt;a href="http://species.wikimedia.org/wiki/George_Albert_Boulenger"&gt;the bibliography for George Boulenger&lt;/a&gt;). There are some issues with Wikispecies, notably the lack of a decent bibliographic template (unlike Wikipedia) so parsing references can be *cough* interesting, but there is scope here to use it to improve other databases. Citation matching can be a challenge, but in this case we have citations indexed by taxonomic name (in both ION and Wikispecies), which greatly reduces the scope of possible matches.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;br /&gt;I think building the "bibliography of life" needs a combination of aggressive data gathering, and avoiding building additional tools unless absolutely needed. 
There are great tools and communities that can already be leveraged (e.g., Mendeley, Wikispecies), let's make use of them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2307915492641510639?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2307915492641510639'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2307915492641510639'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/11/towards-bibliography-of-life.html' title='Towards the bibliography of life'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-r2Yr_agFMCY/TtSorEQcFOI/AAAAAAAABD8/s3THyqPgQ6U/s72-c/taxapublicationmatrix.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3693978516971265826</id><published>2011-11-24T07:16:00.001Z</published><updated>2011-11-24T07:31:32.776Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='DOI'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='EOL'/><category scheme='http://www.blogger.com/atom/ns#' term='citation'/><category scheme='http://www.blogger.com/atom/ns#' term='linking'/><title type='text'>BHL needs to engage with publishers (and EOL needs to link to primary literature)</title><content type='html'>Browsing &lt;a href="http://eol.org"&gt;EOL&lt;/a&gt; I stumbled upon the recently described fish &lt;i&gt;Protoanguilla palau&lt;/i&gt;, shown below in an 
image by &lt;a href="http://www.flickr.com/photos/64191321@N06/"&gt;rairaiken2011&lt;/a&gt;:&lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;a href="http://www.flickr.com/photos/64191321@N06/6254171445/" title="Palauan Primitive Cave Eel by rairaiken2011, on Flickr"&gt;&lt;img src="http://farm7.staticflickr.com/6106/6254171445_62fe7afee6.jpg" width="500" height="367" alt="Palauan Primitive Cave Eel"&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Two things struck me, the first is that the &lt;a href="http://eol.org/pages/23402420"&gt;EOL page for this fish&lt;/a&gt; gives absolutely no clue as to where you would go to find out more about this fish (apart from an unclickable link to the Wikipedia page &lt;a href="http://en.wikipedia.org/wiki/Protoanguilla"&gt;http://en.wikipedia.org/wiki/Protoanguilla&lt;/a&gt; - seriously, a link that isn't clickable?), despite the fact this fish has been recently described in an Open Access publication ("A 'living fossil' eel (Anguilliformes: Protanguillidae, fam. nov.) from an undersea cave in Palau", &lt;a href="http://dx.doi.org/10.1098/rspb.2011.1289"&gt;http://dx.doi.org/10.1098/rspb.2011.1289&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Now that I've got my customary grumble about EOL out of the way, let's look at the article itself. On the first page of the PDF it states:&lt;br /&gt;&lt;blockquote&gt;This article cites 29 articles, 7 of which can be accessed free&lt;br /&gt;&lt;a href="http://rspb.royalsocietypublishing.org/content/early/2011/09/16/rspb.2011.1289.full.html#ref-list-1"&gt;http://rspb.royalsocietypublishing.org/content/early/2011/09/16/rspb.2011.1289.full.html#ref-list-1&lt;/a&gt;&lt;/blockquote&gt;&lt;br /&gt;So 22 of the articles or books cited in this paper are, apparently, not freely available. 
However, looking at the &lt;a href="http://rspb.royalsocietypublishing.org/content/early/2011/09/16/rspb.2011.1289.full.html#ref-list-1"&gt;list of literature cited&lt;/a&gt; it becomes obvious that rather more of these citations are available online than we might think. For example, there are articles that are in the &lt;a href="http://www.biodiversitylibrary.org"&gt;Biodiversity Heritage Library&lt;/a&gt; (BHL), e.g.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Regan C. T. 1912 The osteology and classification of the teleostean fishes of the order Apodes. Ann. Mag. Nat. Hist. Ser. 8, 377–387 &lt;a href="http://biostor.org/reference/98316"&gt;http://biostor.org/reference/98316&lt;/a&gt;, &lt;a href="http://www.biodiversitylibrary.org/page/15618586"&gt;http://www.biodiversitylibrary.org/page/15618586&lt;/a&gt;, &lt;a href="http://dx.doi.org/10.1080/00222931208693250"&gt;http://dx.doi.org/10.1080/00222931208693250&lt;/a&gt;&lt;/li&gt;&lt;li&gt;McCosker J. E. 1977 The osteology, classification, and relationships of the eel family Ophichthidae. Proc. Calif. Acad. Sci. 41, 1–123 &lt;a href="http://biostor.org/reference/59597"&gt;http://biostor.org/reference/59597&lt;/a&gt;, &lt;a href="http://www.biodiversitylibrary.org/page/15691453"&gt;http://www.biodiversitylibrary.org/page/15691453&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Then there are articles that are available in other digitising projects:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Hay O. P. 1903 On a collection of Upper Cretaceous fishes from Mount Lebanon, Syria, with descriptions of four new genera and nineteen new species. Bull. Am. Mus. Nat. Hist. N. Y. 19, 395–452. &lt;a href="http://hdl.handle.net/2246/1500"&gt;http://hdl.handle.net/2246/1500&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Nelson G. J. 1966 Gill arches of fishes of the order Anguilliformes. Pac. Sci. 20, 391–408. 
&lt;a href="http://hdl.handle.net/10125/7805"&gt;http://hdl.handle.net/10125/7805&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Furthermore, there are articles that aren't necessarily free, but which have been digitised and have DOIs that have been missed by the publisher, such as the Regan paper above, and&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt; Trewavas E. 1932 A contribution to the classification of the fishes of the order Apodes, based on the osteology of some rare eels. Proc. Zool. Soc. Lond. 1932, 639–659. &lt;a href="http://dx.doi.org/10.1111/j.1096-3642.1932.tb01089.x"&gt;http://dx.doi.org/10.1111/j.1096-3642.1932.tb01089.x&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;So, the &lt;i&gt;Proceedings of the Royal Society&lt;/i&gt; has underestimated just how many citations the reader can view online. The problem, of course, is how does a publisher discover these additional citations? Some have been missed because of sloppy bibliographic data. The missing DOIs are probably because the Regan citation lacks a volume number, and the Trewavas paper uses a different volume number to that used by Wiley (who digitised &lt;i&gt;Proc. Zool. Soc. Lond.&lt;/i&gt;). But the content in BHL and other digital archives will be missed because finding these is not part of a publisher's normal workflow. 
Typically citations are matched by using services ultimately provided by &lt;a href="http://www.crossref.org"&gt;CrossRef&lt;/a&gt;, and the bulk of BHL content is not in CrossRef.&lt;br /&gt;&lt;br /&gt;So it seems there's an opportunity here for someone to provide a service for publishers that adds value to their content in at least three ways:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Add missing DOIs due to problematic citations for older literature&lt;/li&gt;&lt;li&gt;Add links to BHL content&lt;/li&gt;&lt;li&gt;Add links to content in additional digitisation projects, such as journal archives in DSpace repositories&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;For readers this would enhance their experience (more of the literature becomes accessible to them), and for BHL and the repositories it would drive more readers to those repositories (how many people reading the paper on &lt;i&gt;Protoanguilla palau&lt;/i&gt; have even heard of BHL?). I've said &lt;a href="http://iphylo.blogspot.com/2010/04/biodiversity-informatic-fail-and-what.html"&gt;most of this before&lt;/a&gt;, but I really think there's an opportunity here to provide services to the publishing industry, and we don't seem to be grasping it yet.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3693978516971265826?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3693978516971265826'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3693978516971265826'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/11/bhl-needs-to-engage-with-publishers-and.html' title='BHL needs to engage with publishers (and EOL needs to link to primary literature)'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-6427304865508932650</id><published>2011-11-23T16:36:00.001Z</published><updated>2011-11-23T16:36:23.655Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='history flow'/><category scheme='http://www.blogger.com/atom/ns#' term='SVG'/><category scheme='http://www.blogger.com/atom/ns#' term='visualisation'/><category scheme='http://www.blogger.com/atom/ns#' term='github'/><title type='text'>Wikipedia History Flow tool now in GitHub</title><content type='html'>Inspired by a &lt;a href="http://iphylo.blogspot.com/2009/09/visualising-edit-history-of-wikipedia.html#comment-370438935"&gt;comment&lt;/a&gt; on my post &lt;a href="http://iphylo.blogspot.com/2009/09/visualising-edit-history-of-wikipedia.html"&gt;Visualising edit history of a Wikipedia page&lt;/a&gt;, the code I use to make history flow diagrams like the one below is now in GitHub at &lt;a href="https://github.com/rdmpage/wikihistoryflow"&gt;https://github.com/rdmpage/wikihistoryflow&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/-lq2atbxqTms/Ts0hBPfgs2I/AAAAAAAABDs/EQtPNzv-_tM/historyflow.png?imgmax=800" alt="Historyflow" border="0" width="400" height="192" /&gt;&lt;br /&gt;&lt;br /&gt;There is also a live version at &lt;a href="http://iphylo.org/~rpage/wikihistoryflow"&gt;http://iphylo.org/~rpage/wikihistoryflow&lt;/a&gt;. 
If you enter the name of a Wikipedia page the tool will display the edit history with columns representing page versions and individual contributors (people and bots) distinguished by different colours.&lt;br /&gt;&lt;br /&gt;This tool will fall over for pages with a lengthy history of edits, and requires a web browser that can support SVG, but it's a fun visualisation, and may inspire someone to do this properly.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-6427304865508932650?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6427304865508932650'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6427304865508932650'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/11/wikipedia-history-flow-tool-now-in.html' title='Wikipedia History Flow tool now in GitHub'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-lq2atbxqTms/Ts0hBPfgs2I/AAAAAAAABDs/EQtPNzv-_tM/s72-c/historyflow.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-6714944392061494559</id><published>2011-11-22T13:28:00.001Z</published><updated>2011-11-22T13:28:18.175Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='stackoverflow'/><category scheme='http://www.blogger.com/atom/ns#' term='JSONP'/><category scheme='http://www.blogger.com/atom/ns#' term='mod_rewrite'/><category scheme='http://www.blogger.com/atom/ns#' term='Apache'/><category 
scheme='http://www.blogger.com/atom/ns#' term='jQuery'/><title type='text'>Apache mod_rewrite and question marks "?"</title><content type='html'>Quick note to self in case I (inevitably) forget later. If you are using Apache mod_rewrite to make nice, clean URLs, and are also supporting JSONP, you may run into the situation where you have code that wants to append "?callback=xxx" to your URL (e.g., a cross-domain AJAX call in jQuery). Imagine you have a nice clean URL &lt;code&gt;/user/123&lt;/code&gt;, which actually corresponds to &lt;code&gt;user.php?id=123&lt;/code&gt;. If you append &lt;code&gt;?callback=xxx&lt;/code&gt; to the URL then chances are the code will break, because mod_rewrite will rewrite the URL to something like &lt;code&gt;user.php?id=123?callback=xxx&lt;/code&gt;. What you actually want to send to your web server is &lt;code&gt;user.php?id=123&amp;callback=xxx&lt;/code&gt; (note the &amp; before "callback"). After much grief trying to figure out how to coerce Apache mod_rewrite into handling this situation I found &lt;a href="http://stackoverflow.com/questions/822421/match-question-mark-in-mod-rewrite-rule-regex"&gt;the answer, of course, on Stack Overflow&lt;/a&gt;. If you use the &lt;code&gt;[QSA]&lt;/code&gt; flag, Apache will append the additional callback parameter onto the end of the rewritten URL, so JSONP will now work. 
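&lt;br /&gt;&lt;br /&gt;A minimal sketch of such a rule, following the /user/123 example above. The pattern and file name are illustrative assumptions (numeric user IDs, and a rule placed in a per-directory .htaccess file), not a copy of a real configuration:

```apache
# Hypothetical rule for the /user/123 example; adjust the pattern and target to your site.
RewriteEngine On
# QSA ("query string append") tells mod_rewrite to combine the query string
# of the incoming request (e.g. callback=xxx) with the one in the substitution.
# L stops processing further rules once this one matches.
RewriteRule ^user/([0-9]+)$ user.php?id=$1 [QSA,L]
```

With a rule along these lines, a request for /user/123?callback=xxx should reach PHP as user.php?id=123&amp;callback=xxx.&lt;br /&gt;&lt;br /&gt;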
Once again, Stack Overflow turned a show-stopper into a learning experience.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-6714944392061494559?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6714944392061494559'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6714944392061494559'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/11/quick-note-to-self-in-case-i-inevitably.html' title='Apache mod_rewrite and question marks &amp;quot;?&amp;quot;'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-1533435313657223737</id><published>2011-11-18T14:25:00.001Z</published><updated>2011-11-18T14:25:33.553Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Gallica'/><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='articles'/><category scheme='http://www.blogger.com/atom/ns#' term='Google books'/><title type='text'>Adding article-level metadata to BHL</title><content type='html'>Recently I've been thinking about the best ways to make article-level metadata from &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; more widely available. For example, for someone visiting the &lt;a href="http://www.biodiversitylibrary.org"&gt;BHL site&lt;/a&gt; there is no easy way to find articles, which are the basic unit for much of the scientific literature. 
How hard would it be to add articles to BHL? In the past I've wanted an all-singing, all-dancing article-level interface to BHL content (sort of BioStor on steroids), but that's a way off, and ideally would have a broader scope than BHL. So instead I've been thinking of ways to add articles to BHL without requiring a lot of re-engineering of BHL itself.&lt;br /&gt;&lt;br /&gt;Looking at other digital archive projects like &lt;a href="http://gallica.bnf.fr/"&gt;Gallica&lt;/a&gt; and &lt;a href="http://books.google.co.uk/"&gt;Google Books&lt;/a&gt; it strikes me that if the BHL interface to a scanned item had a "Contents" drop-down menu then users would be able to go to individual articles very easily. Below is a screen shot of how Gallica does this (see &lt;a href="http://gallica.bnf.fr/ark:/12148/bpt6k61331684/f57"&gt;http://gallica.bnf.fr/ark:/12148/bpt6k61331684/f57&lt;/a&gt;). &lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/-QmCQYqFQdS4/TsZq07Fh2GI/AAAAAAAABDM/LoclO4DZQ2g/gallica.png?imgmax=800" alt="Gallica" title="gallica.png" border="0" width="400" height="255" /&gt;&lt;br /&gt;&lt;br /&gt;There's also a screen shot of something similar in Google Books (see &lt;a href="http://books.google.co.uk/books?id=PkvoRnAM6WUC"&gt;http://books.google.co.uk/books?id=PkvoRnAM6WUC&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/-A231Xj5ObjE/TsZq2r3GEFI/AAAAAAAABDc/BmUMa3CkdgM/contents.png?imgmax=800" alt="Contents" title="contents.png" border="0" width="400" height="304" /&gt;&lt;br /&gt;&lt;br /&gt;The idea would be that if BioStor had found articles within a scanned item, they would be listed in the contents menu (title, author, starting page), and if the user clicked on the article title then the BHL viewer would jump to that page. 
If there were no known articles, but the scanned item had a table of contents flagged (e.g., &lt;a href="http://www.biodiversitylibrary.org/item/25703"&gt;http://www.biodiversitylibrary.org/item/25703&lt;/a&gt;) then the menu could function as a button that takes you to that page. If there are no articles or contents, then the menu could be grayed out, or simply not displayed. This way the interface would work for books, monographs, and journal volumes.&lt;br /&gt;&lt;br /&gt;Now, admittedly this is not the most elegant interface, and it treats articles as fragments of books rather than individual units, but it would be a start. It would also require minimal effort both on the part of BHL (who need to add the contents button), and myself (it would be easy to create a dump of the article titles indexed by scanned item). &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-1533435313657223737?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/1533435313657223737'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/1533435313657223737'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/11/recently-ive-been-thinking-about-best.html' title='Adding article-level metadata to BHL'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-QmCQYqFQdS4/TsZq07Fh2GI/AAAAAAAABDM/LoclO4DZQ2g/s72-c/gallica.png?imgmax=800' height='72' 
width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-5501567101829349220</id><published>2011-11-18T12:51:00.001Z</published><updated>2011-11-18T12:51:47.901Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='jQueryMobile'/><category scheme='http://www.blogger.com/atom/ns#' term='javascript'/><category scheme='http://www.blogger.com/atom/ns#' term='Nature'/><category scheme='http://www.blogger.com/atom/ns#' term='github'/><title type='text'>Nature iPhone app clone in GitHub</title><content type='html'>One thing I'm increasingly conscious of is that I've a lot of demos and toy projects hanging around and the code for most of these isn't readily available. So, I plan to clean these up and put them in &lt;a href="http://github.com"&gt;GitHub&lt;/a&gt; so others can explore the code, and reuse it if they see fit.&lt;br /&gt;&lt;br /&gt;First up is the code to create an HTML+JavaScript clone of Nature's iPhone app, as described in an &lt;a href="http://iphylo.blogspot.com/2010/12/viewing-scientific-articles-on-ipad.html"&gt;earlier post&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;center&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TQDKnlA2n7I/AAAAAAAAAzg/smFsn6PMkzA/photo.PNG?imgmax=800" alt="photo.PNG" border="0" width="200" /&gt;&lt;/td&gt;&lt;td&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TQDK4xYdlKI/AAAAAAAAAzk/C0aeQvSfd8c/photo.PNG?imgmax=800" alt="photo.PNG" border="0" width="200"  /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;There's a live version of the clone &lt;a href="http://iphylo.org/~rpage/natureiphone/"&gt;here&lt;/a&gt;, 
and the code is now available from GitHub at &lt;a href="https://github.com/rdmpage/natureiphone"&gt;https://github.com/rdmpage/natureiphone&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-5501567101829349220?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5501567101829349220'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5501567101829349220'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/11/one-thing-im-increasingly-conscious-of.html' title='Nature iPhone app clone in GitHub'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_Gct8lVAxKqQ/TQDKnlA2n7I/AAAAAAAAAzg/smFsn6PMkzA/s72-c/photo.PNG?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-559832431372609656</id><published>2011-10-28T22:05:00.001+01:00</published><updated>2011-10-28T22:05:51.015+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ION'/><category scheme='http://www.blogger.com/atom/ns#' term='Open access'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomy'/><category scheme='http://www.blogger.com/atom/ns#' term='Sherborn'/><category scheme='http://www.blogger.com/atom/ns#' term='biodiversity informatics'/><category scheme='http://www.blogger.com/atom/ns#' term='ICZN'/><title type='text'>Sherborn presentation on Open Taxonomy</title><content type='html'>Here is my presentation from today's 
&lt;a href="http://iczn.org/content/anchoring-biodiversity-information-sherborn-21st-century-and-beyond"&gt;Anchoring Biodiversity Information: From Sherborn to the 21st century and beyond&lt;/a&gt; meeting.&lt;br /&gt;&lt;div style="width:425px" id="__ss_9929092"&gt;&lt;strong style="display:block;margin:12px 0 4px"&gt;&lt;a href="http://www.slideshare.net/rdmpage/open-taxonomy" title="Open taxonomy" target="_blank"&gt;Open taxonomy&lt;/a&gt;&lt;/strong&gt;&lt;iframe src="http://www.slideshare.net/slideshow/embed_code/9929092" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"&gt;&lt;/iframe&gt;&lt;div style="padding:5px 0 12px"&gt; View more &lt;a href="http://www.slideshare.net/" target="_blank"&gt;presentations&lt;/a&gt; from &lt;a href="http://www.slideshare.net/rdmpage" target="_blank"&gt;Roderic Page&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;All the presentations will be posted online, along with podcasts of the audio. Meantime, presentations by &lt;a href="http://www.slideshare.net/DavidRemsen/remsen-sherborne"&gt;Dave Remsen&lt;/a&gt; and &lt;a href="http://www.slideshare.net/chrisfreeland/approaches-to-preserving-digitized-taxonomic-data"&gt;Chris Freeland&lt;/a&gt; are already online.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-559832431372609656?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/559832431372609656'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/559832431372609656'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/10/sherborn-presentation-on-open-taxonomy.html' title='Sherborn presentation on Open Taxonomy'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7535141816798096827</id><published>2011-10-27T17:56:00.001+01:00</published><updated>2011-10-27T17:56:36.494+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='ION'/><category scheme='http://www.blogger.com/atom/ns#' term='DOI'/><category scheme='http://www.blogger.com/atom/ns#' term='linking'/><category scheme='http://www.blogger.com/atom/ns#' term='literature'/><title type='text'>Linking taxonomic names to literature: beyond digitised 5×3 index cards</title><content type='html'>&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/-vN4MtQw0JOQ/TqmNQqeSKjI/AAAAAAAABB4/TZKJbuI0uzQ/pubs.png?imgmax=800" alt="Pubs" title="pubs.png" border="0" width="394" height="360" /&gt;&lt;br /&gt;Tomorrow is the &lt;a href="http://iczn.org/content/anchoring-biodiversity-information-sherborn-21st-century-and-beyond"&gt;Anchoring Biodiversity Information: From Sherborn to the 21st century and beyond&lt;/a&gt; meeting. It should be an interesting gathering, albeit overshadowed by the &lt;a href="http://markmail.org/message/mdsp4teoismve42m"&gt;sudden death of Frank Bisby&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I'm giving a talk entitled "Open Taxonomy", in which I argue that most taxonomic databases are little more than digitised collections of 5×3 index cards, where literature is treated as dumb citation strings rather than as resources with digital identifiers. 
To make the discussion concrete I've created a mapping between the &lt;a href="http://www.organismnames.com/"&gt;Index to Organism Names (ION)&lt;/a&gt; database and a range of bibliographic sources, such as CrossRef (for DOIs), &lt;a href="http://biostor.org/"&gt;BioStor&lt;/a&gt;, JSTOR, etc. &lt;br /&gt;&lt;br /&gt;This mapping is online at &lt;a href="http://iphylo.org/~rpage/itaxon/"&gt;http://iphylo.org/~rpage/itaxon/&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So far I've managed to link some 200,000 animal names to a literature identifier, and a good fraction of these articles are freely available, either as images in BioStor and Gallica (for which I've created a simple viewer) or as PDFs (which are displayed using Google Docs).&lt;br /&gt;&lt;br /&gt;Some examples are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://iphylo.org/~rpage/itaxon/?search=Geothelphusa%20marmorata"&gt;&lt;i&gt;Geothelphusa marmorata&lt;/i&gt;&lt;/a&gt; (BioStor)&lt;/li&gt;&lt;li&gt;&lt;a href="http://iphylo.org/~rpage/itaxon/?search=Rhopalione"&gt;&lt;i&gt;Rhopalione&lt;/i&gt;&lt;/a&gt; (Gallica)&lt;/li&gt;&lt;li&gt;&lt;a href="http://iphylo.org/~rpage/itaxon/?search=Potamotrygon%20garouaensis"&gt;&lt;i&gt;Potamotrygon garouaensis&lt;/i&gt;&lt;/a&gt; (PDF)&lt;/li&gt;&lt;li&gt;&lt;a href="http://iphylo.org/~rpage/itaxon/?search=Endacusta%20kirrimurra"&gt;&lt;i&gt;Endacusta kirrimurra&lt;/i&gt;&lt;/a&gt; (Google Books)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;The site is obviously a work in progress, and there's a lot to be done to the interface, but I hope it conveys the key point: a significant fraction of the primary taxonomic literature is online, and we should be linking to this. 
The days of digitised 5×3 index cards are past.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7535141816798096827?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7535141816798096827'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7535141816798096827'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/10/linking-taxonomic-names-to-literature.html' title='Linking taxonomic names to literature: beyond digitised 5×3 index cards'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-vN4MtQw0JOQ/TqmNQqeSKjI/AAAAAAAABB4/TZKJbuI0uzQ/s72-c/pubs.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-6443887464465485508</id><published>2011-10-21T09:35:00.001+01:00</published><updated>2011-10-21T09:35:36.726+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TDWG'/><category scheme='http://www.blogger.com/atom/ns#' term='Challenge'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><category scheme='http://www.blogger.com/atom/ns#' term='geography'/><title type='text'>Final thoughts on TDWG RDF challenge</title><content type='html'>Quick final comment on the &lt;a href="http://iphylo.blogspot.com/2011/10/tdwg-challenge-what-is-rdf-good-for.html"&gt;TDWG Challenge - what is RDF good for?&lt;/a&gt;. 
As I noted in the &lt;a href="http://iphylo.blogspot.com/2011/10/reflections-on-tdwg-rdf.html"&gt;previous post&lt;/a&gt;, Olivier Rovellotti (&lt;a href="http://twitter.com/orovellotti"&gt;@orovellotti&lt;/a&gt;) and Javier de la Torre (&lt;a href="http://twitter.com/jatorre"&gt;@jatorre&lt;/a&gt;) have produced some nice visualisations of the frog data set:&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh4.ggpht.com/-L2ZOHEYtA2M/TqEu1iT8ifI/AAAAAAAABBU/gi_fNkzoC-g/cartodb.png?imgmax=800" alt="Cartodb" title="cartodb.png" border="0" width="400" height="254" /&gt;&lt;br /&gt;Nice as these are, I can't help feeling that they actually help make my point about the current state of RDF in biodiversity informatics. The only responses to my challenge have been to use geography, where the shared coordinate system (latitude and longitude) facilitates integration. Having geographic coordinates means we don't need to have shared identifiers to do something useful, and I think it's no accident that &lt;a href="http://www.gbif.org"&gt;GBIF&lt;/a&gt; is one of the most important resources we have. Geography is also the easiest way to integrate across other fields (e.g., climate). &lt;br /&gt;&lt;br /&gt;But what of the other dimensions? What I'm really after are links across datasets that enable us to make new inferences, or address interesting questions. 
The challenge is still there...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-6443887464465485508?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6443887464465485508'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6443887464465485508'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/10/final-thoughts-on-tdwg-rdf-challenge.html' title='Final thoughts on TDWG RDF challenge'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-L2ZOHEYtA2M/TqEu1iT8ifI/AAAAAAAABBU/gi_fNkzoC-g/s72-c/cartodb.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7195339787892934087</id><published>2011-10-20T09:57:00.001+01:00</published><updated>2011-10-20T18:25:03.321+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TDWG'/><category scheme='http://www.blogger.com/atom/ns#' term='IUCN'/><category scheme='http://www.blogger.com/atom/ns#' term='ION'/><category scheme='http://www.blogger.com/atom/ns#' term='DOI'/><category scheme='http://www.blogger.com/atom/ns#' term='SPARQL'/><category scheme='http://www.blogger.com/atom/ns#' term='conservation status'/><category scheme='http://www.blogger.com/atom/ns#' term='Challenge'/><category scheme='http://www.blogger.com/atom/ns#' term='frogs'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><category scheme='http://www.blogger.com/atom/ns#' term='DBpedia'/><title 
type='text'>Reflections on the TDWG RDF "Challenge"</title><content type='html'>This is a follow-up to my previous post &lt;a href="http://iphylo.blogspot.com/2011/10/tdwg-challenge-what-is-rdf-good-for.html"&gt;TDWG Challenge - what is RDF good for?&lt;/a&gt; where I'm being, frankly, a pain in the arse, and asking why we bother with RDF. In many ways I'm not particularly anti-RDF, but it bothers me that there's a big disconnect between the reasons we are going down this route and how we are actually using RDF. In other words, if you like RDF and buy the promise of large-scale data integration while still being decentralised ("the web as database"), then we're doing it wrong.&lt;br /&gt;&lt;br /&gt;As an aside, my own perspective is one of data integration. I want to link all this stuff together so I can follow a path through multiple datasets and extract the information I want. In other words, "linked data" (little "l", little "d"). I'm interested in fairly lightweight integration, typically through shared identifiers. There is also integration via ontologies, which strikes me as a different, if related, problem that in many ways is closer to the original vision of the Semantic Web as a giant inference engine. I think the concerns (and experience) of these two communities are somewhat different. I don't particularly care about ontologies; I want key-value pairs and reusable identifiers so I can link stuff together. If, for example, you're working on something like &lt;a href="http://phenoscape.org/"&gt;Phenoscape&lt;/a&gt;, then I think you have a rather more circumscribed set of data, with potentially complicated interrelationships that you want to make inferences on, in which case ontologies are your friend.&lt;br /&gt;&lt;br /&gt;So, I posted a "challenge". It wasn't a challenge so much as a set of RDF to play with. What I'm interested in is seeing how easily we can string this data together to learn stuff. 
For example, using the RDF I posted earlier here is a table listing the name, conservation status, publication DOI and date, and (where available) image from Wikipedia for frogs with sequences in GenBank. &lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tbody style="font-size:10px;"&gt;&lt;tr&gt;&lt;th&gt;Species&lt;/th&gt;&lt;th&gt;Status&lt;/th&gt;&lt;th&gt;DOI&lt;/th&gt;&lt;th&gt;Year described&lt;/th&gt;&lt;th&gt;Image&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Atelopus nanay&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1655/0018-0831(2002)058[0229:TNSOAA]2.0.CO;2"&gt;http://dx.doi.org/10.1655/0018-0831(2002)058[0229:TNSOAA]2.0.CO;2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2002&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus mariposa&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1466962"&gt;http://dx.doi.org/10.2307/1466962&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1992&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Phrynopus kauneorum&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1565993"&gt;http://dx.doi.org/10.2307/1565993&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2002&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus eunaster&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1563010"&gt;http://dx.doi.org/10.2307/1563010&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1973&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus amadeus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1445557"&gt;http://dx.doi.org/10.2307/1445557&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1987&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus lamprotes&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.2307/1563010"&gt;http://dx.doi.org/10.2307/1563010&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1973&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Churamiti maridadi&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1080/21564574.2002.9635467"&gt;http://dx.doi.org/10.1080/21564574.2002.9635467&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2002&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus thorectes&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1445381"&gt;http://dx.doi.org/10.2307/1445381&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1988&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus apostates&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1563010"&gt;http://dx.doi.org/10.2307/1563010&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1973&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Leptodactylus silvanimbus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1563691"&gt;http://dx.doi.org/10.2307/1563691&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1980&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus sciagraphus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1563010"&gt;http://dx.doi.org/10.2307/1563010&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1973&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Bufo chavin&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1643/0045-8511(2001)001[0216:NSOBAB]2.0.CO;2"&gt;http://dx.doi.org/10.1643/0045-8511(2001)001[0216:NSOBAB]2.0.CO;2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2001&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus fowleri&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.2307/1563010"&gt;http://dx.doi.org/10.2307/1563010&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1973&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Ptychohyla hypomykter&lt;/i&gt;&lt;/td&gt;&lt;td&gt;CR&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/3672060"&gt;http://dx.doi.org/10.2307/3672060&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1993&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Hyla suweonensis&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1444138"&gt;http://dx.doi.org/10.2307/1444138&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1980&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Proceratophrys concavitympanum&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1565412"&gt;http://dx.doi.org/10.2307/1565412&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Phrynopus bufoides&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1643/CH-04-278R2"&gt;http://dx.doi.org/10.1643/CH-04-278R2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2005&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Boophis periegetes&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1111/j.1096-3642.1995.tb01427.x"&gt;http://dx.doi.org/10.1111/j.1096-3642.1995.tb01427.x&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1995&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Boophis_periegetes.jpg/200px-Boophis_periegetes.jpg"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Phyllomedusa duellmani&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1444649"&gt;http://dx.doi.org/10.2307/1444649&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1982&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Boophis 
liami&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1163/156853803322440772"&gt;http://dx.doi.org/10.1163/156853803322440772&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2003&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Hyalinobatrachium ignioculus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1670/0022-1511(2003)037[0091:ANSOHA]2.0.CO;2"&gt;http://dx.doi.org/10.1670/0022-1511(2003)037[0091:ANSOHA]2.0.CO;2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2003&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Proceratophrys cururu&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1447712"&gt;http://dx.doi.org/10.2307/1447712&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1998&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/1/1c/Proceratophrys_cururu.jpg/200px-Proceratophrys_cururu.jpg"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Amolops bellulus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1643/0045-8511(2000)000[0536:ABANSO]2.0.CO;2"&gt;http://dx.doi.org/10.1643/0045-8511(2000)000[0536:ABANSO]2.0.CO;2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Centrolene bacatum&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1564528"&gt;http://dx.doi.org/10.2307/1564528&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1994&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Litoria kumae&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1071/ZO03008"&gt;http://dx.doi.org/10.1071/ZO03008&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2004&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Phrynopus pesantesi&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.1643/CH-04-278R2"&gt;http://dx.doi.org/10.1643/CH-04-278R2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2005&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Gastrotheca galeata&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1443617"&gt;http://dx.doi.org/10.2307/1443617&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1978&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Paratelmatobius cardosoi&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1447976"&gt;http://dx.doi.org/10.2307/1447976&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1999&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Rhacophorus catamitus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1655/0733-1347(2002)016[0046:NAPKPF]2.0.CO;2"&gt;http://dx.doi.org/10.1655/0733-1347(2002)016[0046:NAPKPF]2.0.CO;2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2002&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Huia melasma&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1643/CH-04-137R3"&gt;http://dx.doi.org/10.1643/CH-04-137R3&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2005&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Telmatobius vilamensis&lt;/i&gt;&lt;/td&gt;&lt;td&gt;DD&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1655/0018-0831(2003)059[0253:ANSOTA]2.0.CO;2"&gt;http://dx.doi.org/10.1655/0018-0831(2003)059[0253:ANSOTA]2.0.CO;2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2003&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Callulina kisiwamsitu&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1670/209-03A"&gt;http://dx.doi.org/10.1670/209-03A&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2004&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Arthroleptis nikeae&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.1080/21564574.2003.9635486"&gt;http://dx.doi.org/10.1080/21564574.2003.9635486&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2003&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus amplinympha&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1139/z94-297"&gt;http://dx.doi.org/10.1139/z94-297&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1994&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus glaphycompus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1563010"&gt;http://dx.doi.org/10.2307/1563010&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1973&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Bufo tacanensis&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1439700"&gt;http://dx.doi.org/10.2307/1439700&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1952&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d6/Bufo_tacanensis_distribution.svg/200px-Bufo_tacanensis_distribution.svg.png"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Phrynopus bracki&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1445826"&gt;http://dx.doi.org/10.2307/1445826&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1990&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Telmatobius sibiricus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1655/0018-0831(2003)059[0127:ANSOTF]2.0.CO;2"&gt;http://dx.doi.org/10.1655/0018-0831(2003)059[0127:ANSOTF]2.0.CO;2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2003&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Cochranella mache&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.1655/03-74"&gt;http://dx.doi.org/10.1655/03-74&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2004&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus melacara&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1466962"&gt;http://dx.doi.org/10.2307/1466962&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1992&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Plectrohyla glandulosa&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1441046"&gt;http://dx.doi.org/10.2307/1441046&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1964&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Aglyptodactylus laticeps&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1111/j.1439-0469.1998.tb00775.x"&gt;http://dx.doi.org/10.1111/j.1439-0469.1998.tb00775.x&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1998&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus glamyrus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1565664"&gt;http://dx.doi.org/10.2307/1565664&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1997&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Gastrotheca trachyceps&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1564375"&gt;http://dx.doi.org/10.2307/1564375&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1987&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus grahami&lt;/i&gt;&lt;/td&gt;&lt;td&gt;EN&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1563929"&gt;http://dx.doi.org/10.2307/1563929&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1979&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Litoria havina&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.1071/ZO9930225"&gt;http://dx.doi.org/10.1071/ZO9930225&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1993&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Crinia riparia&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1440794"&gt;http://dx.doi.org/10.2307/1440794&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1965&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Litoria longirostris&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1443159"&gt;http://dx.doi.org/10.2307/1443159&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1977&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Osteocephalus mutabor&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1163/156853802320877609"&gt;http://dx.doi.org/10.1163/156853802320877609&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2002&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Leptobrachium nigrops&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1440966"&gt;http://dx.doi.org/10.2307/1440966&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1963&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Pseudis tocantins&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1590/S0101-81751998000400011"&gt;http://dx.doi.org/10.1590/S0101-81751998000400011&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1998&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Mantidactylus argenteus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1111/j.1096-3642.1919.tb02128.x"&gt;http://dx.doi.org/10.1111/j.1096-3642.1919.tb02128.x&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1919&lt;/td&gt;&lt;td&gt;&lt;img width="64" 
src="http://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Mantidactylus_argenteus02.jpg/200px-Mantidactylus_argenteus02.jpg"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Aglyptodactylus securifer&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1111/j.1439-0469.1998.tb00775.x"&gt;http://dx.doi.org/10.1111/j.1439-0469.1998.tb00775.x&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1998&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/9/94/Aglyptodactylus_securifer.jpg/200px-Aglyptodactylus_securifer.jpg"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Pseudis cardosoi&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1163/156853800507264"&gt;http://dx.doi.org/10.1163/156853800507264&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Podonectes_cardosoi.jpg/200px-Podonectes_cardosoi.jpg"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Uperoleia inundata&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1071/AJZS079"&gt;http://dx.doi.org/10.1071/AJZS079&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1981&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Litoria pronimia&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1071/ZO9930225"&gt;http://dx.doi.org/10.1071/ZO9930225&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1993&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Litoria paraewingi&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1071/ZO9760283"&gt;http://dx.doi.org/10.1071/ZO9760283&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1976&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Philautus aurifasciatus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.1163/156853887X00036"&gt;http://dx.doi.org/10.1163/156853887X00036&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1987&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Proceratophrys avelinoi&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1163/156853893X00156"&gt;http://dx.doi.org/10.1163/156853893X00156&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1993&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/f/ff/Proceratophrys_avelinoi.jpg/200px-Proceratophrys_avelinoi.jpg"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Osteocephalus deridens&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1163/156853800507525"&gt;http://dx.doi.org/10.1163/156853800507525&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Gephyromantis boulengeri&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1111/j.1096-3642.1919.tb02128.x"&gt;http://dx.doi.org/10.1111/j.1096-3642.1919.tb02128.x&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1919&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/7/75/Mantidactylus_boulengeri_map-fr.svg/200px-Mantidactylus_boulengeri_map-fr.svg.png"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Crossodactylus caramaschii&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1446907"&gt;http://dx.doi.org/10.2307/1446907&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1995&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Rana yavapaiensis&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1445338"&gt;http://dx.doi.org/10.2307/1445338&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1984&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Boophis lichenoides&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.1163/156853898X00025"&gt;http://dx.doi.org/10.1163/156853898X00025&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1998&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/4/45/Boophis_lichenoides01.jpg/200px-Boophis_lichenoides01.jpg"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Megistolotis lignarius&lt;/i&gt;&lt;/td&gt;&lt;td&gt;LC&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1071/ZO9790135"&gt;http://dx.doi.org/10.1071/ZO9790135&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1979&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Ansonia endauensis&lt;/i&gt;&lt;/td&gt;&lt;td&gt;NE&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1655/0018-0831(2006)62[466:ANSOAS]2.0.CO;2"&gt;http://dx.doi.org/10.1655/0018-0831(2006)62[466:ANSOAS]2.0.CO;2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2006&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Ansonia kraensis&lt;/i&gt;&lt;/td&gt;&lt;td&gt;NE&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2108/zsj.22.809"&gt;http://dx.doi.org/10.2108/zsj.22.809&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2005&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Arthroleptella landdrosia&lt;/i&gt;&lt;/td&gt;&lt;td&gt;NT&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1565359"&gt;http://dx.doi.org/10.2307/1565359&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Litoria jungguy&lt;/i&gt;&lt;/td&gt;&lt;td&gt;NT&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1071/ZO02069"&gt;http://dx.doi.org/10.1071/ZO02069&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2004&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/da/Litoria_jungguy.jpg/200px-Litoria_jungguy.jpg"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Phrynobatrachus phyllophilus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;NT&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.2307/1565925"&gt;http://dx.doi.org/10.2307/1565925&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2002&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Philautus ingeri&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1163/156853887X00036"&gt;http://dx.doi.org/10.1163/156853887X00036&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1987&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Gastrotheca dendronastes&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1445088"&gt;http://dx.doi.org/10.2307/1445088&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1983&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Hyperolius cystocandicans&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1443911"&gt;http://dx.doi.org/10.2307/1443911&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1977&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Boophis sambirano&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1080/21564574.2005.9635520"&gt;http://dx.doi.org/10.1080/21564574.2005.9635520&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2005&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Ansonia torrentis&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1163/156853883X00021"&gt;http://dx.doi.org/10.1163/156853883X00021&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1983&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Telmatobufo australis&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1563086"&gt;http://dx.doi.org/10.2307/1563086&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1972&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Stefania coxi&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.1655/0018-0831(2002)058[0327:EDOSAH]2.0.CO;2"&gt;http://dx.doi.org/10.1655/0018-0831(2002)058[0327:EDOSAH]2.0.CO;2&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2002&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Oreolalax multipunctatus&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1564828"&gt;http://dx.doi.org/10.2307/1564828&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1993&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Eleutherodactylus guantanamera&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1466962"&gt;http://dx.doi.org/10.2307/1466962&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1992&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Spicospina flammocaerulea&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.2307/1447757"&gt;http://dx.doi.org/10.2307/1447757&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1997&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Spicospina_distribution.png/200px-Spicospina_distribution.png"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Cycloramphus acangatan&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1655/02-78"&gt;http://dx.doi.org/10.1655/02-78&lt;/a&gt;&lt;/td&gt;&lt;td&gt;2003&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Leiopelma pakeka&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1080/03014223.1998.9517554"&gt;http://dx.doi.org/10.1080/03014223.1998.9517554&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1998&lt;/td&gt;&lt;td&gt;&lt;img width="64" src="http://upload.wikimedia.org/wikipedia/commons/thumb/7/79/Leiopelma_pakeka01.jpg/200px-Leiopelma_pakeka01.jpg"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Rana okaloosae&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a 
href="http://dx.doi.org/10.2307/1444847"&gt;http://dx.doi.org/10.2307/1444847&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1985&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;i&gt;Phrynobatrachus uzungwensis&lt;/i&gt;&lt;/td&gt;&lt;td&gt;VU&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1163/156853883X00030"&gt;http://dx.doi.org/10.1163/156853883X00030&lt;/a&gt;&lt;/td&gt;&lt;td&gt;1983&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;This is a small fraction of the frog species actually in GenBank because I've filtered it down to those that have been linked to Wikipedia (from where we get the conservation status) and which were described in papers with DOIs (from which we get the date of description).&lt;br /&gt;&lt;br /&gt;I generated this result using this SPARQL query on a triple store that had the primary data sources (Uniprot, Dbpedia, CrossRef, ION) loaded, together with the all-important "glue" datasets that link ION to CrossRef, and Uniprot to Dbpedia (see &lt;a href="http://iphylo.blogspot.com/2011/10/tdwg-challenge-what-is-rdf-good-for.html"&gt;previous post&lt;/a&gt; for details):&lt;br /&gt; &lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;&lt;br /&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;&lt;br /&gt;PREFIX dbpedia-owl: &amp;lt;http://dbpedia.org/ontology/&amp;gt;&lt;br /&gt;PREFIX uniprot: &amp;lt;http://purl.uniprot.org/core/&amp;gt;&lt;br /&gt;PREFIX tdwg_tn: &amp;lt;http://rs.tdwg.org/ontology/voc/TaxonName#&amp;gt;&lt;br /&gt;PREFIX tdwg_co: &amp;lt;http://rs.tdwg.org/ontology/voc/Common#&amp;gt;&lt;br /&gt;PREFIX dcterms: &amp;lt;http://purl.org/dc/terms/&amp;gt;&lt;br /&gt;&lt;br /&gt;SELECT ?name ?status ?doi ?date ?thumbnail&lt;br /&gt;WHERE {&lt;br /&gt;  ?ncbi uniprot:scientificName ?name .&lt;br /&gt;  ?ncbi 
rdfs:seeAlso ?dbpedia .&lt;br /&gt;  ?dbpedia dbpedia-owl:conservationStatus ?status .&lt;br /&gt;  ?ion  tdwg_tn:nameComplete ?name . &lt;br /&gt;  ?ion tdwg_co:publishedInCitation ?doi .&lt;br /&gt;  ?doi dcterms:date ?date .&lt;br /&gt;&lt;br /&gt;  OPTIONAL&lt;br /&gt;  {&lt;br /&gt;   ?dbpedia dbpedia-owl:thumbnail ?thumbnail&lt;br /&gt;  }&lt;br /&gt;} &lt;br /&gt;ORDER BY ASC(?status)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This table doesn't tell us a great deal, but we could, for example, graph date of description against conservation status (CR=critically endangered, EN=endangered, VU=vulnerable, NT=near threatened, LC=least concern, DD=data deficient):&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-8wu90HZC4Ow/Tp_iiUOM_MI/AAAAAAAABA4/etqvT0iq9K4/chart.png?imgmax=800" alt="Chart" title="chart.png" border="0" width="400" height="333" /&gt;&lt;br /&gt;In other words, is it the case that more recently described species are more likely to be endangered than taxa we've known about for some time (based on the assumption that we've found all the common species already)? We could imagine extending this query to retrieve sequences for a class of frog (e.g., critically endangered) so we could compute a measure of population genetic variation, etc. We shouldn't take the graph above too seriously because it's based on a small fraction of the data, but you get the idea. As more frog taxonomy goes online (there's a lot of stuff in &lt;a href="http://www.biodiversitylibrary.org"&gt;BHL&lt;/a&gt; and &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt;, for example) we could add more dates and build a dataset worth analysing properly.&lt;br /&gt;&lt;br /&gt;It seems to me that these should be fairly simple things to do, yet if we attempt them today it's a world of hurt involving scripts, Excel, data cleaning, etc. 
before we can do the science.&lt;br /&gt;&lt;br /&gt;The thing is, without the "glue" files mapping identifiers across different databases even this simple query isn't possible. Obviously we have no say in how many organisations publish RDF, but within the biodiversity informatics community we should make every effort to use external identifiers wherever possible so that we can make these links. This is the core of my complaint. If we are using RDF to foster data integration so we can query across the diverse data sets that speak to biodiversity, then we are doing it wrong.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt;&lt;br /&gt;Here is a nice visualisation of this dataset from &lt;a href="http://twitter.com/orovellotti"&gt;@orovellotti&lt;/a&gt; (original &lt;a href="http://twitter.com/#!/orovellotti/status/127045777351651328/photo/1/large"&gt;here&lt;/a&gt;), made using &lt;a href="http://code.google.com/p/ecoreleve/"&gt;ecoRelevé&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/-0K8jb1ylsvs/TqBZbGWWN7I/AAAAAAAABBE/Lgnx93Ni7UY/AcNbdh2CMAA3ysc.png-large.png?imgmax=800" alt="AcNbdh2CMAA3ysc png large" title="AcNbdh2CMAA3ysc.png-large.png" border="0" width="400" height="250" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7195339787892934087?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7195339787892934087'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7195339787892934087'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/10/reflections-on-tdwg-rdf.html' title='Reflections on the TDWG RDF &amp;quot;Challenge&amp;quot;'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-8wu90HZC4Ow/Tp_iiUOM_MI/AAAAAAAABA4/etqvT0iq9K4/s72-c/chart.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7642345513411208054</id><published>2011-10-19T13:47:00.001+01:00</published><updated>2011-10-20T09:38:00.118+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Genbank'/><category scheme='http://www.blogger.com/atom/ns#' term='Uniprot'/><category scheme='http://www.blogger.com/atom/ns#' term='TDWG'/><category scheme='http://www.blogger.com/atom/ns#' term='integration'/><category scheme='http://www.blogger.com/atom/ns#' term='ION'/><category scheme='http://www.blogger.com/atom/ns#' term='DOI'/><category scheme='http://www.blogger.com/atom/ns#' term='NCBI'/><category scheme='http://www.blogger.com/atom/ns#' term='CrossRef'/><category scheme='http://www.blogger.com/atom/ns#' term='linking'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><category scheme='http://www.blogger.com/atom/ns#' term='Bio2RDF'/><title type='text'>TDWG Challenge - what is RDF good for?</title><content type='html'>Last month, feeling particularly grumpy, I fired off an &lt;a href="http://lists.tdwg.org/pipermail/tdwg-tag/2011-September/002381.html"&gt;email to the TDWG-TAG&lt;/a&gt; mailing list with the subject &lt;b&gt;Lobbing grenades: a challenge&lt;/b&gt;. 
Here's the email:&lt;br /&gt;&lt;blockquote&gt;It's morning and the coffee hasn't quite kicked in yet, but reading through recent TDWG TAG posts, and mindful of the upcoming meeting in New Orleans  (which sadly I won't be attending) I'm seeing a mismatch between the amount of effort being expended on discussions of vocabularies, ontologies, etc. and the concrete results we can point to. &lt;br /&gt;&lt;br /&gt;Hence, a challenge:&lt;br /&gt;&lt;br /&gt;"What new things have we learnt about biodiversity by converting biodiversity data into RDF?"&lt;br /&gt;&lt;br /&gt;I'm not saying we can't learn new things, I'm simply asking what have we learnt so far? &lt;br /&gt;&lt;br /&gt;Since around 2006 we have had literally millions of triples in the wild (uBio, ION, Index Fungorum, IPNI, Catalogue of Life, more recently Biodiversity Collections Index, Atlas of Living Australia, World Register of Marine Species, etc.), most of these using the same vocabulary. What new inferences have we made?&lt;br /&gt;&lt;br /&gt;Let's make the challenge more concrete. Load all these data sources into a triple store (subchallenge - is this actually possible?). Perhaps add other RDF sources (DBpedia, Bio2RDF, CrossRef). What novel inferences can we make?&lt;br /&gt;&lt;br /&gt;I may, of course, simply be in "grumpy old arse" mode, but we have millions of triples in the wild and nothing to show for it. I hope I'm not alone in wondering why...&lt;/blockquote&gt;&lt;br /&gt;In the context of the TDWG meeting (happening as we speak and which I'm following via Twitter, hashtag &lt;a href="http://twitter.com/#!/search/tdwg"&gt;#tdwg&lt;/a&gt;) Joel Sachs asked me whether I had any specific data in mind that could form the basis of a discussion. So, here goes. I've assembled some small RDF data sets that it might be fun to play with. 
Each data set is for frogs, and I've divided them into two sets.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Primary data&lt;/b&gt;&lt;br /&gt;These data sets are essentially unmodified RDF fetched from data providers:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://dl.dropbox.com/u/639486/tdwg-rdf/uniprot.rdf"&gt;uniprot.rdf&lt;/a&gt; Uniprot RDF for frogs in GenBank&lt;/li&gt;&lt;li&gt;&lt;a href="http://dl.dropbox.com/u/639486/tdwg-rdf/ion.rdf"&gt;ion.rdf&lt;/a&gt; Index of Organism Names (ION) RDF for taxonomic names for frogs (filtered to just those names that are also in GenBank; the RDF comes from ION LSIDs)&lt;/li&gt;&lt;li&gt;&lt;a href="http://dl.dropbox.com/u/639486/tdwg-rdf/crossref.rdf"&gt;crossref.rdf&lt;/a&gt; CrossRef RDF for DOIs for publications that published new frog names (obtained using &lt;a href="http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html"&gt;CrossRef's support for Linked Data for DOIs&lt;/a&gt;)&lt;/li&gt;&lt;li&gt;&lt;a href="http://dl.dropbox.com/u/639486/tdwg-rdf/dbpedia.rdf"&gt;dbpedia.rdf&lt;/a&gt; Dbpedia RDF for frogs in GenBank (Update 2011-10-20: the dbpedia.rdf file is a bit big, so here is &lt;a href="http://dl.dropbox.com/u/639486/tdwg-rdf/subset.rdf"&gt;subset.rdf&lt;/a&gt; which has just the conservation status and thumbnail image)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;These sources give us information on genomics (at least, they tell us which taxa have been sequenced), where and when the original taxonomic description was published, and by whom, as well as some information on conservation status and what the frog looks like (via Dbpedia). Ideally we just load these files into a triple store and then ask a bunch of questions, such as what is the conservation status of frogs sequenced in GenBank?, is there a correlation between the conservation status of a frog and the date it was discovered?, who has described the most frog species?, etc. 
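&lt;br /&gt;&lt;br /&gt;As a rough illustration of the last of these questions, a query along the following lines could count new names per author. It is only a sketch under two assumptions that the primary files do not guarantee: that each ION name points at the DOI of its original description via tdwg_co:publishedInCitation, and that the CrossRef RDF attaches authors to a DOI using dcterms:creator (that property name is a guess, not something confirmed here). It also needs a triple store that supports SPARQL 1.1 aggregates:&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;PREFIX tdwg_tn: &amp;lt;http://rs.tdwg.org/ontology/voc/TaxonName#&amp;gt;&lt;br /&gt;PREFIX tdwg_co: &amp;lt;http://rs.tdwg.org/ontology/voc/Common#&amp;gt;&lt;br /&gt;PREFIX dcterms: &amp;lt;http://purl.org/dc/terms/&amp;gt;&lt;br /&gt;&lt;br /&gt;# Count distinct ION names per author, most prolific first&lt;br /&gt;SELECT ?author (COUNT(DISTINCT ?ion) AS ?names)&lt;br /&gt;WHERE {&lt;br /&gt;  ?ion tdwg_tn:nameComplete ?name .&lt;br /&gt;  # Assumption: the name is linked to the DOI of its original description&lt;br /&gt;  ?ion tdwg_co:publishedInCitation ?doi .&lt;br /&gt;  # Assumption: CrossRef RDF uses dcterms:creator for authors&lt;br /&gt;  ?doi dcterms:creator ?author .&lt;br /&gt;}&lt;br /&gt;GROUP BY ?author&lt;br /&gt;ORDER BY DESC(?names)&lt;br /&gt;&lt;/pre&gt;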
&lt;br /&gt;&lt;br /&gt;My contention is that actually we can't do any of this because the data is siloed due to the lack of shared identifiers and vocabularies (I suspect that there is not a single identifier any of these files share). The only way we can currently link these data sets together is by shared string literals (e.g., taxonomic names), in which case why bother with RDF? So my first challenge is to see whether any of the questions I've just listed can actually be tackled using this data. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Glue&lt;/b&gt;&lt;br /&gt;In a slightly more constructive mode, to see if we can make progress I'm providing some additional RDF files, based on projects I'm working on to link data together. These files may help provide some of the missing "glue" to connect these data sets.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://dl.dropbox.com/u/639486/tdwg-rdf/linkout.rdf"&gt;linkout.rdf&lt;/a&gt; The list of links between NCBI and Dbpedia (based on mapping in &lt;a href="http://iphylo.org/linkout"&gt;iPhylo LinkOut&lt;/a&gt;)&lt;/li&gt;&lt;li&gt;&lt;a href="http://dl.dropbox.com/u/639486/tdwg-rdf/ion_doi.rdf"&gt;ion_doi.rdf&lt;/a&gt; A subset of publications listed in ION have DOIs; this file links the corresponding ION LSIDs to those DOIs (this file is from an ongoing project mapping names to primary literature)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;The second file links the ION and CrossRef RDF, so we could start to ask questions about dates of discovery, who described what species, etc. The first file links NCBI taxon ids (in this case in the form of UniProt URIs) to Wikipedia (in the form of Dbpedia URIs). 
Dbpedia has information on conservation status, and some frogs will also have pictures, so we can start to join genomics to conservation, as well as make some visualisations.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt;&lt;br /&gt;I've now added another RDF file for 1000 georeferenced GenBank sequences for frogs. The file is &lt;a href="http://dl.dropbox.com/u/639486/tdwg-rdf/genbank.rdf"&gt;genbank.rdf&lt;/a&gt;. This file is generated from a local, processed version of EMBL, and uses a mixture of Dublin Core and TDWG vocabularies. Here's an example of a single record:&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;&amp;lt;?xml version="1.0"?&amp;gt;&lt;br /&gt;&amp;lt;rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" &lt;br /&gt;xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" &lt;br /&gt;xmlns:owl="http://www.w3.org/2002/07/owl#" &lt;br /&gt;xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" &lt;br /&gt;xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" &lt;br /&gt;xmlns:tcommon="http://rs.tdwg.org/ontology/voc/Common#" &lt;br /&gt;xmlns:toccurrence="http://rs.tdwg.org/ontology/voc/TaxonOccurrence#" &lt;br /&gt;xmlns:uniprot="http://purl.uniprot.org/core/"&amp;gt;&lt;br /&gt;  &amp;lt;uniprot:Molecule rdf:about="http://bio2rdf.org/genbank:EU566842"&amp;gt;&lt;br /&gt;    &amp;lt;dcterms:created&amp;gt;2008-07-06&amp;lt;/dcterms:created&amp;gt;&lt;br /&gt;    &amp;lt;dcterms:modified&amp;gt;2010-12-23&amp;lt;/dcterms:modified&amp;gt;&lt;br /&gt;    &amp;lt;dcterms:title&amp;gt;EU566842&amp;lt;/dcterms:title&amp;gt;&lt;br /&gt;    &amp;lt;dcterms:description&amp;gt;Xenopus borealis voucher MHNG:Herp:2644.64 &lt;br /&gt;cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial.&amp;lt;/dcterms:description&amp;gt;&lt;br /&gt;    &amp;lt;dcterms:subject rdf:resource="http://purl.uniprot.org/taxonomy/8354"/&amp;gt;&lt;br /&gt;    &amp;lt;dcterms:relation 
rdf:parseType="Resource"&amp;gt;&lt;br /&gt;      &amp;lt;rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence#TaxonOccurrence"/&amp;gt;&lt;br /&gt;      &amp;lt;toccurrence:identifiedToString&amp;gt;Xenopus borealis&amp;lt;/toccurrence:identifiedToString&amp;gt;&lt;br /&gt;      &amp;lt;toccurrence:decimalLatitude&amp;gt;0.66&amp;lt;/toccurrence:decimalLatitude&amp;gt;&lt;br /&gt;      &amp;lt;geo:lat&amp;gt;0.66&amp;lt;/geo:lat&amp;gt;&lt;br /&gt;      &amp;lt;toccurrence:decimalLongitude&amp;gt;37.5&amp;lt;/toccurrence:decimalLongitude&amp;gt;&lt;br /&gt;      &amp;lt;geo:long&amp;gt;37.5&amp;lt;/geo:long&amp;gt;&lt;br /&gt;      &amp;lt;toccurrence:verbatimCoordinates&amp;gt;0.66 N 37.5 E&amp;lt;/toccurrence:verbatimCoordinates&amp;gt;&lt;br /&gt;      &amp;lt;toccurrence:country&amp;gt;Kenya&amp;lt;/toccurrence:country&amp;gt;&lt;br /&gt;      &amp;lt;dcterms:identifier&amp;gt;MHNG:Herp:2644.64&amp;lt;/dcterms:identifier&amp;gt;&lt;br /&gt;    &amp;lt;/dcterms:relation&amp;gt;&lt;br /&gt;  &amp;lt;/uniprot:Molecule&amp;gt;&lt;br /&gt;&amp;lt;/rdf:RDF&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I've added this simply so one could do some geographical queries.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Missing links&lt;/b&gt;&lt;br /&gt;There are still lots of missing links here (for example, there's no explicit link between NCBI and ION, so we'd need to create this using taxonomic names), and we could add further links to the literature via sequences for taxa. Then there's the lack of geographic data. We could get some of this via georeferenced sequences in GenBank, but there's no RDF for this (&lt;a href="http://bio2rdf.org/"&gt;Bio2RDF&lt;/a&gt; does have RDF for sequences but it ignores the bulk of the organismal metadata such as voucher specimens and latitude and longitude).&lt;br /&gt;&lt;br /&gt;In many ways it's this lack of links that was the point of my original email. 
The reality is that "linked data" isn't linked to anything like the extent that makes it useful. Simply pumping out RDF won't get us very far until we tackle this problem (see also my earlier post &lt;a href="http://iphylo.blogspot.com/2011/09/linked-data-that-isn-failings-of-rdf.html"&gt;Linked data that isn't: the failings of RDF&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;So, if you think RDF is the way to go, please tell me what you can learn from these data files.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7642345513411208054?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7642345513411208054'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7642345513411208054'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/10/tdwg-challenge-what-is-rdf-good-for.html' title='TDWG Challenge - what is RDF good for?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-5676704203226165083</id><published>2011-10-11T08:56:00.001+01:00</published><updated>2011-10-11T08:56:57.889+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='long tail'/><category scheme='http://www.blogger.com/atom/ns#' term='iTunes'/><category scheme='http://www.blogger.com/atom/ns#' term='digitisation'/><category scheme='http://www.blogger.com/atom/ns#' term='DeepDyve'/><category scheme='http://www.blogger.com/atom/ns#' term='publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='business 
model'/><title type='text'>DeepDyve - renting scientific articles</title><content type='html'>&lt;img src="http://lh6.ggpht.com/-ZdvCr84lFfE/TpP2xcIt4GI/AAAAAAAABAk/LVDU2bVZYI4/deepdyve_button.gif?imgmax=800" alt="Deepdyve button" title="deepdyve_button.gif" border="0" width="54" height="54" style="float:right;" /&gt;Bit late, but I stumbled across &lt;a href="http://www.deepdyve.com"&gt;DeepDyve&lt;/a&gt;, which provides rental access to scientific papers for as little as $0.99. The &lt;a href="http://www.deepdyve.com/corp/partners/publishers"&gt;pitch to publishers&lt;/a&gt; is:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Today, scholarly publisher sites receive over 2 billion visits per year from users who are unaffiliated with an institution yet convert less than 0.2% into a purchase or subscription. DeepDyve’s service is designed for these ‘unaffiliated users’ who need an easy and affordable access to authoritative information vital to their careers.&lt;/blockquote&gt;&lt;br /&gt;Renting a paper means you get to read it online, but you can't print or download it, and access is time limited (unless you purchase the article outright). You can also purchase monthly plans (think &lt;a href="http://www.spotify.com/"&gt;Spotify&lt;/a&gt; for papers).&lt;br /&gt;&lt;br /&gt;It's an interesting model, and the interface looks nice. 
Here's a paper on &lt;a href="http://www.deepdyve.com/lp/springer-journals/taxonomy-and-biodiversity-43Fw0dtaJ4"&gt;Taxonomy and Diversity&lt;/a&gt; (&lt;a href="http://dx.doi.org/10.1023/A:1003602221172"&gt;http://dx.doi.org/10.1023/A:1003602221172&lt;/a&gt;):&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-Q5A2eSgpemo/TpP2yB8JOSI/AAAAAAAABAs/4ETfMUCTN1w/deepdyvescreenshot.png?imgmax=800" alt="Deepdyvescreenshot" title="deepdyvescreenshot.png" border="0" width="400" height="313" /&gt;&lt;br /&gt;Leaving aside the issue of whether restricted access to the scientific literature is a good idea (even if it is relatively cheap) I'm curious about the business model and the long tail. One could imagine lots of people downloading a few high-visibility papers, and my sense (based on no actual data I should stress) is that DeepDyve's publishing partners are providing access to their first-tier journals. &lt;br /&gt;&lt;br /&gt;Taxonomic literature is vast, but most individual papers will have few readers (describing a single new species is usually not big news, with obvious exceptions). But I wonder if in aggregate the potential taxonomic readership would be enough to make cheap access to that literature economic. Publishers such as Wiley, Taylor and Francis, and Springer have digitised some major taxonomic journals; how will they get a return on this? I suspect a price tag of, say, €34.95 for an article on seabird lice (e.g., "Neue Zangenläuse (Mallophaga, Philopteridae) von procellariiformen und charadriiformen Wirten" &lt;a href="http://dx.doi.org/10.1007/BF00260996"&gt;http://dx.doi.org/10.1007/BF00260996&lt;/a&gt;) will be too high for many people, but the chance to rent it for 24 hours for, say, $0.99, would be appealing. If this is the case, then maybe this would encourage publishers to digitise more of their back catalogue. 
It would be nice if everything is digitised and free, but I could live with digitised and cheap.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-5676704203226165083?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5676704203226165083'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5676704203226165083'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/10/deepdyve-renting-scientific-articles.html' title='DeepDyve - renting scientific articles'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-ZdvCr84lFfE/TpP2xcIt4GI/AAAAAAAABAk/LVDU2bVZYI4/s72-c/deepdyve_button.gif?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3750469324888609142</id><published>2011-10-06T13:43:00.001+01:00</published><updated>2011-10-06T13:43:32.654+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='joy'/><category scheme='http://www.blogger.com/atom/ns#' term='Steve Jobs'/><category scheme='http://www.blogger.com/atom/ns#' term='design'/><category scheme='http://www.blogger.com/atom/ns#' term='iBook'/><category scheme='http://www.blogger.com/atom/ns#' term='Apple'/><title type='text'>My favourite Apple moment</title><content type='html'>In light of &lt;a href="http://www.apple.com/stevejobs/"&gt;today's news&lt;/a&gt; here's my favourite Mac, the original iBook.&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" 
src="http://lh5.ggpht.com/-KntyZmt_mNA/To2icSY2h8I/AAAAAAAABAc/F0HLhmMuu4Y/ibookclam.jpg?imgmax=800" alt="Ibookclam" title="ibookclam.jpg" border="0" width="448" height="600" /&gt;&lt;br /&gt;In many ways, it wasn't the machine itself that so grabbed me (cool as it was); it was the experience of unpacking it when it arrived in my office over a decade ago. In the box with the computer and the mains cord was a disc about the size of a hockey puck (on the right in the image above). I looked at it and wondered what on Earth it was. It looked like a giant yo-yo, with cable wrapped around instead of string. Then the penny dropped — it was the power supply. You plugged the mains cord into the yo-yo, then unwound just as much cord as you needed (oh, and when you connected it to your iBook the plug glowed orange if the battery needed charging, green if it was fully charged). The child inside me squealed with delight (being a grown up I laughed out loud, rather than actually squealing).&lt;br /&gt;&lt;br /&gt;The iBook still works (the battery is long dead, but plug the yo-yo into the mains and it still works), and it manages to run an early version of Mac OS X. &lt;br /&gt;&lt;br /&gt;If anybody has to ask why people love Apple products, it's not because of the "brand", or the "exclusivity", it's because of the joy they can evoke. 
Someone cared enough to make the most mundane task — plugging a laptop into the mains — into a thing of beauty.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3750469324888609142?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3750469324888609142'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3750469324888609142'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/10/my-favourite-apple-moment.html' title='My favourite Apple moment'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-KntyZmt_mNA/To2icSY2h8I/AAAAAAAABAc/F0HLhmMuu4Y/s72-c/ibookclam.jpg?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-4168460820207843447</id><published>2011-10-05T17:34:00.001+01:00</published><updated>2011-10-05T17:34:39.604+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='we feel fine'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomists'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomy'/><title type='text'>Taxonomy - crisis, what crisis?</title><content type='html'>Following on from the last post &lt;a href="http://iphylo.blogspot.com/2011/10/how-many-species-are-there-and-why-do.html"&gt;How many species are there, and why do we get two very different answers from same data?&lt;/a&gt; another interesting paper has appeared in TREE:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Lucas N. 
Joppa, David L. Roberts, Stuart L. Pimm &lt;b&gt;The population ecology and social behaviour of taxonomists&lt;/b&gt; Trends in Ecology &amp; Evolution &lt;a href="http://dx.doi.org/10.1016/j.tree.2011.07.010"&gt;doi:10.1016/j.tree.2011.07.010&lt;/a&gt;&lt;/blockquote&gt;&lt;br /&gt;The paper analyses the "ecology and social habits of taxonomists" and concludes:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Conventional wisdom is highly prejudiced. It suggests that taxonomists were a formerly more numerous people, are in 'crisis', are becoming endangered and are generally asocial. We consider these hypotheses and reject them to varying degrees.&lt;/blockquote&gt;&lt;br /&gt;Cue flame war on TAXACOM, no doubt, but it's a refreshing conclusion, and it's based on actual data. Here I declare an interest. I was a reviewer, and in a fit of pique recommended rejection simply because the authors don't make the data available (they do, however, provide the R scripts used to do the analyses). As the authors patiently pointed out in their response to reviews, the various explicit or implicit licensing statements attached to taxonomic data mean they can't provide the data (and I'm assuming that in at least some cases the dark art of screen scraping was used to get the data).&lt;br /&gt;&lt;br /&gt;There's an irony here. Taxonomic databases are becoming hot topics, generating estimates of the scale of the task facing taxonomy, and diagnosing the state of the discipline itself (according to Joppa et al. it's in rude health). This is the sort of thing that can have a major impact on how people perceive the discipline (and may influence how many resources are allocated to the subject). 
If taxonomists take issue with the analyses then they will find them difficult to repeat because the taxonomic data they've spent their careers gathering are under lock and key.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-4168460820207843447?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4168460820207843447'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4168460820207843447'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/10/taxonomy-crisis-what-crisis.html' title='Taxonomy - crisis, what crisis?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2871938902250766714</id><published>2011-10-04T11:33:00.001+01:00</published><updated>2011-10-04T11:35:09.334+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='number of species'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomy'/><category scheme='http://www.blogger.com/atom/ns#' term='biodiversity informatics'/><category scheme='http://www.blogger.com/atom/ns#' term='WoRMS'/><category scheme='http://www.blogger.com/atom/ns#' term='Catalogue of Life'/><title type='text'>How many species are there, and why do we get two very different answers from same data?</title><content type='html'>&lt;img src="http://lh3.ggpht.com/-DjI5jztvHLo/Torg5UpjUtI/AAAAAAAABAQ/ns_tQ0q11zs/Globe.jpg?imgmax=800" alt="Globe" border="0" width="100" height="100" style="float:right;" /&gt;Two papers estimating the total number of species have recently 
been published, one in the open access journal &lt;i&gt;PLoS Biology&lt;/i&gt;:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Camilo Mora, Derek P. Tittensor, Sina Adl, Alastair G. B. Simpson, Boris Worm. &lt;b&gt;How Many Species Are There on Earth and in the Ocean?&lt;/b&gt;. PLoS Biol 9(8): e1001127. &lt;a href="http://dx.doi.org/10.1371/journal.pbio.1001127"&gt;doi:10.1371/journal.pbio.1001127&lt;/a&gt;&lt;/blockquote&gt;&lt;img src="http://lh5.ggpht.com/-9I4ZTzzIe88/Torg5zXvLiI/AAAAAAAABAU/w7x4_3ddHP8/SSB_logo_final.png?imgmax=800" alt="SSB logo final" border="0" width="100" height="100" style="float:right;" /&gt;&lt;br /&gt;the second in &lt;i&gt;Systematic Biology&lt;/i&gt; (which has an open access option but the authors didn't use it for this article): &lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Mark J. Costello, Simon Wilson and Brett Houlding. &lt;b&gt;Predicting total global species richness using rates of species description and estimates of taxonomic effort&lt;/b&gt;. Syst Biol (2011) &lt;a href="http://dx.doi.org/10.1093/sysbio/syr080"&gt;doi:10.1093/sysbio/syr080&lt;/a&gt;&lt;/blockquote&gt;&lt;br /&gt;The first paper has gained a lot of attention, in part because Jonathan Eisen &lt;a href="http://phylogenomics.blogspot.com/2011/08/bacteria-archaea-dont-get-no-respect.html"&gt;Bacteria &amp; archaea don't get no respect from interesting but flawed #PLoSBio paper on # of species on the planet&lt;/a&gt; was mightily pissed off about the estimates of the number:&lt;br /&gt;&lt;blockquote&gt;Their estimates of ~ 10,000 or so bacteria and archaea on the planet are so completely out of touch in my opinion that this calls into question the validity of their method for bacteria and archaea at all.&lt;/blockquote&gt;&lt;br /&gt;The fuss over the number of bacteria and archaea seems to me to be largely a misunderstanding of how taxonomic databases count taxa. 
Databases like Catalogue of Life record described species, and most bacteria aren't formally described because they can't be cultured. Hence there will always be a disparity between the extent of diversity revealed by phylogenetics and by classical taxonomy.&lt;br /&gt;&lt;br /&gt;The &lt;i&gt;PLoS Biology&lt;/i&gt; paper has garnered a lot more reaction (e.g., the commentary by Carl Zimmer in the &lt;i&gt;New York Times&lt;/i&gt; &lt;a href="http://www.nytimes.com/2011/08/30/science/30species.html"&gt;How Many Species? A Study Says 8.7 Million, but It’s Tricky&lt;/a&gt;) than the &lt;i&gt;Systematic Biology&lt;/i&gt; paper, even though the latter arguably has the more dramatic conclusion.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How many species, 8.7 million, or 1.8 to 2.0 million?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Whereas Mora et al. in &lt;i&gt;PLoS Biology&lt;/i&gt; concluded that there are some 8.7 million (±1.3 million SE) species on the planet, Costello et al. in &lt;i&gt;Systematic Biology&lt;/i&gt; arrive at a much more conservative figure (1.8 to 2.0 million). 
The implications of these two studies are very different: one implies there's a lot of work to do, the other leads to headlines such as &lt;a href="http://thescotsman.scotsman.com/news/39Every-species-could-be-discovered.6845137.jp"&gt;'Every species on Earth could be discovered within 50 years'&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;What is intriguing is that both studies use the same databases, &lt;a href="http://www.catalogueoflife.org/"&gt;Catalogue of Life&lt;/a&gt; and the &lt;a href="http://www.marinespecies.org"&gt;World Register of Marine Species&lt;/a&gt;, and yet arrive at very different results.&lt;br /&gt;&lt;br /&gt;So, the question is, how did we arrive at two very different answers from the same data?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2871938902250766714?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2871938902250766714'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2871938902250766714'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/10/how-many-species-are-there-and-why-do.html' title='How many species are there, and why do we get two very different answers from same data?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-DjI5jztvHLo/Torg5UpjUtI/AAAAAAAABAQ/ns_tQ0q11zs/s72-c/Globe.jpg?imgmax=800' height='72' 
width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3901660624497371747</id><published>2011-09-30T09:23:00.001+01:00</published><updated>2011-09-30T09:23:18.616+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Atypon'/><category scheme='http://www.blogger.com/atom/ns#' term='DOI'/><category scheme='http://www.blogger.com/atom/ns#' term='fail'/><category scheme='http://www.blogger.com/atom/ns#' term='CrossRef'/><category scheme='http://www.blogger.com/atom/ns#' term='Wallace'/><category scheme='http://www.blogger.com/atom/ns#' term='Taylor and Francis'/><title type='text'>Taylor and Francis Online breaks DOIs - lots of DOIs</title><content type='html'>&lt;img src="http://lh4.ggpht.com/-i7qw9Qqb3UE/ToV8dBsS_LI/AAAAAAAABAI/Os8U9scCt8U/TandFOnline-twitter.gif?imgmax=800" alt="TandFOnline twitter" title="TandFOnline-twitter.gif" border="0" width="72" height="72" style="float:right;padding:10px;" /&gt;DOIs are meant to be the gold standard of bibliographic identifiers for articles. They are not supposed to break. Yet some publishers seem to struggle to get them to work. In the past I've grumbled about BioOne, Wiley, and others as culprits with broken or &lt;a href="http://iphylo.blogspot.com/2007/05/duplicate-dois.html"&gt;duplicate&lt;/a&gt; or &lt;a href="http://iphylo.blogspot.com/2008/05/when-dois-collide-and-then-disappear.html"&gt;disappearing&lt;/a&gt; DOIs.&lt;br /&gt;&lt;br /&gt;Today's source of frustration is Taylor and Francis Online. 
T&amp;F Online is powered by &lt;a href="http://www.atypon.com/"&gt;Atypon&lt;/a&gt;, which recently issued this glowing &lt;a href="http://www.atypon.com/news-and-events/press-release.php?id=2020"&gt;press release&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;SANTA CLARA, Calif.—20 September 2011—Atypon®, a leading provider of software to the professional and scholarly publishing industry, today announced that its Literatum™ software is powering the new Taylor &amp; Francis Online platform (www.TandFOnline.com). Taylor &amp; Francis Online hosts 1.7 million articles.&lt;br /&gt;...&lt;br /&gt;"The performance of Taylor &amp; Francis Online has been excellent," said Matthew Jay, Chief Technology Officer for the Taylor &amp; Francis Group. "Atypon has proven that it can deliver on schedule and achieve tremendous scale. We're thrilled to expand the scope of our relationship to include new products and developments."&lt;/blockquote&gt; &lt;br /&gt;Great, except that lots of T&amp;F DOIs are broken. I've come across two kinds of fail.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;DOI resolves to a server that doesn't exist&lt;/b&gt;&lt;br /&gt;The first is where a DOI resolves to a phantom web address. For example, the DOI &lt;a href="http://dx.doi.org/10.1080/00288300809509849"&gt;doi:10.1080/00288300809509849&lt;/a&gt; resolves to &lt;a href="http://tandfprod.literatumonline.com/doi/abs/10.1080/00288300809509849"&gt;http://tandfprod.literatumonline.com/doi/abs/10.1080/00288300809509849&lt;/a&gt;. But the domain &lt;b&gt;tandfprod.literatumonline.com&lt;/b&gt; doesn't exist, so the DOI is a dead end.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;DOI doesn't resolve&lt;/b&gt;&lt;br /&gt;Taylor and Francis have digitised the complete &lt;i&gt;Annals and Magazine of Natural History&lt;/i&gt;, a massive journal comprising nearly 20,000 articles from 1841 to 1966, and which has published some seminal papers, including A. R. 
Wallace's "On the law which has regulated the introduction of new species" &lt;a href="http://dx.doi.org/10.1080/037454809495509"&gt;doi:10.1080/037454809495509&lt;/a&gt; which forced Darwin's hand (see the Wikipedia page for the successor journal &lt;a href="http://en.wikipedia.org/wiki/Journal_of_Natural_History"&gt;&lt;i&gt;Journal of Natural History&lt;/i&gt;&lt;/a&gt;). Taylor and Francis are to be congratulated for putting such a great resource online.&lt;br /&gt;&lt;br /&gt;Problem is, I've not found a single DOI for any article in &lt;i&gt;Annals and Magazine of Natural History&lt;/i&gt; that actually works. If you try and resolve the DOI for Wallace's paper, &lt;a href="http://dx.doi.org/10.1080/037454809495509"&gt;doi:10.1080/037454809495509&lt;/a&gt;, you get the dreaded "Error - DOI not found" web page. So something like 20,000 DOIs simply don't work. The only way to make the DOI work is to append it to "http://www.tandfonline.com/doi/abs/", e.g. &lt;a href="http://www.tandfonline.com/doi/abs/10.1080/037454809495509"&gt;http://www.tandfonline.com/doi/abs/10.1080/037454809495509&lt;/a&gt;. This gets us to the article, but rather defeats the purpose of DOIs.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Why?&lt;/b&gt;&lt;br /&gt;Something is seriously wrong with CrossRef's quality control. It can't be too hard to screen all domains to see if they actually exist (this would catch the first error). It can't be too hard to take a random sample of DOIs and check that they work, or automatically check DOIs that are reported as missing. In the case of the &lt;i&gt;Annals and Magazine of Natural History&lt;/i&gt; the web page for the Wallace article states that it has been available online since 16 December 2009. That's a long time for a DOI to be dead.&lt;br /&gt;&lt;br /&gt;There is a wealth of great content that is being made hard to find by some pretty basic screw ups. So CrossRef, Atypon and Taylor and Francis, can we please sort this out? 
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3901660624497371747?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3901660624497371747'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3901660624497371747'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/09/taylor-and-francis-online-breaks-dois.html' title='Taylor and Francis Online breaks DOIs - lots of DOIs'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-i7qw9Qqb3UE/ToV8dBsS_LI/AAAAAAAABAI/Os8U9scCt8U/s72-c/TandFOnline-twitter.gif?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7871113065376586793</id><published>2011-09-21T11:22:00.001+01:00</published><updated>2011-09-21T11:25:02.032+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Semantic Web'/><category scheme='http://www.blogger.com/atom/ns#' term='CiNii'/><category scheme='http://www.blogger.com/atom/ns#' term='SPARQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Linked data'/><category scheme='http://www.blogger.com/atom/ns#' term='CrossRef'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><title type='text'>Linked data that isn't: the failings of RDF</title><content type='html'>OK, a bit of hyperbole in the morning. 
One of the goals of RDF is to create the Semantic Web, an interwoven network of data seamlessly linked by shared identifiers and shared vocabularies. Everyone uses the same identifiers for the same things, and when they describe these things they use the same terms. Simples.&lt;br /&gt;&lt;br /&gt;Of course, the reality is somewhat different. Typically people don't reuse identifiers, and there are usually several competing vocabularies we can choose from. To give a concrete example, consider two RDF documents describing the same article, one provided by CiNii, the other by CrossRef. The article is: &lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Astuti, D., Azuma, N., Suzuki, H., &amp; Higashi, S. (2006). Phylogenetic Relationships Within Parrots (Psittacidae) Inferred from Mitochondrial Cytochrome-b Gene Sequences(Phylogeny). Zoological science, 23(2), 191-198. &lt;a href="http://dx.doi.org/10.2108/zsj.23.191"&gt;doi:10.2108/zsj.23.191&lt;/a&gt;&lt;/blockquote&gt;&lt;br /&gt;You can get RDF for a CiNii record by appending ".rdf" to the URL for the article, in this case &lt;a href="http://ci.nii.ac.jp/naid/130000017049"&gt;http://ci.nii.ac.jp/naid/130000017049&lt;/a&gt;. For CrossRef you need a Linked Data compliant client, or you can do something like this:&lt;br /&gt;&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;curl -D - -L -H "Accept: application/rdf+xml" "http://dx.doi.org/10.2108/zsj.23.191"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;You can view the RDF from these two sources &lt;a href="http://dl.dropbox.com/u/639486/rdf/130000017049.rdf"&gt;here&lt;/a&gt; and &lt;a href="http://dl.dropbox.com/u/639486/rdf/doi.rdf"&gt;here&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;No shared identifiers&lt;/b&gt;&lt;br /&gt;The two RDF documents have no shared identifiers, or at least, any identifiers they do share aren't described in a way that is easily discovered. 
The CrossRef record knows nothing about the CiNii record, but the CiNii document includes this statement:&lt;br /&gt;&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;&amp;lt;rdfs:seeAlso rdf:resource="http://ci.nii.ac.jp/lognavi?name=crossref&lt;br /&gt;&amp;amp;amp;id=info:doi/10.2108/zsj.23.191" dc:title="CrossRef" /&amp;gt;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;So, CiNii knows about the DOI, but this doesn't help much as the CrossRef document has the URI "http://dx.doi.org/10.2108/zsj.23.191", so we don't have an explicit statement that the two documents refer to the same article. &lt;br /&gt;&lt;br /&gt;The other identifier the documents could share is the ISSN for the journal (0289-0003), but CiNii writes this without the "-", and uses the PRISM term "prism:issn", so we have:&lt;br /&gt;&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;&amp;lt;prism:issn&amp;gt;02890003&amp;lt;/prism:issn&amp;gt;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;whereas CrossRef writes the ISSN like this:&lt;br /&gt;&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;&amp;lt;ns0:issn xmlns:ns0="http://prismstandard.org/namespaces/basic/2.1/"&amp;gt;&lt;br /&gt;0289-0003&amp;lt;/ns0:issn&amp;gt;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Unless we have a linked data client that normalises ISSNs before it does a SPARQL query we will miss the fact that these two records come from the same journal.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Inconsistent vocabularies&lt;/b&gt;&lt;br /&gt;Both CiNii and CrossRef use the PRISM vocabulary to describe the article, but they use different versions. CrossRef uses "http://prismstandard.org/namespaces/basic/2.1/" whereas CiNii uses "http://prismstandard.org/namespaces/basic/2.0/". 
Version 2.1 versus version 2.0 is a minor difference, but the URIs are different and hence they are different vocabularies (having version numbers in vocabulary URIs is asking for trouble). Hence, even if CiNii and CrossRef wrote ISSNs in the same way, we'd still not be able to assert that the articles come from the same journal. &lt;br /&gt;&lt;!--&lt;br /&gt;Then there are the different URIs for Dublin Core: &lt;br /&gt;&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;xmlns:dc="http://purl.org/dc/elements/1.1/" &lt;br /&gt;&lt;/pre&gt;and &lt;br /&gt;&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;xmlns:ns0="http://purl.org/dc/terms/"&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;--&gt;&lt;b&gt;Inconsistent use of vocabularies&lt;/b&gt;&lt;br /&gt;Both CiNii and CrossRef use FOAF for author names, but they write the names differently:&lt;br /&gt;&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;&amp;lt;foaf:name xml:lang="en"&amp;gt;Suzuki Hitoshi&amp;lt;/foaf:name&amp;gt;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;&amp;lt;ns0:name xmlns:ns0="http://xmlns.com/foaf/0.1/"&amp;gt;Hitoshi Suzuki&amp;lt;/ns0:name&amp;gt;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So, another missed opportunity to link the documents. One could argue this would be solved if we had consistent identifiers for authors, but we don't. In this case CiNii have their own local identifiers (e.g. 
&lt;a href="http://ci.nii.ac.jp/nrid/1000040179239"&gt;http://ci.nii.ac.jp/nrid/1000040179239&lt;/a&gt;), and CrossRef has a rather hideous looking &lt;a href="http://www.w3.org/2011/rdf-wg/wiki/Skolemisation"&gt;Skolemisation&lt;/a&gt;: &lt;a href="http://id.crossref.org/contributor/hitoshi-suzuki-2gypi8bnqk7yy"&gt;http://id.crossref.org/contributor/hitoshi-suzuki-2gypi8bnqk7yy&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;In summary, it's a mess. Both CiNii and CrossRef are organisations whose core business is bibliographic metadata. It's great that both are serving RDF, but if we think this is anything more than providing metadata in a useful format I think we may be deceiving ourselves.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7871113065376586793?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7871113065376586793'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7871113065376586793'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/09/linked-data-that-isn-failings-of-rdf.html' title='Linked data that isn&amp;#39;t: the failings of RDF'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-6740850495875254220</id><published>2011-09-20T08:45:00.001+01:00</published><updated>2011-09-20T08:52:18.067+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='DOI'/><category scheme='http://www.blogger.com/atom/ns#' term='Entomologica Scandinavica'/><category 
scheme='http://www.blogger.com/atom/ns#' term='CrossRef'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='ISSN'/><category scheme='http://www.blogger.com/atom/ns#' term='Insect Systematics and Evolution'/><category scheme='http://www.blogger.com/atom/ns#' term='Graphviz'/><category scheme='http://www.blogger.com/atom/ns#' term='WorldCat'/><title type='text'>Orwellian metadata: making journals disappear</title><content type='html'>&lt;img src="http://lh6.ggpht.com/-wxBve_I6WWc/TnhEmYBkpKI/AAAAAAAAA_4/__ClTGzQX8Y/Unknown.gif?imgmax=800" alt="Unknown" title="Unknown.gif" border="0" width="100" height="150" style="float:right;padding-left:10px;" /&gt;I've been spending a lot of time recently mapping bibliographic citations for taxonomic names to digital identifiers (such as DOIs). This is tedious work at the best of times (despite lots of automation), but it is not helped by the somewhat &lt;a href="http://en.wikipedia.org/wiki/Orwellian"&gt;Orwellian&lt;/a&gt; practices of some publishers. Occasionally when an established journal gets renamed the publisher retrospectively applies that name to the previous journal. 
For example, in 2000 the journal &lt;i&gt;Entomologica Scandinavica&lt;/i&gt; (ISSN &lt;a href="http://www.worldcat.org/issn/0013-8711"&gt;0013-8711&lt;/a&gt;) became &lt;i&gt;Insect Systematics &amp; Evolution&lt;/i&gt; (ISSN &lt;a href="http://www.worldcat.org/issn/1399-560X"&gt;1399-560X&lt;/a&gt;):&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;img src="https://chart.googleapis.com/chart?cht=gv&amp;chl=digraph%20%221399-560X%22%20%7Bnode%20%5Blabel=%22%5CN%22,%20shape=plaintext%5D;%220013-8711%22%20%5Blabel=%22Entomologica%20scandinavica%5Cn0013-8711%22%5D%221399-560X%22%20%5Blabel=%22Insect%20systematics%5Cn1399-560X%22%5D;%220375-0205%22%20%5Blabel=%22Opuscula%20entomologica%5Cn/%20edidit%20Societas%20entomologica%5Cn0013-8711%22%5D;%220013-8711%22%20-%3E%20%221399-560X%22%20;%220375-0205%22%20-%3E%20%220013-8711%22%20;%7D"&gt;&lt;/div&gt;&lt;br /&gt;(diagram based on &lt;a href="http://worldcat.org/xissn/titlehistory?issn=1399-560X"&gt;WorldCat xISSN history tool&lt;/a&gt;, rendered using &lt;a href="http://code.google.com/apis/chart/image/docs/gallery/graphviz.html"&gt;Google Charts&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;Content for both &lt;i&gt;Entomologica Scandinavica&lt;/i&gt; and &lt;i&gt;Insect Systematics &amp; Evolution&lt;/i&gt; is available from &lt;a href="http://www.ingentaconnect.com/content/brill/ise/"&gt;Ingenta's web site&lt;/a&gt;, but every article is listed as being in &lt;i&gt;Insect Systematics &amp; Evolution&lt;/i&gt;, and this is reflected in the metadata CrossRef has for each DOI.&lt;br /&gt;&lt;br /&gt;For example, the paper&lt;br /&gt;&lt;blockquote&gt;Andersen, N.M. &amp; P.-p. Chen, 1993. A taxonomic revision of pondskater genus Gerris Fabricius in China, with two new species (Hemiptera: Gerridae). 
– Entomologica Scandinavica 24: 147-166&lt;/blockquote&gt;&lt;br /&gt;has the DOI &lt;a href="http://dx.doi.org/10.1163/187631293X00262"&gt;doi:10.1163/187631293X00262&lt;/a&gt; which resolves to a page saying this article was published in &lt;i&gt;Insect Systematics &amp; Evolution&lt;/i&gt;. The XML for the DOI says the same thing:&lt;br /&gt;&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;&lt;br /&gt;   &amp;lt;issn type="print"&amp;gt;1399560X&amp;lt;/issn&amp;gt;&lt;br /&gt;   &amp;lt;issn type="electronic"&amp;gt;1876312X&amp;lt;/issn&amp;gt;&lt;br /&gt;   &amp;lt;journal_title&amp;gt;Insect Systematics &amp; Evolution&amp;lt;/journal_title&amp;gt;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;In one sense this is no big deal. If you know the DOI then that's all you need to use to refer to the article (and the sooner we abandon fussing with citation styles and just use DOIs the better).&lt;br /&gt;&lt;br /&gt;But if you haven't yet found the DOI then this is a problem, because if I search CrossRef using the original journal name (&lt;i&gt;Entomologica Scandinavica&lt;/i&gt;) I get nothing. As far as CrossRef is concerned the DOI doesn't exist. If, however, I happen to know that &lt;i&gt;Entomologica Scandinavica&lt;/i&gt; is now &lt;i&gt;Insect Systematics &amp; Evolution&lt;/i&gt;, I rewrite the query and I retrieve the DOI. &lt;br /&gt;&lt;br /&gt;It's bad enough dealing with taxonomic name changes without having to deal with journal name changes as well! 
It would be great if publishers didn't indulge in wholesale renaming of old journals, or if CrossRef had a mechanism (perhaps based on WorldCat's &lt;a href="http://worldcat.org/xissn/titlehistory"&gt;xISSN History Visualization Tool&lt;/a&gt;) to handle retrospectively renamed journals.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-6740850495875254220?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6740850495875254220'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6740850495875254220'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/09/orwellian-metadata-making-journals.html' title='Orwellian metadata: making journals disappear'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-wxBve_I6WWc/TnhEmYBkpKI/AAAAAAAAA_4/__ClTGzQX8Y/s72-c/Unknown.gif?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-8980138996729546117</id><published>2011-09-15T21:17:00.001+01:00</published><updated>2011-09-15T21:17:38.789+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Charles Sherbon'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomy'/><category scheme='http://www.blogger.com/atom/ns#' term='speaking'/><category scheme='http://www.blogger.com/atom/ns#' term='NHM'/><category scheme='http://www.blogger.com/atom/ns#' term='biodiversity informatics'/><title type='text'>Anchoring Biodiversity Information: from Sherborn to 
the 21st century and beyond</title><content type='html'>&lt;div style="float:right;padding:10px;"&gt;&lt;img src="http://lh5.ggpht.com/-y9nvbrss2ac/TnJdYADi8II/AAAAAAAAA_Y/ROLfyJpa_HY/sherb0_1949769b.jpg?imgmax=800" alt="Sherb0 1949769b" title="sherb0_1949769b.jpg" border="0" width="200" height="125"  /&gt;&lt;div style="width:200px;"&gt;&lt;a href="http://www.telegraph.co.uk/science/8646534/Charles-Davies-Sherborn-the-Natural-History-Museums-magpie-with-a-card-index-mind.html"&gt;Charles Davies Sherborn, the Natural History Museum's 'magpie with a card-index mind’&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;Next month I'll be speaking in London at The Natural History Museum at a one-day event &lt;a href="http://iczn.org/content/anchoring-biodiversity-information-sherborn-21st-century-and-beyond"&gt;Anchoring Biodiversity Information: From Sherborn to the 21st century and beyond&lt;/a&gt;. This meeting is being organised by the International Commission on Zoological Nomenclature and the Society for the History of Natural History, and is partly a celebration of Sherborn's major work &lt;a href="http://www.sil.si.edu/digitalcollections/indexanimalium/"&gt;Index Animalium&lt;/a&gt; and partly a chance to look at the future of zoological nomenclature. &lt;br /&gt;&lt;br /&gt;Details are available from the &lt;a href="http://iczn.org/content/anchoring-biodiversity-information-sherborn-21st-century-and-beyond"&gt;ICZN web site&lt;/a&gt;. I'll be giving a talk entitled "Towards an open taxonomy" (no, I don't know what I mean by that either). 
But it should be a chance to rant about the failure of taxonomy to embrace the Interwebs.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-HLrKTjm64Z8/TnJdXeZBiyI/AAAAAAAAA_U/ah_yjCDv_UU/SherbornPoster-Sept%25252711.jpg?imgmax=800" alt="SherbornPoster Sept 11" title="SherbornPoster-Sept'11.jpg" border="0" width="424" height="600" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-8980138996729546117?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8980138996729546117'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8980138996729546117'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/09/anchoring-biodiversity-information-from.html' title='Anchoring Biodiversity Information: from Sherborn to the 21st century and beyond'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-y9nvbrss2ac/TnJdYADi8II/AAAAAAAAA_Y/ROLfyJpa_HY/s72-c/sherb0_1949769b.jpg?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2377111311318140045</id><published>2011-09-14T10:23:00.001+01:00</published><updated>2011-09-14T10:23:46.213+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tagging'/><category scheme='http://www.blogger.com/atom/ns#' term='Social media'/><category scheme='http://www.blogger.com/atom/ns#' term='EOL'/><title type='text'>I think I now "get" the Encylopedia of 
Life</title><content type='html'>The &lt;a href="http://www.eol.org"&gt;Encyclopedia of Life&lt;/a&gt; (EOL) has been relaunched, with a new look and much social media funkiness. I've been something of an EOL sceptic, but looking at the new site I think I can see what EOL is for. Ironically, it's not really about E. O. Wilson's original vision (&lt;a href="http://dx.doi.org/10.1016/S0169-5347(02)00040-X"&gt;doi:10.1016/S0169-5347(02)00040-X&lt;/a&gt;):&lt;br /&gt;&lt;blockquote&gt;Imagine an electronic page for each species of organism on Earth, available everywhere by single access on command. The page contains the scientific name of the species, a pictorial or genomic presentation of the primary type specimen on which its name is based, and a summary of its diagnostic traits. The page opens out directly or by linking to other data bases, such as ARKive, Ecoport, GenBank and MORPHOBANK. It comprises a summary of everything known about the species’ genome, proteome, geographical distribution, phylogenetic position, habitat, ecological relationships and, not least, its practical importance for humanity.&lt;/blockquote&gt;We still lack a decent database that does this. EOL tries, but in my opinion still falls short, partly because it isn't nearly aggressive enough in harvesting and linking data (links to the primary literature anyone?), and has absolutely no notion of phylogenetics.&lt;br /&gt;&lt;br /&gt;In terms of doing science I don't see much that I'd want to do with EOL, as opposed, say, to Wikipedia or existing taxonomic databases. But thinking about other applications, EOL has a lot of potential. One nice feature is the ability to make "collections". 
For example, Cyndy Parr has created a collection called &lt;a href="http://www.eol.org/collections/740"&gt;Fascinating textures&lt;/a&gt;, which is simply a collection of images in EOL (I've included some below):&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/-gPLdmFpIVaM/TnBynykLi0I/AAAAAAAAA_I/4GVVZGVYjA0/textures.png?imgmax=800" alt="Textures" title="textures.png" border="0" width="312" height="314" /&gt;&lt;br /&gt;What is nice about this is that it cuts across any existing classification and assembles a set of taxa that share nothing other than having "fascinating textures". This ability to tag taxa means we could create all sorts of interesting sets of taxa based on criteria that are meaningful in a particular context. For example, egotist that I am, I created a collection called &lt;a href="http://www.eol.org/collections/5706"&gt;Taxa described by Roderic Page&lt;/a&gt;, which includes the one crab and six bopyrid isopods that I described in the 1980s. &lt;br /&gt;&lt;br /&gt;Putting on my teaching hat, I'm involved in teaching a course on animal diversity and could imagine assembling collections of taxa relevant to a particular lecture (either taxonomically, or based on some other criteria, such as all parasites of a particular taxon, or all organisms found associated with deep sea vents). Other collections could be built by people or organisations with content. For example, lists of &lt;a href="http://iphylo.blogspot.com/2011/05/top-ten-new-species-described-in-2010.html"&gt;top ten new species&lt;/a&gt;, lists of species for which the &lt;a href="http://iphylo.blogspot.com/2011/03/linking-ncbi-taxonomy-to-bbc-wildlife.html"&gt;BBC has content&lt;/a&gt;, etc.&lt;br /&gt;&lt;br /&gt;In this sense, EOL becomes a tagging service for life, a bit like &lt;a href="http://delicious.com"&gt;delicious&lt;/a&gt;. 
The social network side of things is still a little clunky (there doesn't seem to be a notion of "contacts" or "friends", and it needs integration with existing social networks), but I think I now "get" what EOL is for.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2377111311318140045?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2377111311318140045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2377111311318140045'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/09/i-think-i-now-encylopedia-of-life.html' title='I think I now &amp;quot;get&amp;quot; the Encylopedia of Life'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-gPLdmFpIVaM/TnBynykLi0I/AAAAAAAAA_I/4GVVZGVYjA0/s72-c/textures.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-4070257991727055517</id><published>2011-09-13T21:46:00.001+01:00</published><updated>2011-09-13T21:56:57.363+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><category scheme='http://www.blogger.com/atom/ns#' term='duplication'/><category scheme='http://www.blogger.com/atom/ns#' term='merging'/><category scheme='http://www.blogger.com/atom/ns#' term='error'/><category scheme='http://www.blogger.com/atom/ns#' term='matching'/><title type='text'>Phantom articles: why Mendeley needs to make duplication transparent</title><content 
type='html'>Browsing Mendeley I found the following record: &lt;a href="http://www.mendeley.com/research/description-larva/"&gt;http://www.mendeley.com/research/description-larva/&lt;/a&gt;. This URL is for a paper&lt;br /&gt;&lt;blockquote&gt;Costa, J. M., &amp; Santos, T. C. (2008). Description of the larva of. Zootaxa, 99(2), 129-131&lt;/blockquote&gt;which apparently has the DOI &lt;a href="http://dx.doi.org/10.1645/GE-2580.1"&gt;doi:10.1645/GE-2580.1&lt;/a&gt;. This is strange because &lt;i&gt;Zootaxa&lt;/i&gt; doesn't have DOIs. The DOI given resolves to a paper in the &lt;i&gt;Journal of Parasitology&lt;/i&gt;:&lt;br /&gt;&lt;blockquote&gt;Harriman, V. B., Galloway, T. D., Alisauskas, R. T., &amp; Wobeser, G. A. (2011). Description of the larva of Ceratophyllus vagabundus vagabundus (Siphonaptera: Ceratophyllidae) from nests of Rossʼs and lesser snow geese in Nunavut, Canada. The Journal of parasitology, 93(2), 197-200&lt;/blockquote&gt;Now, this paper has its &lt;a href="http://www.mendeley.com/research/description-larva-ceratophyllus-vagabundus-vagabundus-siphonaptera-ceratophyllidae-nests-rosss-lesser-snow-geese-nunavut-canada/"&gt;own record in Mendeley&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;OK, so this is weird..., but it gets weirder. If you look at the Mendeley page for this chimeric article there is a PDF preview of yet another article: &lt;br /&gt;&lt;blockquote&gt;LOPES, Maria José Nascimento; FROEHLICH, Claudio Gilberto  and  DOMINGUEZ, Eduardo (2003). Description of the larva of Thraulodes schlingeri (Ephemeroptera, Leptophlebiidae). Iheringia, Sér. Zool. 
92(2), 197-200 2003 &lt;a href="http://dx.doi.org/10.1590/S0073-47212003000200011"&gt;doi:10.1590/S0073-47212003000200011&lt;/a&gt;&lt;/blockquote&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/--a2sSrAiZaM/Tm_BEaEmW1I/AAAAAAAAA_A/yn28RlwZeBU/mendeley_duplicate.png?imgmax=800" alt="Mendeley duplicate" title="mendeley_duplicate.png" border="0" width="400" height="211" /&gt;&lt;br /&gt;&lt;br /&gt;But it gets even more interesting. The abstract for the phantom &lt;i&gt;Zootaxa&lt;/i&gt; article belongs to yet another paper:&lt;br /&gt;&lt;blockquote&gt;Marques, K. I. D. S., &amp; Xerez, R. D. Description of the larva of Popanomyia kerteszi James &amp; Woodley (Diptera: Stratiomyidae) and identification key to immature stages of Pachygastrinae. Neotropical Entomology, 38(5), 643-648.&lt;/blockquote&gt; which also &lt;a href="http://www.mendeley.com/research/description-larva-popanomyia-kerteszi-james-woodley-diptera-stratiomyidae-identification-key-immature-stages-pachygastrinae-13/"&gt;exists in Mendeley&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;To investigate further I used Mendeley's API to retrieve this record (to do this I had to look at the source of the web page to find the internal identifier used by Mendeley, namely &lt;b&gt;010c48d0-edb5-11df-99a6-0024e8453de6&lt;/b&gt;; why does Mendeley hide these?). 
Here's the abbreviated JSON for this record.&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;{&lt;br /&gt;  ...&lt;br /&gt;  "website": "http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21506868",&lt;br /&gt;  "identifiers": {&lt;br /&gt;    "pmid": "21506868",&lt;br /&gt;    "issn": "19372345",&lt;br /&gt;    "doi": "10.1645\/GE-2580.1"&lt;br /&gt;  },&lt;br /&gt;  ...&lt;br /&gt;  "issue": "2",&lt;br /&gt;  "pages": "129-131",&lt;br /&gt;  "public_file_hash": "fe7eed3f6c43a3be1480a0937229b9ad33666df4",&lt;br /&gt;  "publication_outlet": "Zootaxa",&lt;br /&gt;  "type": "Journal Article",&lt;br /&gt;  "mendeley_url": "http:\/\/www.mendeley.com\/research\/description-larva\/",&lt;br /&gt;  "uuid": "010c48d0-edb5-11df-99a6-0024e8453de6",&lt;br /&gt;  "authors": [&lt;br /&gt;    {&lt;br /&gt;      "forename": "J M",&lt;br /&gt;      "surname": "Costa"&lt;br /&gt;    },&lt;br /&gt;    {&lt;br /&gt;      "forename": "T C",&lt;br /&gt;      "surname": "Santos"&lt;br /&gt;    }&lt;br /&gt;  ],&lt;br /&gt;  "title": "Description of the larva of",&lt;br /&gt;  "volume": "99",&lt;br /&gt;  "year": 2008,&lt;br /&gt;  "categories": [&lt;br /&gt;    39,&lt;br /&gt;    203,&lt;br /&gt;    37,&lt;br /&gt;    52,&lt;br /&gt;    43,&lt;br /&gt;    40,&lt;br /&gt;    210&lt;br /&gt;  ],&lt;br /&gt;  "oa_journal": false&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Doesn't add much to the story, but does give us the &lt;a href="http://en.wikipedia.org/wiki/SHA-1"&gt;sha1&lt;/a&gt; for the PDF for the chimeric article (&lt;b&gt;fe7eed3f6c43a3be1480a0937229b9ad33666df4&lt;/b&gt;). If I download the PDF for the article in &lt;i&gt;Iheringia, Sér. 
Zool.&lt;/i&gt; it has the same sha1:&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;&lt;br /&gt;openssl sha1 a11v93n2.pdf &lt;br /&gt;SHA1(a11v93n2.pdf)= fe7eed3f6c43a3be1480a0937229b9ad33666df4&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;This article doesn't exist&lt;/b&gt;&lt;br /&gt;So, to summarise, this paper doesn't exist. It is credited to a journal that doesn't have DOIs, the DOI resolves to an article in a different journal, the abstract comes from another article in another journal, and the PDF is from a third article. OMG!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;This is just weird&lt;/b&gt;&lt;br /&gt;So, something about the way Mendeley merges references is broken. Merging references is a tough problem so there will always be cases where things go wrong. But it would be really, really helpful if Mendeley could display the set of articles that it has merged to create each canonical reference (say by listing the UUIDs for each article). Users could then see if badness had happened, and provide feedback, for example by highlighting references that are clearly the same, and those that are clearly different. 
Until this happens I'm a bit nervous about trusting Mendeley with my bibliographic data; I don't want it mangled into chimeric papers that don't exist.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-4070257991727055517?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4070257991727055517'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4070257991727055517'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/09/phantom-articles-why-mendeley-needs-to.html' title='Phantom articles: why Mendeley needs to make duplication transparent'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/--a2sSrAiZaM/Tm_BEaEmW1I/AAAAAAAAA_A/yn28RlwZeBU/s72-c/mendeley_duplicate.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-5422975044036276040</id><published>2011-09-13T09:34:00.001+01:00</published><updated>2011-09-13T09:34:28.273+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='citation matching'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='citation'/><category scheme='http://www.blogger.com/atom/ns#' term='microcitations'/><category scheme='http://www.blogger.com/atom/ns#' term='matching'/><title type='text'>Rethinking citation matching</title><content type='html'>Some 
quick half-baked thoughts on citation matching. One of the things I'd really like to add to &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; is the ability to parse article text and extract the list of literature cited. Not only would this be another source of bibliographic data I can use to find more articles in BHL, but I could also build citation networks for articles in BioStor.&lt;br /&gt;&lt;br /&gt;Citation matching is a tough problem (see the papers below for a starting point).&lt;br /&gt;&lt;br /&gt;&lt;iframe src="http://www.mendeley.com/groups/529031/_/widget/21/5/" frameborder="0" allowTransparency="true" style="width:260px;height:400px;"&gt;&lt;/iframe&gt;&lt;p style="width:260px"&gt;&lt;a href="http://www.mendeley.com/groups/529031/citation-multi-parser/" title="Citation::Multi::Parser on Mendeley"&gt;Citation::Multi::Parser&lt;/a&gt; is a group in &lt;a href="http://www.mendeley.com/groups/computer-and-information-science/" title="Computer and Information Science on Mendeley"&gt;Computer and Information Science&lt;/a&gt; on &lt;a href="http://www.mendeley.com/" title="Mendeley"&gt;Mendeley&lt;/a&gt;.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;To date my approach has been to write various regular expressions to extract citations (mainly from web pages and databases). The goal, in a sense, is to discover the rules used to write the citation, then extract the component parts (authors, date, title, journal, volume, pagination, etc.). It's error prone — the citation might not exactly follow the rules, there might be errors (e.g., OCR, etc.). 
There are more formal ways of doing this (e.g., using statistical methods to discover which set of rules is most likely to have generated the citation), but these can get complicated.&lt;br /&gt;&lt;br /&gt;It occurs to me another way of doing this would be the following:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Assume, for argument's sake, we have a database of most of the references we are likely to encounter.&lt;/li&gt;&lt;li&gt;Using the most common citation styles, generate a set of possible citations for each reference.&lt;/li&gt;&lt;li&gt;Use approximate string matching to find the closest citation string to the one you have. If the match is above a certain threshold, accept the match.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;The idea is essentially to generate the universe of possible citation strings, and find the one that's closest to the string you are trying to match. Of course, this universe could be huge, but if you restrict it to a particular field (e.g., taxonomic literature) it might be manageable. This could be a useful way of handling "&lt;a href="http://iphylo.blogspot.com/2011/03/microcitations-linking-nomenclators-to.html"&gt;microcitations&lt;/a&gt;". 
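&lt;br /&gt;&lt;br /&gt;The three steps listed above could be sketched in Python roughly like this. It is only an illustration under some assumptions: the tiny reference list, the citation style templates, the function names and the 0.8 threshold are all invented for the example, and difflib.SequenceMatcher from the standard library stands in for whatever approximate string matching method you prefer.&lt;br /&gt;
&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;
```python
import difflib

# Hypothetical mini-database of references (both are articles mentioned
# elsewhere on this blog); a real one would cover a whole field.
REFERENCES = [
    {
        "authors": "Andersen, N.M. and P.-p. Chen",
        "year": "1993",
        "title": "A taxonomic revision of pondskater genus Gerris Fabricius "
                 "in China, with two new species (Hemiptera: Gerridae)",
        "journal": "Entomologica Scandinavica",
        "volume": "24",
        "pages": "147-166",
    },
    {
        "authors": "Astuti, D., Azuma, N., Suzuki, H. and Higashi, S.",
        "year": "2006",
        "title": "Phylogenetic Relationships Within Parrots (Psittacidae) "
                 "Inferred from Mitochondrial Cytochrome-b Gene Sequences",
        "journal": "Zoological Science",
        "volume": "23",
        "pages": "191-198",
    },
]

# Assumed citation styles, including a short microcitation-like form.
STYLES = [
    "{authors} ({year}). {title}. {journal}, {volume}, {pages}.",
    "{authors}, {year}. {title}. {journal} {volume}: {pages}",
    "{journal} {volume}: {pages}",
]


def candidate_strings(reference):
    """Step 2: generate one citation string per style for a reference."""
    return [style.format(**reference) for style in STYLES]


def similarity(a, b):
    """Similarity in [0, 1]; autojunk is off so long strings compare fully."""
    matcher = difflib.SequenceMatcher(None, a.lower(), b.lower(), autojunk=False)
    return matcher.ratio()


def best_match(citation, references, threshold=0.8):
    """Step 3: return (reference, score), or (None, score) below threshold."""
    best_score, best_reference = 0.0, None
    for reference in references:
        for candidate in candidate_strings(reference):
            score = similarity(citation, candidate)
            if score > best_score:
                best_score, best_reference = score, reference
    if best_score >= threshold:
        return best_reference, best_score
    return None, best_score


if __name__ == "__main__":
    # A microcitation with an OCR-style error ("l" for "i").
    match, score = best_match("Entomologica Scandinavlca 24: 147-166", REFERENCES)
    print(match["journal"] if match else None, round(score, 3))
```
&lt;/pre&gt;A brute-force scan like this compares the input against every candidate string, so for a large database some kind of index would probably be needed to narrow down the candidates first.&lt;br /&gt;&lt;br /&gt;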
Instead of developing regular expressions or other tools to discover the underlying model, generate a bunch of microcitations that you expect for a given reference, and string match against those.&lt;br /&gt;&lt;br /&gt;Might not be elegant, but I suspect it would be fast.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-5422975044036276040?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5422975044036276040'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5422975044036276040'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/09/rethinking-citation-matching.html' title='Rethinking citation matching'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-5557927419761876985</id><published>2011-09-13T08:45:00.001+01:00</published><updated>2011-09-13T08:45:36.186+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='Wikispecies'/><category scheme='http://www.blogger.com/atom/ns#' term='API'/><category scheme='http://www.blogger.com/atom/ns#' term='Flickr'/><category scheme='http://www.blogger.com/atom/ns#' term='Species-ID'/><category scheme='http://www.blogger.com/atom/ns#' term='interface'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenURL'/><title type='text'>More BHL app ideas</title><content type='html'>&lt;img 
src="http://lh4.ggpht.com/-TOV1wUYtW64/Tm8KGmqC7JI/AAAAAAAAA-0/JiLMxWVftVw/hero_rosellas.png?imgmax=800" alt="Hero rosellas" title="hero_rosellas.png" border="0"  height="200" style="float:right;" /&gt;Following on from my &lt;a href="http://iphylo.blogspot.com/2011/09/suggested-apps-for-bhl-life-and.html"&gt;previous post on BHL apps&lt;/a&gt; and a Twitter discussion in which I appealed for a "sexier" interface for BHL (to which &lt;a href="http://twitter.com/elyw"&gt;@elyw&lt;/a&gt; &lt;a href="https://twitter.com/elyw/status/113415960760823808"&gt;replied&lt;/a&gt; that this is what BHL Australia were trying to do), here are some further thoughts on improving BHL's web interface.&lt;br /&gt;&lt;b&gt;Build a new interface&lt;/b&gt;&lt;br /&gt;A fun project would be to create a BHL website clone using just the &lt;a href="http://biodivlib.wikispaces.com/Developer+Tools+and+API"&gt;BHL API&lt;/a&gt;. This would give you the freedom to explore interface ideas without having to persuade BHL to change its site. In a sense, the app would provide the persuasion.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Third party annotations&lt;/b&gt;&lt;br /&gt;It would be nice if the BHL web site made use of third party annotations. For example, BHL itself is extracting some of the best images and putting them on &lt;a href="http://www.flickr.com/photos/biodivlibrary/sets/"&gt;Flickr&lt;/a&gt;. How about if you go to the page for an item in BHL and you see a summary of the images from that item in Flickr? At a glance you can see whether the item has some interesting content. 
For example, if you go to &lt;a href="http://biodiversitylibrary.org/item/109846"&gt;http://biodiversitylibrary.org/item/109846&lt;/a&gt; you see this:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh4.ggpht.com/-xAbSv951fWk/Tm8KHo3yd8I/AAAAAAAAA-4/QksWyr18mBw/n2_w1150.png?imgmax=800" alt="N2 w1150" title="n2_w1150.png" border="0" width="200"  /&gt;&lt;br /&gt;&lt;br /&gt;which gives you no idea that it contains images like this:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.flickr.com/photos/biodivlibrary/6105705787/" title="n24_w1150 by BioDivLibrary, on Flickr"&gt;&lt;img src="http://farm7.static.flickr.com/6084/6105705787_93c9f272c9.jpg" width="500" height="313" alt="n24_w1150"&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Tables of contents&lt;/b&gt;&lt;br /&gt;Another source of annotations is my own &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; project, which finds articles in scanned volumes in BHL. If you are looking at an item in BHL it would be nice to see a list of articles that have been found in that item, perhaps displayed in a drop down menu as a table of contents. This would help provide a way to navigate through the volume.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Who links to BHL?&lt;/b&gt;&lt;br /&gt;When I suggested third party annotations on Twitter, &lt;a href="http://twitter.com/stho002"&gt;@stho002&lt;/a&gt; &lt;a href="https://twitter.com/stho002/status/113382622448717825"&gt;chimed in&lt;/a&gt; asking about &lt;a href="http://species.wikimedia.org/"&gt;Wikispecies&lt;/a&gt;, &lt;a href="http://species-id.net/"&gt;Species-ID&lt;/a&gt;, &lt;a href="http://zoobank.org"&gt;ZooBank&lt;/a&gt;, etc. These resources are different, in that they aren't repurposing BHL content but are linking to it. 
It would be great if a BHL page for an item could display reverse links (i.e., the pages in those external databases that link to that BHL item).&lt;br /&gt;&lt;br /&gt;Implementing reverse links (essentially citation linking) can be tricky, but two ways to do it might be:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Use BHL web server logs to find and extract referrals from those projects&lt;/li&gt;&lt;li&gt;Perhaps more elegantly, encourage external databases to link to BHL content using an OpenURL which includes the URL of the originating page. OpenURL can be messy, but especially in Mediawiki-based projects such as Wikispecies and Species-ID it would be straightforward to make a template that generated the correct syntax. In this way BHL could harvest the inbound links and display them on the item page.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-5557927419761876985?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5557927419761876985'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5557927419761876985'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/09/more-bhl-app-ideas.html' title='More BHL app ideas'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-TOV1wUYtW64/Tm8KGmqC7JI/AAAAAAAAA-0/JiLMxWVftVw/s72-c/hero_rosellas.png?imgmax=800' height='72' 
width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-862996663166401864</id><published>2011-09-12T10:40:00.001+01:00</published><updated>2011-09-12T11:06:01.503+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='duplicates'/><category scheme='http://www.blogger.com/atom/ns#' term='DOI'/><category scheme='http://www.blogger.com/atom/ns#' term='HS_ALIAS'/><category scheme='http://www.blogger.com/atom/ns#' term='CrossRef'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenHandle'/><category scheme='http://www.blogger.com/atom/ns#' term='indirection'/><category scheme='http://www.blogger.com/atom/ns#' term='Handle'/><title type='text'>Duplicate DOIs for the same article:  alias DOIs, who knew?</title><content type='html'>As part of a project to map taxonomic citations to bibliographic identifiers I'm tackling strings like this (from the ION record &lt;a href="http://www.organismnames.com/lsidmetadata.htm?lsid=1405511"&gt;urn:lsid:organismnames.com:name:1405511&lt;/a&gt; for &lt;i&gt;Pseudomyrmex crudelis&lt;/i&gt;):&lt;br /&gt;&lt;br /&gt;&amp;lt;tdwg_co:PublishedIn&amp;gt;&lt;br /&gt;Systematics, biogeography and host plant associations of the Pseudomyrmex viduus group (Hymenoptera: Formicidae), Triplaris- and Tachigali-inhabiting ants. Zoological Journal of the Linnean Society, 126(4), August 1999: 451-540. 516 [Zoological Record Volume 136]&lt;br /&gt;&amp;lt;/tdwg_co:PublishedIn&amp;gt;&lt;br /&gt;&lt;br /&gt;I parse the string into its components (e.g., journal, volume, issue, pagination) and use scripts to locate identifiers such as DOIs. I regard DOIs as the gold standard for bibliographic identifiers. They are (usually) unique, and CrossRef provides some really useful services to support them (DOIs now also &lt;a href="http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html"&gt;support linked data&lt;/a&gt; if you are into that sort of thing). 
Occasionally there are problems, such as duplicate DOIs when &lt;a href="http://iphylo.blogspot.com/2007/05/duplicate-dois.html"&gt;material moves from a publisher's site to, say, JSTOR&lt;/a&gt;. And some publishers are really, really bad at releasing DOIs that don't resolve. For example, &lt;a href="http://www.tandfonline.com/"&gt;Taylor &amp; Francis Online&lt;/a&gt; have at least 18,000 DOIs for the &lt;i&gt;Annals and Magazine of Natural History&lt;/i&gt; that don't resolve (e.g., &lt;a href="http://dx.doi.org/10.1080/00222933809512318"&gt;doi:10.1080/00222933809512318&lt;/a&gt; for &lt;a href="http://www.tandfonline.com/doi/abs/10.1080/00222933809512318"&gt;this paper&lt;/a&gt;). &lt;br /&gt;&lt;br /&gt;Sometimes my automated scripts for finding DOIs fail and I have to resort to Googling. To my surprise, I found two versions of the paper "Systematics, biogeography and host plant associations of the Pseudomyrmex viduus group (Hymenoptera: Formicidae), Triplaris- and Tachigali-inhabiting ants", each with a different DOI:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://dx.doi.org/10.1006/zjls.1998.0158"&gt;doi:10.1006/zjls.1998.0158&lt;/a&gt; at &lt;a href="http://linkinghub.elsevier.com/retrieve/pii/S0024408298901583"&gt;Science Direct&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://dx.doi.org/10.1111/j.1096-3642.1999.tb00157.x"&gt;doi:10.1111/j.1096-3642.1999.tb00157.x&lt;/a&gt; at &lt;a href="http://onlinelibrary.wiley.com/doi/10.1111/j.1096-3642.1999.tb00157.x/abstract"&gt;Wiley&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Now, this isn't supposed to happen. 
Interestingly, if you resolve &lt;a href="http://dx.doi.org/10.1006/zjls.1998.0158"&gt;doi:10.1006/zjls.1998.0158&lt;/a&gt;, either on the web or using CrossRef's OpenURL resolver, you get the page/metadata for &lt;a href="http://dx.doi.org/10.1111/j.1096-3642.1999.tb00157.x"&gt;doi:10.1111/j.1096-3642.1999.tb00157.x&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;To see what was going on I fired up my local installation of Tony Hammond's &lt;a href="http://code.google.com/p/openhandle/"&gt;OpenHandle&lt;/a&gt; tool (see &lt;a href="http://bioguid.info/openhandle/"&gt;http://bioguid.info/openhandle/&lt;/a&gt;) and entered the Elsevier DOI (10.1006/zjls.1998.0158) and got this:&lt;br /&gt;&lt;br /&gt;&lt;pre style="font-size:10px;border:1px solid rgb(192,192,192);background-color:rgb(240,240,240);"&gt;&lt;br /&gt;{&lt;br /&gt;    "comment" : "OpenHandle (JSON) - see http://code.google.com/p/openhandle/" ,&lt;br /&gt;    "handle" : "hdl:10.1006/zjls.1998.0158" ,&lt;br /&gt;    "handleStatus" : {&lt;br /&gt;        "code" : "1" ,&lt;br /&gt;        "message" : "SUCCESS"&lt;br /&gt;    } ,&lt;br /&gt;    "handleValues" : [&lt;br /&gt;        {&lt;br /&gt;            "index" : "100" ,&lt;br /&gt;            "type" : "HS_ADMIN" ,&lt;br /&gt;            "data" : {&lt;br /&gt;                "adminRef" : "hdl:10.1006/zjls.1998.0158?index=100" ,&lt;br /&gt;                "adminPermission" : "111111110111"&lt;br /&gt;            } ,&lt;br /&gt;            "permission" : "1110" ,&lt;br /&gt;            "ttl" : "+86400" ,&lt;br /&gt;            "timestamp" : "Thu Apr 13 19:09:03 BST 2000" ,&lt;br /&gt;            "reference"  : []&lt;br /&gt;        } ,&lt;br /&gt;        {&lt;br /&gt;            "index" : "1" ,&lt;br /&gt;            "type" : "URL" ,&lt;br /&gt;            "data" : "http://linkinghub.elsevier.com/retrieve/pii/S0024408298901583" ,&lt;br /&gt;            "permission" : "1110" ,&lt;br /&gt;            "ttl" : "+86400" ,&lt;br /&gt;            "timestamp" : "Tue Aug 12 16:43:12 
BST 2003" ,&lt;br /&gt;            "reference"  : []&lt;br /&gt;        } ,&lt;br /&gt;        {&lt;br /&gt;            "index" : "700050" ,&lt;br /&gt;            "type" : "700050" ,&lt;br /&gt;            "data" : "20030811104844000" ,&lt;br /&gt;            "permission" : "1110" ,&lt;br /&gt;            "ttl" : "+86400" ,&lt;br /&gt;            "timestamp" : "Tue Aug 12 16:43:16 BST 2003" ,&lt;br /&gt;            "reference"  : []&lt;br /&gt;        } ,&lt;br /&gt;        {&lt;br /&gt;            "index" : "1970" ,&lt;br /&gt;            "type" : "HS_ALIAS" ,&lt;br /&gt;            "data" : "10.1111/j.1096-3642.1999.tb00157.x" ,&lt;br /&gt;            "permission" : "1110" ,&lt;br /&gt;            "ttl" : "+86400" ,&lt;br /&gt;            "timestamp" : "Mon Aug 25 21:06:50 BST 2008" ,&lt;br /&gt;            "reference"  : []&lt;br /&gt;        }&lt;br /&gt;    ]&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The interesting bit is the "HS_ALIAS" at the bottom. I'd not come across this before, although it's in the spec (&lt;a href="http://www.ietf.org/rfc/rfc3651.txt"&gt;RFC 3651&lt;/a&gt;) for all to see (yeah, but who reads those?). The handle system that underlies DOIs has a mechanism to support aliases, so that a DOI that originally pointed to a web page (say, for an article) can be redirected to point to another DOI. In this case, the Elsevier DOI redirects to the Wiley DOI ("10.1111/j.1096-3642.1999.tb00157.x" in the HS_ALIAS section), so the user ends up at Wiley's page for this article, not Elsevier's. This provides a way to accommodate changes in article ownership, without requiring an existing publisher to reuse the previous publisher's DOI.&lt;br /&gt;&lt;br /&gt;In one sense this seems to defeat the point of DOIs, namely that they are effectively opaque identifiers that any publisher should be able to host. 
Perhaps in this case the issue is that the DOI prefix ("10.1006" and "10.1111" for Elsevier and Wiley, respectively) corresponds to a publisher, and when something goes wrong with a DOI it's easier to identify who is responsible based on this prefix, rather than the individual DOI.&lt;br /&gt;&lt;br /&gt;In any event, next time I come across a duplicate DOI I'll need to check whether it is an alias of another DOI before launching into another rant about the (occasional) failings of DOIs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-862996663166401864?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/862996663166401864'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/862996663166401864'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/09/duplicate-dois-for-same-article-alias.html' title='Duplicate DOIs for the same article:  alias DOIs, who knew?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-8074113326590115651</id><published>2011-09-07T18:00:00.001+01:00</published><updated>2011-09-10T11:47:42.231+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='OCR'/><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><category scheme='http://www.blogger.com/atom/ns#' term='Life and Literature'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='PLoS'/><category scheme='http://www.blogger.com/atom/ns#' 
term='iPhone'/><category scheme='http://www.blogger.com/atom/ns#' term='iPad'/><category scheme='http://www.blogger.com/atom/ns#' term='Challenge'/><category scheme='http://www.blogger.com/atom/ns#' term='hOCR'/><category scheme='http://www.blogger.com/atom/ns#' term='ngram'/><title type='text'>Suggested apps for BHL's Life and Literature Code Challenge</title><content type='html'>&lt;a href='http://photo.blogpressapp.com/show_photo.php?p=11/09/07/1908.jpg'&gt;&lt;img src='http://photo.blogpressapp.com/photos/11/09/07/s_1908.jpg' border='0' width='128' height='128' align='right' style='margin:5px'&gt;&lt;/a&gt;&lt;br /&gt;Since I won't be able to be at the &lt;a target="_blank" href="http://www.biodiversitylibrary.org"&gt;Biodiversity Heritage Library's&lt;/a&gt; Life and Literature meeting I thought I'd share some ideas for their &lt;a target="_blank" href="http://www.lifeandliterature.org/p/code-challenge.html"&gt;Life and Literature Code Challenge&lt;/a&gt;. The deadline is pretty close (October 17) so having ideas now isn't terribly helpful I admit. That aside, here are some thoughts inspired by the challenge. In part this post has been inspired by the &lt;a target="_blank" href="http://www.mendeley.com/blog/developer-resources/what-the-scientific-community-wants-computers-to-do-for-them-the-results-of-the-plos-and-mendeley-call-for-apps/"&gt;Results of the PLoS and Mendeley "Call for Apps"&lt;/a&gt;, where PLoS and Mendeley asked for people (not necessarily developers) to suggest the kind of apps they'd like to see. As an aside, one thing conspicuous by its absence is a prize for winning the challenge. 
PLoS and Mendeley have a &lt;a target="_blank" href="http://dev.mendeley.com/api-binary-battle"&gt;"API Binary Battle"&lt;/a&gt; with a prize of $US 10,001, which seems more likely to inspire people to take part.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Visual search engine&lt;/b&gt;&lt;br /&gt;I suspect that many BHL users are looking for illustrations (exemplified by the images being gathered in BHL's &lt;a target="_blank" href="http://www.flickr.com/photos/biodivlibrary/sets/"&gt;Flickr group&lt;/a&gt;). One way to search for images would be to search within the OCR text for figure and plate captions, such as "Fig. 1". Indexing these captions by taxonomic name would provide a simple image search tool. For modern publications most figures are on the same page as the caption, but for older publications with illustrations as plates, the caption and corresponding image may be separated (e.g., on facing pages), so the search results might need to show pages around the page containing the caption. As an aside, it's a pity the Flickr images only link to the BHL item and not the BHL page. If they did the latter, and the images were tagged with what they depict, you could create a visual search engine using the Flickr API (of course, this might be just the way to implement the visual search engine — harvest images, tag with PageID and taxon names, upload to Flickr).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Mobile interface&lt;/b&gt;&lt;br /&gt;The BHL web site doesn't look great on an iPhone. It makes no concessions to the mobile device, and there are some weird things such as the way the list of pages is rendered. A number of mainstream science publishers are exploring mobile versions of their web sites, for example &lt;a target="_blank" href="http://www.tandfonline.com"&gt;Taylor and Francis&lt;/a&gt; have a &lt;a target="_blank" href="http://jquerymobile.com/"&gt;jQuery Mobile&lt;/a&gt; powered interface for mobile users. 
I've explored iPad interfaces to scientific articles in previous posts. BHL content poses some challenges, but is fundamentally the same as viewing PDFs — you have fixed pages that you may want to zoom.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;OCR correction&lt;/b&gt;&lt;br /&gt;There is a lot of scope for cleaning up the OCR text in BHL. Part of the trick would be to have a simple user interface for people to contribute to this task. In an earlier post I discussed a &lt;a target="_blank" href="http://iphylo.blogspot.com/2011/07/correcting-ocr-using-hocr-firefox.html"&gt;Firefox hOCR add-on&lt;/a&gt; that provides a nice way to do this. Take this as a starting point, add a way to save the cleaned up text, and you'd be well on the way to making a useful tool.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Taxon name timeline&lt;/b&gt;&lt;br /&gt;Despite the shiny new interface, the &lt;a target="_blank" href="http://www.eol.org"&gt;Encyclopedia of Life&lt;/a&gt; still displays BHL literature in the same clunky way I described in an &lt;a target="_blank" href="http://iphylo.blogspot.com/2009/09/visualising-biodiversity-heritage.html"&gt;earlier blog post&lt;/a&gt;. It would be great to have a timeline of the usage of a name, especially if you could compare the usage of different names (such as &lt;a target="_blank" href="http://iphylo.blogspot.com/2009/10/biodiversity-heritage-library.html"&gt;synonyms&lt;/a&gt;). In many ways this is the BHL equivalent of the &lt;a target="_blank" href="http://ngrams.googlelabs.com/"&gt;Google Books Ngram viewer&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;These are just a few hastily put together thoughts. 
If you have any other ideas or suggestions, feel free to add them as comments below.&lt;br /&gt;&lt;br /&gt;- Posted using BlogPress from my iPad&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-8074113326590115651?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8074113326590115651'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8074113326590115651'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/09/suggested-apps-for-bhl-life-and.html' title='Suggested apps for BHL&amp;#39;s Life and Literature Code Challenge'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3876350482796260177</id><published>2011-08-28T00:18:00.001+01:00</published><updated>2011-08-28T00:18:42.790+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='NSF'/><category scheme='http://www.blogger.com/atom/ns#' term='Tree of Life'/><category scheme='http://www.blogger.com/atom/ns#' term='NCBI'/><category scheme='http://www.blogger.com/atom/ns#' term='annotation'/><title type='text'>Tree of Life 0.1 - annotating the NCBI taxonomy</title><content type='html'>Last week I was at the NSF "Assembling, Visualising and Analysing the Tree of Life" Ideas Lab, run by &lt;a href="http://www.knowinnovation.com/"&gt;KnowInnovation.com/&lt;/a&gt;. 
It was an interesting experience, essentially a structured week of brainstorming ideas.&lt;br /&gt;&lt;br /&gt;One thing I came away with is the feeling that our notions of the "tree of life" are fuzzy, contradictory, and often probably unobtainable. It's tempting to imagine all sorts of wonderful visualisations, and lose sight of building something that is useful. Perhaps it's time instead to think of "Tree of Life version 0.1". &lt;br /&gt;&lt;br /&gt;Imagine taking the NCBI taxonomy as a starting point. Yes it's incomplete, and has almost no fossils, but it's freely available and linked to a lot of data. Let's use a Google Maps-like viewer along the lines I &lt;a href="http://iphylo.blogspot.com/2011/03/zooming-large-tree-now-with-thumbnails.html"&gt;explored earlier this year&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Then add annotation "tracks" to the tips. As a first pass these could be taken from the &lt;a href="http://www.ncbi.nlm.nih.gov/projects/linkout/"&gt;NCBI LinkOut&lt;/a&gt; service, such as the NCBI-Wikipedia mapping &lt;a href="http://iphylo.org/linkout"&gt;http://iphylo.org/linkout&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/-me96WuCex7s/Tll7STe0lrI/AAAAAAAAA-Y/T1Nqw-JZlYk/ncbi-1.png?imgmax=800" alt="Ncbi 1" title="ncbi-1.png" border="0" width="400" height="410" /&gt;&lt;br /&gt;&lt;br /&gt;The NCBI tree is a classification rather than a phylogeny, so we could add greater phylogenetic content by linking to phylogenetic databases, such as &lt;a href="http://www.treebase.org"&gt;TreeBASE&lt;/a&gt; and &lt;a href="http://phylota.net"&gt;PhyLoTA&lt;/a&gt;. 
Imagine clicking on a node in the NCBI taxonomy and seeing a display of all the phylogenies centred on that node:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-lAqpz4FgoEA/Tll7TUcuhpI/AAAAAAAAA-c/Hg3jDIdlSGw/ncbi-02.png?imgmax=800" alt="Ncbi 02" title="ncbi-02.png" border="0" width="400"  /&gt;&lt;br /&gt;&lt;br /&gt;Now we have a way to navigate a large tree, view annotations, and display phylogenetic trees. All of this could be done fairly easily. The key is to have services keyed by the NCBI tax_id used to identify nodes on the tree.&lt;br /&gt;&lt;br /&gt;Among the next steps would be to add additional "tracks", perhaps based on curated links analogous to the wiki-based NCBI-Wikipedia mapping. For example, very basic habitat data (marine or terrestrial) could be added, or geography, or host relationships (could be based in part on the data already in GenBank).&lt;br /&gt;&lt;br /&gt;Given that the NCBI tree continues to grow, subsequent versions could be released as the tree changes. Or we could "fork" the NCBI tree and start to refine it based on phylogenetic information, and add taxa that aren't in the genome databases (these taxa will need consistent identifiers so we can map annotations on to them as well). Perhaps we could use something like &lt;a href="http://en.wikipedia.org/wiki/Git_(software)"&gt;Git&lt;/a&gt; to manage this tree, and to handle the necessary merging of updated versions of the NCBI tree. 
People could edit the tree, or indeed fork it and come up with their own.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://lh6.ggpht.com/-Lj8I4ebWeao/Tll7UA88ybI/AAAAAAAAA-g/cDWvJYHkz0c/logo_tmp_reasonably_small.png?imgmax=800" alt="Logo tmp reasonably small" title="logo_tmp_reasonably_small.png" border="0" width="128" height="128" style="float:right;" /&gt;There are lots of ways to visualise trees (see &lt;a href="http://treevis.net"&gt;TreeVis.net&lt;/a&gt; for some great examples), but what I'm after is a tool that is useful, that gives us a sense of what we know and what we don't. I suspect that one of the reasons we've struggled with visualising the tree of life is that there are lots of different notions about what it's for. In this case, I want a tool to navigate data about organisms, one that we can easily add annotations to.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3876350482796260177?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3876350482796260177'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3876350482796260177'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/08/tree-of-life-01-annotating-ncbi.html' title='Tree of Life 0.1 - annotating the NCBI taxonomy'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-me96WuCex7s/Tll7STe0lrI/AAAAAAAAA-Y/T1Nqw-JZlYk/s72-c/ncbi-1.png?imgmax=800' height='72' 
width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-967422124023459646</id><published>2011-08-26T17:20:00.001+01:00</published><updated>2011-08-26T19:13:45.132+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='NSF'/><category scheme='http://www.blogger.com/atom/ns#' term='Ideator'/><category scheme='http://www.blogger.com/atom/ns#' term='close to the bone'/><category scheme='http://www.blogger.com/atom/ns#' term='AVATOL'/><title type='text'>I am not a number...I am an "ideator"</title><content type='html'>As part of the NSF "Assembling, Visualising and Analysing the Tree of Life" Ideas Lab that I took part in earlier this week I had an assessment of my "problem solving style" carried out using a service called &lt;a href="http://www.foursightonline.com/"&gt;FourSight&lt;/a&gt;. I'm hugely sceptical of attempts to classify people (I'm unique, aren't I?), but I took the test and it turns out I am an "Ideator". FourSight's web site defines an Ideator as one who:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Likes to look at the big picture&lt;/li&gt;&lt;li&gt;Enjoys toying with ideas and possibilities&lt;/li&gt;&lt;li&gt;Likes to stretch his or her imagination&lt;/li&gt;&lt;li&gt;Enjoys thinking in more global and abstract terms&lt;/li&gt;&lt;li&gt;Takes an intuitive approach to innovation&lt;/li&gt;&lt;li&gt;May overlook details&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Details schmetails, it's the big picture folks!&lt;br /&gt;&lt;br /&gt;Ideators are:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Playful&lt;/li&gt;&lt;li&gt;Imaginative&lt;/li&gt;&lt;li&gt;Social&lt;/li&gt;&lt;li&gt;Adaptable&lt;/li&gt;&lt;li&gt;Flexible&lt;/li&gt;&lt;li&gt;Adventurous&lt;/li&gt;&lt;li&gt;Independent&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Liking this. OK, how do you care for ideators? 
We need:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Room to be playful&lt;/li&gt;&lt;li&gt;Constant stimulation&lt;/li&gt;&lt;li&gt;Variety and change&lt;/li&gt;&lt;li&gt;The big picture&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;That's right, leave us alone to think our great thoughts. Result! Then there's this totally superfluous category "Ideators annoy others by...". &lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Drawing attention to themselves&lt;/li&gt;&lt;li&gt;Being impatient when others don’t get their ideas&lt;/li&gt;&lt;li&gt;Offering ideas that are too off-the-wall&lt;/li&gt;&lt;li&gt;Being too abstract&lt;/li&gt;&lt;li&gt;Not sticking to one idea&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Utter, utter, nonsense. Look at my blog, it's full of ideas that have been developed fully... oh, wait. And, maybe the blog thing is a bit attention seeking, and I guess saying "it sucks" is a tad impatient, and saying to a crowd of taxonomists "haven't we basically found every species bigger than my coffee cup?" 
is a little off-the-wall.&lt;br /&gt;&lt;br /&gt;Good job these psychometric thingies are clearly bogus.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-967422124023459646?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/967422124023459646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/967422124023459646'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/08/i-am-not-numberi-am.html' title='I am not a number...I am an &amp;quot;ideator&amp;quot;'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2425236243751210928</id><published>2011-07-13T13:12:00.001+01:00</published><updated>2011-07-13T13:17:09.372+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='microformat'/><category scheme='http://www.blogger.com/atom/ns#' term='OCR'/><category scheme='http://www.blogger.com/atom/ns#' term='markup'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='hOCR'/><category scheme='http://www.blogger.com/atom/ns#' term='Firefox'/><title type='text'>Correcting OCR using hOCR in Firefox</title><content type='html'>Quick post on a little tool I came across, &lt;a href="http://jimgarrison.org/moz-hocr-edit/"&gt;moz-hocr-edit&lt;/a&gt;. This Firefox add-on lets you proofread Optical Character Recognition (OCR) output. 
Given my interest in &lt;a href="http://iphylo.blogspot.com/2010/12/bhl-and-ocr.html"&gt;OCR and the Biodiversity Heritage Library&lt;/a&gt; I decided to take it for a spin.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;moz-hocr-edit&lt;/b&gt; uses &lt;a href="http://en.wikipedia.org/wiki/HOCR"&gt;hOCR&lt;/a&gt;, which is a format for representing the output of OCR software, and is used by tools such as &lt;a href="http://en.wikipedia.org/wiki/OCRopus"&gt;OCRopus&lt;/a&gt; (you can see the public specification for hOCR &lt;a href="http://docs.google.com/View?docid=dfxcv4vc_67g844kf"&gt;here&lt;/a&gt;). Basically it's a microformat, that is, it's HTML with some additional tags. Given some hOCR, &lt;b&gt;moz-hocr-edit&lt;/b&gt; enables you to edit the OCR output line-by-line.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Demo&lt;/b&gt;&lt;br /&gt;I've created a simple demo based upon &lt;a href="http://biostor.org/reference/80780"&gt;Case 3368 Eatoniella Dall, 1876 and EATONIELLIDAE Ponder, 1965 (Mollusca, Gastropoda): proposed conservation&lt;/a&gt;. For the demo to work you will need to use the Firefox web browser with the &lt;a href="https://addons.mozilla.org/en-US/firefox/addon/11067"&gt;moz-hocr-edit&lt;/a&gt; add-on installed.&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Go to &lt;a href="http://dl.dropbox.com/u/639486/hocr/80780.html"&gt;http://dl.dropbox.com/u/639486/hocr/80780.html&lt;/a&gt;&lt;/li&gt;&lt;li&gt;You will see a simple HTML representation of the OCR text from "Case 3368 Eatoniella Dall, 1876 and EATONIELLIDAE Ponder, 1965 (Mollusca, Gastropoda): proposed conservation". I created this HTML from the original &lt;a href="http://en.wikipedia.org/wiki/ABBYY"&gt;ABBYY FineReader&lt;/a&gt; XML from the Internet Archive.&lt;/li&gt;&lt;li&gt;On the bottom right-hand of the Firefox browser window you should see &lt;b&gt;hOCR&lt;/b&gt;. 
Click on it and select "Edit this hOCR document":&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/-UvG_XpBDCgw/Th2LnwmpZtI/AAAAAAAAA6g/VMW6guACy5U/statusbar.png?imgmax=800" alt="Statusbar" border="0" width="397" height="130" /&gt;&lt;/li&gt;&lt;li&gt;Firefox will open a new tab that will look something like this:&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-W1g4z7-IiDY/Th2Lo2GMvJI/AAAAAAAAA6k/_P49tsGwSUU/screenshot.png?imgmax=800" alt="Screenshot" border="0" width="400" height="289" /&gt;&lt;/li&gt;&lt;li&gt;You can now edit individual lines of text, and see your edits applied to the HTML below.&lt;/li&gt;&lt;/ol&gt;&lt;b&gt;moz-hocr-edit&lt;/b&gt; is a neat little tool. With appropriate web server settings (and, as the tool's author Jim Garrison suggests, autoversioning) it could be the basis of a great tool for correcting OCR errors in BHL. &lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2425236243751210928?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2425236243751210928'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2425236243751210928'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/07/correcting-ocr-using-hocr-firefox.html' title='Correcting OCR using hOCR in Firefox'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh6.ggpht.com/-UvG_XpBDCgw/Th2LnwmpZtI/AAAAAAAAA6g/VMW6guACy5U/s72-c/statusbar.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-6915835232717452414</id><published>2011-07-12T08:23:00.001+01:00</published><updated>2011-07-12T10:45:10.888+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='phylogeny'/><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='vizbi'/><category scheme='http://www.blogger.com/atom/ns#' term='talk'/><title type='text'>Talk @vizbi on phylogeny visualisation</title><content type='html'>The talks from the 2011 workshop on Visualizing Biological Data (&lt;a href="http://vizbi.org/2011/"&gt;VizBi 2011&lt;/a&gt;) are now &lt;a href="http://www.vimeo.com/vizbi"&gt;available on Vimeo&lt;/a&gt;. There were some great talks at VizBi, especially the keynotes (the "featured videos" on the Vimeo page for VizBi).&lt;br /&gt;&lt;br /&gt;My own (slightly breathless) talk was on phylogeny visualisation, which you can watch below. &lt;br /&gt;&lt;br /&gt;&lt;iframe src="http://player.vimeo.com/video/26314853?color=ffffff" width="400" height="300" frameborder="0"&gt;&lt;/iframe&gt;&lt;p&gt;&lt;a href="http://vimeo.com/26314853"&gt;Visualization of phylogenetics &amp; phylogeography&lt;/a&gt; from &lt;a href="http://vimeo.com/rdmpage"&gt;Roderic Page&lt;/a&gt; on &lt;a href="http://vimeo.com"&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;&lt;br /&gt;In the talk I mention that the slides are also on &lt;a href="http://www.slideshare.net/rdmpage/phylogeny-vizbi-2011"&gt;SlideShare&lt;/a&gt;, and that is where you'll find URLs for the projects I mention. 
The URLs aren't all that easy to get that way, so here they are:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://evolution.genetics.washington.edu/phylip/software.html#Plotting"&gt;http://evolution.genetics.washington.edu/phylip/software.html#Plotting&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://loco.biosci.arizona.edu/paloverde/paloverde.html"&gt;http://loco.biosci.arizona.edu/paloverde/paloverde.html&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.wellcometreeoflife.org/"&gt;http://www.wellcometreeoflife.org/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://dx.doi.org/10.1186/1471-2105-5-48"&gt;http://dx.doi.org/10.1186/1471-2105-5-48&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://dx.doi.org/10.1093/sysbio/46.3.523"&gt;http://dx.doi.org/10.1093/sysbio/46.3.523&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://dx.doi.org/10.1186/1471-2105-8-213"&gt;http://dx.doi.org/10.1186/1471-2105-8-213&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://iphylo.blogspot.com/2007/06/earth-not-flat-official.html"&gt;http://iphylo.blogspot.com/2007/06/earth-not-flat-official.html&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://systbio.org/?q=node/184"&gt;http://systbio.org/?q=node/184&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://dx.doi.org/10.1080/10635150701266848"&gt;http://dx.doi.org/10.1080/10635150701266848&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/17968069"&gt;http://www.ncbi.nlm.nih.gov/pubmed/17968069&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://innovis.cpsc.ucalgary.ca/Research/CollaborativeTreeComparison"&gt;http://innovis.cpsc.ucalgary.ca/Research/CollaborativeTreeComparison&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.perceptivepixel.com/"&gt;http://www.perceptivepixel.com/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://iphylo.blogspot.com/2008/08/perceptive-pixel-taxonomy-demo.html"&gt;http://iphylo.blogspot.com/2008/08/perceptive-pixel-taxonomy-demo.html&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a 
href="http://www-ab.informatik.uni-tuebingen.de/software/dendroscope"&gt;http://www-ab.informatik.uni-tuebingen.de/software/dendroscope&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://dx.doi.org/10.1126/science.300.5626.1692"&gt;http://dx.doi.org/10.1126/science.300.5626.1692&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://loco.biosci.arizona.edu/"&gt;http://loco.biosci.arizona.edu/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://iphylo.blogspot.com/2007/08/visualising-very-big-trees-part-iv.html"&gt;http://iphylo.blogspot.com/2007/08/visualising-very-big-trees-part-iv.html&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://iphylo.blogspot.com/2011/02/live-demo-of-zooming-large-tree.html"&gt;http://iphylo.blogspot.com/2011/02/live-demo-of-zooming-large-tree.html&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-6915835232717452414?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6915835232717452414'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6915835232717452414'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/07/talk-vizbi-on-phylogeny-visualisation.html' title='Talk @vizbi on phylogeny visualisation'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2148965393526945272</id><published>2011-06-11T19:16:00.001+01:00</published><updated>2011-06-11T19:21:42.204+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hack4knowledge'/><category 
scheme='http://www.blogger.com/atom/ns#' term='SVG'/><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><category scheme='http://www.blogger.com/atom/ns#' term='treemap'/><category scheme='http://www.blogger.com/atom/ns#' term='API'/><category scheme='http://www.blogger.com/atom/ns#' term='quantum treemap'/><title type='text'>Mendeley Hack4Knowledge: towards an "ego wall"</title><content type='html'>I'm taking part virtually in Mendeley's &lt;a href="http://hack4knowledge.eventbrite.com/"&gt;Hack4Knowledge&lt;/a&gt; event. I'm using this as a chance to explore some ideas about building novel interfaces to bibliographic data in Mendeley. One idea is to display a user's entire library in one screen. I think the user interfaces employed by most bibliographic software are too conservative and there are some cool things that could be done. For example, see &lt;b&gt;A fluid treemap interface for personal digital libraries&lt;/b&gt; (&lt;a href="http://dx.doi.org/10.1145/1065385.1065512"&gt;doi:10.1145/1065385.1065512&lt;/a&gt;, PDF available &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.1365"&gt;from CiteSeer&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;One idea I'm playing with is to display all a Mendeley user's papers as a quantum treemap, with thumbnails of the papers and "badges" indicating, for example, how many readers each paper has. The idea is that at a glance you can see all your publications, and which ones are being read the most. You can think of it as an "ego wall": a quick way to see what others think about your work. Below is part of my library. You can see the full treemap &lt;a href="http://iphylo.org/~rpage/hack4knowledge/wall/wall.svg"&gt;here as an SVG file&lt;/a&gt;. 
Imagine this as an iPad interface to a user's Mendeley library.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/-XRvRid4urqg/TfOxCbNx8PI/AAAAAAAAA58/MdTTibKBNpg/wall.png?imgmax=800" alt="Wall" border="0" width="500" height="414" /&gt;&lt;br /&gt;&lt;br /&gt;Eventually I'll make this live. I'm not doing this yet, as the script to create the visualisation is slow due to the multiple requests I need to make to get the necessary information. I have to get the list of a user's papers from Mendeley, then I call the API for each paper to get basic bibliographic details. I have to screen scrape the corresponding paper's web page to get the thumbnail and the paper's UUID, which I can then use to get the readership stats with yet another call to Mendeley's API. Sigh.&lt;br /&gt;&lt;br /&gt;Anyway, this is enough hacking for one day. Hope to spend some more time on this project tomorrow.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2148965393526945272?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2148965393526945272'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2148965393526945272'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/06/mendeley-hack4knowledge-towards-wall.html' title='Mendeley Hack4Knowledge: towards an &amp;quot;ego wall&amp;quot;'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh6.ggpht.com/-XRvRid4urqg/TfOxCbNx8PI/AAAAAAAAA58/MdTTibKBNpg/s72-c/wall.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-157885089021110796</id><published>2011-06-08T17:19:00.001+01:00</published><updated>2011-06-08T17:19:44.515+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='Solr'/><category scheme='http://www.blogger.com/atom/ns#' term='lucene'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><title type='text'>Adding Solr to BioStor: searching for real</title><content type='html'>&lt;img src="http://lh4.ggpht.com/-Psg2ndw50CM/Te-hHVf_qKI/AAAAAAAAA50/5mAZYkpMPIs/solr.jpg?imgmax=800" alt="Solr" border="0" width="200" height="110" style="float:right;" /&gt;&lt;br /&gt;&lt;br /&gt;Prompted by the appearance on the BHL blog of &lt;a href="http://biodiversitylibrary.blogspot.com/2011/06/bhl-and-our-users-rod-page-and-biostor.html"&gt;an article about BioStor&lt;/a&gt; I've been thinking about how to improve what is basically a fairly clunky tool.&lt;br /&gt;&lt;br /&gt;One major weakness is searching the collection of nearly 40,000 articles extracted from BHL. Note the word "extracted." BioStor isn't a tool like PubMed or Google Scholar where the goal is to find articles on a topic. Instead it addresses a more specific question, namely whether a given article is contained in an item scanned by BHL. Confusion about this was one reason publication of my paper on BioStor (&lt;a href="http://dx.doi.org/10.1186/1471-2105-12-187"&gt;doi:10.1186/1471-2105-12-187&lt;/a&gt;) took so long to pass through the review stage.&lt;br /&gt;&lt;br /&gt;However, users (myself included) expect to be able to search for articles. So, it's time to explore ways to make it easier to find articles within the BioStor database. 
I've junked the previous pretty crappy code I wrote and have started to play with the &lt;a href="http://lucene.apache.org/solr/"&gt;Solr search engine&lt;/a&gt;. I'd experimented with Solr a while ago, but other stuff got in the way. Today I've managed to add it to BioStor and do a preliminary indexing of the articles in BioStor. So far I'm only indexing basic bibliographic metadata, and displaying the first 30 hits, but already it's making it much easier to find interesting stuff in BioStor. &lt;br /&gt;&lt;br /&gt;Solr also supports faceted searching (i.e., clustering results by categories such as year, author, journal). I don't do much with this yet, but there's clearly a lot of scope. I could also add taxonomic names, and even the OCR text to Solr, greatly expanding the ability to find articles. But that's for the future. For now, here are some interesting searches:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://biostor.org/search.php?q=fig+wasps"&gt;fig wasps&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://biostor.org/search.php?q=lice"&gt;lice&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://biostor.org/search.php?q=Begonia"&gt;Begonia&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://biostor.org/search.php?q=frogs"&gt;frogs&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://biostor.org/search.php?q=Madagascar"&gt;Madagascar&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-157885089021110796?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/157885089021110796'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/157885089021110796'/><link rel='alternate' type='text/html' 
href='http://iphylo.blogspot.com/2011/06/adding-solr-to-biostor-searching-for.html' title='Adding Solr to BioStor: searching for real'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-Psg2ndw50CM/Te-hHVf_qKI/AAAAAAAAA50/5mAZYkpMPIs/s72-c/solr.jpg?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2766547864066269175</id><published>2011-06-08T08:36:00.001+01:00</published><updated>2011-06-08T08:42:20.589+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hack4knowledge'/><category scheme='http://www.blogger.com/atom/ns#' term='OAuth'/><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><category scheme='http://www.blogger.com/atom/ns#' term='API'/><category scheme='http://www.blogger.com/atom/ns#' term='authorship'/><category scheme='http://www.blogger.com/atom/ns#' term='identity'/><category scheme='http://www.blogger.com/atom/ns#' term='hack'/><title type='text'>I wrote that: asserting authorship using the Mendeley API</title><content type='html'>Inspired by the forthcoming &lt;a href="http://hack4knowledge.eventbrite.com/"&gt;Hack4Knowledge&lt;/a&gt; I've put together a service that enables you to assert that you are the author of a paper using the &lt;a href="http://dev.mendeley.com"&gt;Mendeley API&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If you are impatient, give it a try at:  &lt;br /&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/hack4knowledge/iwrotethat/"&gt;http://iphylo.org/~rpage/hack4knowledge/iwrotethat/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;To use it you need a Mendeley account. 
When you go to &lt;a href="http://iphylo.org/~rpage/hack4knowledge/iwrotethat/"&gt;I wrote that&lt;/a&gt; you will be asked to connect to your Mendeley account. Once you've done that, enter the DOI or PubMed ID of a paper and, if the paper is in your Mendeley library and flagged as a paper you've authored, you should see something like this:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-XXiqJSJt5MQ/Te8mlV884BI/AAAAAAAAA5s/xcrimRbmzYM/wrote.png?imgmax=800" alt="Wrote" border="0" width="500"  /&gt; &lt;br /&gt;&lt;br /&gt;The site can be a little sluggish as it needs to go through all of your publications one by one until it finds a match.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Why?&lt;/b&gt;&lt;br /&gt;Imagine you have a web database that includes publications, and you want people to join your site as users. If they have publications in your database, you'd like your users to be able to say "I'm the author of those papers" or, more generally, the author you have as "Roderic D. M. Page" is me.&lt;br /&gt;&lt;br /&gt;One way to do this would be to enable the users to sign in to your site using Mendeley (see my blog post &lt;a href="http://iphylo.blogspot.com/2010/09/mendeley-connect.html"&gt;Mendeley connect&lt;/a&gt;). Once they've done that, the user could select a publication and say "that's mine". How do we test this assertion? Well, if the user is indeed the author it is likely that they will have added it to their "My Publications" section in their &lt;a href="http://www.mendeley.com/library/"&gt;Mendeley library&lt;/a&gt;. So, we can use the Mendeley API to get a list of the author's publications and see whether the publication they claim is, in fact, one of theirs.&lt;br /&gt;&lt;br /&gt;The inspiration for this came from tools like Google Analytics, where in order to add the tool to your web site you need to convince Google that you own the site. 
One way to do this is to add some text supplied by Google to the HTML on your site, on the assumption that only you can do this (because it's your site). In the same way, only you can add papers to your Mendeley library. Of course, I'm assuming that Mendeley users are being trustworthy when they add papers to "My Publications" (i.e., they're not claiming authorship on papers they didn't write).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How?&lt;/b&gt;&lt;br /&gt;This hack uses Mendeley's OAuth support (the same technology used by Twitter and Facebook to connect to other sites) to enable you to connect your Mendeley account to the "I wrote that" application (note that my app never sees your account name or password). I use the Mendeley API &lt;a href="http://apidocs.mendeley.com/home/user-specific-methods/user-authored"&gt;user authored&lt;/a&gt; method to get a list of your publications, and &lt;a href="http://apidocs.mendeley.com/home/user-specific-methods/user-library-document-details"&gt;user library document details&lt;/a&gt; to retrieve details of each publication. I then compare the DOI or PMID you supplied with each publication, until I find one that matches. If none matches, then I've no evidence you authored that paper.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Moan&lt;/b&gt;&lt;br /&gt;No post about the Mendeley API would be complete without a moan about the state of the API. Apart from the fact that there is no function to directly find a publication in your library by DOI or PMID (hence I have to look at them all), there is virtually no support for retrieving any details about the user. For example, I wanted to brighten the web page up a little by adding a picture of the Mendeley user once they've logged in. There is no API function for this, nor a function to retrieve an identifier or URL for the user. 
Hence, in order to get a picture I screen scrape (yes, &lt;i&gt;screen scrape&lt;/i&gt;) the Mendeley web page for the reference to get the URL for the linked author of the paper, then scrape the author's profile page and extract the URL for the image. This is insane. Please, please can we have a better API?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2766547864066269175?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2766547864066269175'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2766547864066269175'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/06/i-wrote-that-asserting-authorship-using.html' title='I wrote that: asserting authorship using the Mendeley API'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-XXiqJSJt5MQ/Te8mlV884BI/AAAAAAAAA5s/xcrimRbmzYM/s72-c/wrote.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7542961917739313207</id><published>2011-06-02T14:36:00.003+01:00</published><updated>2011-06-02T14:38:48.638+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='grant'/><category scheme='http://www.blogger.com/atom/ns#' term='Open Science'/><category scheme='http://www.blogger.com/atom/ns#' term='dark taxa'/><title type='text'>Would you give me a grant? 
An experiment in Open Science</title><content type='html'>I would like to know what you think of a grant proposal I plan to submit to the UK &lt;a href="http://www.nerc.ac.uk/"&gt;Natural Environment Research Council&lt;/a&gt; at the end of the month. The proposal takes the notion of "dark taxa" &lt;a href="http://iphylo.blogspot.com/2011/04/dark-taxa-genbank-in-post-taxonomic.html"&gt;explored in an earlier blog post&lt;/a&gt; and outlines three things I'd like to do:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Quantify the extent of dark taxa (taxa in GenBank that don't have scientific names)&lt;/li&gt;&lt;li&gt;Determine how many dark taxa are genuinely new species (as opposed to taxa that are known to science but simply haven't been labelled with their proper names)&lt;/li&gt;&lt;li&gt;Explore what we can learn about a taxon's biology even if it lacks a scientific name (e.g., the &lt;a href="http://iphylo.blogspot.com/2011/03/visualising-symbiome-hosts-parasites.html"&gt;"symbiome"&lt;/a&gt;)&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;Given that I discuss most of my ideas on this blog, and deposit preprints in &lt;a href="http://precedings.nature.com/"&gt;Nature Precedings&lt;/a&gt; before the corresponding manuscript is published, it seems a logical extension to make grant proposals open as well. So you can &lt;a href="https://docs.google.com/document/pub?id=1wWfjlFjthH0Q2FdN73fbkaxhDyisVCyUUsKDTavW_aE"&gt;view the proposal on Google Docs&lt;/a&gt;, and you can add comments, if you wish.&lt;br /&gt;&lt;br /&gt;&lt;div style="width:550px;height:500px;"&gt;&lt;iframe src="https://docs.google.com/document/pub?id=1wWfjlFjthH0Q2FdN73fbkaxhDyisVCyUUsKDTavW_aE&amp;amp;embedded=true" width="100%" height="100%"&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Any feedback or suggestions are welcome. Do you think this is fundable? Have I made a good case for the proposed research? Is it interesting, or is it obvious, or has it already been done? 
Let me know what you think.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7542961917739313207?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7542961917739313207'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7542961917739313207'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/06/would-you-give-me-grant-experiment-in.html' title='Would you give me a grant? An experiment in Open Science'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-528516738048843887</id><published>2011-05-26T17:18:00.001+01:00</published><updated>2011-05-26T17:20:14.921+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='PubMed Central'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='Web Hooks'/><category scheme='http://www.blogger.com/atom/ns#' term='replication'/><category scheme='http://www.blogger.com/atom/ns#' term='ZooBank'/><category scheme='http://www.blogger.com/atom/ns#' term='XML'/><category scheme='http://www.blogger.com/atom/ns#' term='CouchDB'/><title type='text'>ZooBank on CouchDB: UUIDs, replication, and embedding the literature in taxonomic databases</title><content type='html'>&lt;img src="http://lh4.ggpht.com/-O7ywE8AEdhU/Td59P6h6zKI/AAAAAAAAA5c/MoyKBhEVeTc/ZooBankBanner.jpg?imgmax=800" alt="ZooBankBanner" border="0" width="200" height="41" 
style="float:right;" /&gt;Last December I released a web site called &lt;a href="http://iphylo.org/~rpage/afd/"&gt;Australian Faunal Directory on CouchDB&lt;/a&gt;, which was part of my ongoing exploration of how to build a simple yet useful database of taxonomic names. In particular, I want to &lt;a href="http://iphylo.blogspot.com/2010/12/linking-taxonomic-databases-to-primary.html"&gt;link names directly to the primary taxonomic literature&lt;/a&gt;. No longer is it adequate to simply list names, or list names with mangled bibliographic details (I'm looking at you, &lt;a href="http://www.catalogueoflife.org/"&gt;Catalogue of Life&lt;/a&gt;). This is the 21st century, so I expect one click from name to literature, or at the most two (via, say, a DOI). Nothing else will cut it.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://lh6.ggpht.com/-dEznrgUky5Q/Td59Qat6zbI/AAAAAAAAA5g/-i-w1UQmONI/couchbase.png?imgmax=800" alt="Couchbase" border="0" width="128" height="128" style="float:right;" /&gt;The Australian Faunal Directory (AFD) was an eye opener as it was the first serious use I'd made of CouchDB (now &lt;a href="http://www.couchbase.com/"&gt;CouchBase&lt;/a&gt;). I'd played with &lt;a href="http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html"&gt;replicating and forking data in 2010: Catalogue of Life and CouchDB&lt;/a&gt;, but the AFD project was bigger, and also inspired me to use &lt;a href="http://iphylo.blogspot.com/2011/02/web-hooks-and-openurl-screencast.html"&gt;web hooks&lt;/a&gt; to make the database editable. Suddenly this stuff started to look easy: no schema, simple web services, and tiny amounts of code.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;ZooBank&lt;/b&gt;&lt;br /&gt;So then my attention turned to &lt;a href="http://zoobank.org"&gt;ZooBank&lt;/a&gt;, which is "the official registry of Zoological Nomenclature, according to the International Commission on Zoological Nomenclature (ICZN)." ZooBank was proposed by Polaszek et al. 
(2005) in a short piece in &lt;i&gt;Nature&lt;/i&gt; ("A universal register for animal names", &lt;a href="http://dx.doi.org/10.1038/437477a"&gt;doi:10.1038/437477a&lt;/a&gt;). By providing a registry of names for animals, ultimately it aims to help avoid embarrassing situations such as the example I recount in my paper on &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; (&lt;a href="http://dx.doi.org/10.1186/1471-2105-12-187"&gt;doi:10.1186/1471-2105-12-187&lt;/a&gt;): a recent paper in &lt;i&gt;Nature&lt;/i&gt; published the name &lt;i&gt;Leviathan&lt;/i&gt; for an extinct sperm whale with a giant bite (&lt;a href="http://dx.doi.org/10.1038/nature09067"&gt;doi:10.1038/nature09067&lt;/a&gt;), only for the authors to have to publish an erratum with a new name (&lt;a href="http://dx.doi.org/10.1038/nature09381"&gt;doi:10.1038/nature09381&lt;/a&gt;) when it was discovered that &lt;i&gt;Leviathan&lt;/i&gt; had already been used for an extinct mammoth.&lt;br /&gt;&lt;br /&gt;ZooBank is developed and run by &lt;a href="http://hbs.bishopmuseum.org/staff/pylerichard.html"&gt;Rich Pyle&lt;/a&gt;, and has some nice features, such as RDF export (via LSIDs), but like most taxonomic databases it doesn't link directly to the literature. Where are the DOIs? Where are links to BHL? Where is the ability to add these links? And why is it almost entirely about fish? (OK, I know the answer to that one).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;CouchDB&lt;/b&gt;&lt;br /&gt;But the thing which really got me thinking about using CouchDB to create a version of ZooBank was Rich Pyle's vision of having a distributed ZooBank, and his insistence on using ugly &lt;a href="http://en.wikipedia.org/wiki/Universally_unique_identifier"&gt;UUIDs&lt;/a&gt; in ZooBank identifiers (e.g., &lt;a href="http://zoobank.org:80/?uuid=6bbef50e-76b4-42ef-97b1-7029dbcd8257"&gt;urn:lsid:zoobank.org:act:6BBEF50E-76B4-42EF-97B1-7029DBCD8257&lt;/a&gt;). 
As much as they are ugly, Rich has always argued that they make distributed systems easy because you don't need a centralised system to assign unique identifiers.&lt;br /&gt;&lt;br /&gt;Anybody who has played with CouchDB will know that CouchDB uses UUIDs by default to create identifiers for database documents. It also excels at data synchronisation, and can run on platforms large and small (including mobile such as Android and &lt;a href="http://blog.couchbase.com/mobile-couchbase-iOS-beta"&gt;iOS&lt;/a&gt;). This means a database could be updated on an iPhone or iPad without an Internet connection, then the data could be synchronised with other databases. Indeed, I developed this CouchDB clone of ZooBank on my MacBook, then pointed it at CouchDB running on my server and within minutes had an exact copy of the database running on the server. This ease of replication, together with the joy of schema-less design makes CouchDB seem an obvious fit to ZooBank.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Demo&lt;/b&gt;&lt;br /&gt;You can  &lt;a href="http://iphylo.org/~rpage/zoobank/"&gt;see the ZooBank on CouchDB demo here&lt;/a&gt;. It's not a complete copy of ZooBank, but has most of it. 
I reuse the UUIDs issued by ZooBank, so that &lt;br /&gt;&lt;br /&gt;&lt;a href="http://zoobank.org:80/?uuid=6bbef50e-76b4-42ef-97b1-7029dbcd8257"&gt;http://zoobank.org:80/?uuid=&lt;b&gt;6bbef50e-76b4-42ef-97b1-7029dbcd8257&lt;/b&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;becomes&lt;br /&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/zoobank/6bbef50e-76b4-42ef-97b1-7029dbcd8257"&gt;http://iphylo.org/~rpage/zoobank/&lt;b&gt;6bbef50e-76b4-42ef-97b1-7029dbcd8257&lt;/b&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As usual it's all a bit crude, but has some nice features, such as links to BHL content with a built-in article viewer I wrote for the &lt;a href="http://iphylo.org/~rpage/afd/"&gt;AFD project&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/zoobank/id/27b8aa5e-c8c4-4c1d-bef2-546f6a8479f4"&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/-ie3tvY9_qEw/Td59RBtyM1I/AAAAAAAAA5k/PU1KM7Ejeg8/etheostoma.png?imgmax=800" alt="Etheostoma" border="0" width="400" height="340" /&gt;&lt;/a&gt;&lt;b&gt;What's next?&lt;/b&gt;&lt;br /&gt;At present only a fraction of the ZooBank references have external links; I hope to add more in the next few days, using both automatic scripts and the web hook interface. 
The search interface needs work, and given that ZooBank is about nomenclature and not taxonomy, it might be useful to add a classification (say from the Catalogue of Life) so that users can navigate around the names (and get a sense of how many are *cough* fish).&lt;br /&gt;&lt;br /&gt;At present to display a reference I do one of four things:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;If the reference is in BHL I use my article viewer&lt;/li&gt;&lt;li&gt;If there is a freely available PDF online I display that using the Google Docs PDF viewer&lt;/li&gt;&lt;li&gt;If 1 and 2 don't apply, but there is a DOI then I resolve the DOI and display the result in an IFRAME (yuck)&lt;/li&gt;&lt;li&gt;If none of 1-3 apply I display a blank rectangle&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;There are a couple of ways we could improve this. The first is to enhance the display of BHL content by making use of the &lt;a href="http://iphylo.blogspot.com/2010/10/towards-interactive-djvu-file-viewer.html"&gt;structure of the source DjVu files&lt;/a&gt;. Another is to make use of the XML now being made available by the journal &lt;i&gt;ZooKeys&lt;/i&gt; (see &lt;a href="http://iphylo.blogspot.com/2010/07/zookeys-publishes-articles-of-future.html"&gt;my blog post&lt;/a&gt;, and Pensoft's announcement that &lt;a href="http://www.pensoft.net/news.php?n=56"&gt;&lt;i&gt;ZooKeys&lt;/i&gt; is now being archived by PubMed Central&lt;/a&gt;, complete with taxonomic markup). There are a lot of &lt;a href="http://iphylo.org/~rpage/zoobank/publication_outlet/ZooKeys"&gt;&lt;i&gt;ZooKeys&lt;/i&gt; articles in ZooBank&lt;/a&gt;, so there's a lot of potential for embedding an article viewer that takes &lt;i&gt;ZooKeys&lt;/i&gt; XML and redisplays it with taxonomic names and references as clickable links that link to other ZooBank content. 
That way we approach the point where taxonomic literature becomes a first class citizen of a taxonomic database.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-528516738048843887?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/528516738048843887'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/528516738048843887'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/05/zoobank-on-couchdb-uuids-replication.html' title='ZooBank on CouchDB: UUIDs, replication, and embedding the literature in taxonomic databases'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-O7ywE8AEdhU/Td59P6h6zKI/AAAAAAAAA5c/MoyKBhEVeTc/s72-c/ZooBankBanner.jpg?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7803592437560259729</id><published>2011-05-26T13:12:00.001+01:00</published><updated>2011-05-26T13:12:28.928+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Quora'/><category scheme='http://www.blogger.com/atom/ns#' term='output'/><category scheme='http://www.blogger.com/atom/ns#' term='citation'/><category scheme='http://www.blogger.com/atom/ns#' term='career suicide'/><category scheme='http://www.blogger.com/atom/ns#' term='impact factor'/><title type='text'>What is the best way to measure academic outputs that aren't publications?</title><content type='html'>My institute is going through various reviews of staff performance and, frankly, I'm 
feeling somewhat vulnerable given my rather unorthodox (at least amongst my colleagues) approach to doing science. I spend way more time writing code, building databases and web sites, and blogging than writing papers and getting grants (although I have been known to do both).&lt;br /&gt;&lt;br /&gt;So the issue becomes, how to demonstrate that coding, building websites, and ranting on my blog is a worthwhile thing to do? Now, I'm happy that what I do has value, but my happiness isn't the issue. It's convincing people who want to see papers in high impact journals and bums on seats in labs that there are other ways to generate scientific output, and that output can have value. I'm also concerned that a simplistic view of what constitutes valid outputs will stifle innovation, just at the time when traditional science publishing is undergoing a revolution.&lt;br /&gt;&lt;br /&gt;So, I posted a question on Quora: &lt;a href="http://www.quora.com/What-is-the-best-way-to-measure-academic-outputs-that-arent-publications"&gt;What is the best way to measure academic outputs that aren't publications?&lt;/a&gt;, where I wrote:&lt;br /&gt;&lt;blockquote&gt;Usually we assess the quality of academic output using measures based on citations, either directly (how many papers have cited the paper?) or indirectly (is the paper published in a journal like Nature or Science that contains papers that on average get lots of citations, i.e. "impact factor"). But what of other outputs, such as web sites, databases, and software? These outputs often require considerable work, and can be widely used. What is the best way to measure those outputs?&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;There have been various approaches to measuring the impact of an article other than using citations, such as the number of article downloads, or the number of times an article has been bookmarked on a site such as Mendeley or CiteULike. 
But what of the coding, the database development, the web sites, and the blog posts? How can I show that these have value?&lt;br /&gt;&lt;br /&gt;I guess there are two things here. One is the need to be able to compare across outputs, which is tricky (comparing citations across different disciplines is already hard); the other is the need to be able to compare within broadly similar outputs. Here are some quick thoughts:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Web sites&lt;/b&gt;&lt;br /&gt;An obvious approach is to use Google Analytics to harvest information about page views and visitor numbers. The geographic origin of those visitors could be used to make a case for whether the research/data on that site is internationally relevant, although I suspect "internationally relevant" is a somewhat dubious notion. Most academic specialities are narrow, such that the person most interested in your research is likely living in a different country, hence by definition most research will be internationally "relevant". &lt;br /&gt;&lt;br /&gt;The advantage of Google Analytics is that it is widely used, hence you could get comparative data and be able to show that your web site is more (or less) used than another site.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Code&lt;/b&gt;&lt;br /&gt;The value of code is tricky to measure, but tools like &lt;a href="http://www.ohloh.net/"&gt;ohloh&lt;/a&gt; provide estimates of the effort and expense required to generate code for a project. For example, for my &lt;a href="http://code.google.com/p/bioguid/"&gt;bioGUID code repository&lt;/a&gt; (which includes code for &lt;a href="http://bioguid.info"&gt;bioGUID&lt;/a&gt; and &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt;, as well as some third party code) ohloh's &lt;a href="https://www.ohloh.net/p/bioguid/estimated_cost"&gt;estimated cost&lt;/a&gt; is 87 person-years and $US 4,784,203. 
OK, silly numbers, but at least I can compare these with other projects (Drupal, for example, represents 153 years and $US 8,438,417 of investment).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Comparing across output categories will be challenging, especially as there is no obvious equivalent for citation (one reason why if you develop software or a web site it makes good sense to write a paper describing it, &lt;a href="http://iphylo.blogspot.com/2008/07/how-to-succeed-in-evolutionary-biology.html"&gt;worked for me&lt;/a&gt;). But perhaps download or article access statistics could provide a way to say "my web site is worth &lt;i&gt;x&lt;/i&gt; publications". Note also that I'm not arguing that any of these measures is actually a good thing, just that if I'm going to be measured, and I have some say in how I'm measured, I'd like to suggest something sensible that others might actually buy.&lt;br /&gt;&lt;br /&gt;So, please feel free to comment either here or on &lt;a href="http://www.quora.com/What-is-the-best-way-to-measure-academic-outputs-that-arent-publications"&gt;Quora&lt;/a&gt;. 
I need to put together some notes to make the case that people like me aren't just sitting drinking coffee, playing loud music, and tweeting without, you know, actually making stuff.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7803592437560259729?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7803592437560259729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7803592437560259729'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/05/what-is-best-way-to-measure-academic.html' title='What is the best way to measure academic outputs that aren&amp;#39;t publications?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-9024993522973653541</id><published>2011-05-25T11:03:00.001+01:00</published><updated>2011-07-28T20:47:13.921+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Open access'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomy'/><category scheme='http://www.blogger.com/atom/ns#' term='paywall'/><category scheme='http://www.blogger.com/atom/ns#' term='Top 10'/><title type='text'>The top-ten new species described in 2010 and the failure of taxonomy to embrace Open Access publication</title><content type='html'>Each year the grandly titled &lt;a href="http://species.asu.edu/"&gt;International Institute for Species Exploration&lt;/a&gt; (IISE) publishes a list of the top 10 species described in the previous year. 
&lt;a href="http://species.asu.edu/Top10"&gt;This year's list&lt;/a&gt; is reproduced below, to which I've added the links to the original publications (why do people still think it's OK to omit links to the primary literature when all of these articles are online?).&lt;br /&gt;&lt;br /&gt;The striking thing is that only 2 of the 10 species were described in Open Access publications (and I use that term loosely, as &lt;a href="http://www.arthropod-systematics.de/"&gt;&lt;i&gt;Arthropod Systematics &amp; Phylogeny&lt;/i&gt;&lt;/a&gt; PDFs are freely available, but the licensing isn't clear). Sadly much of our knowledge of the planet's diversity is still locked up behind a &lt;a href="http://en.wikipedia.org/wiki/Paywall"&gt;paywall&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;table cellspacing="0"&gt;&lt;tbody style="font-size:9px"&gt;&lt;tr valign="top"&gt;&lt;th&gt;Species&lt;/th&gt;&lt;th&gt;&lt;/th&gt;&lt;th&gt;Reference&lt;/th&gt;&lt;th&gt;DOI/PDF&lt;/th&gt;&lt;th&gt;Open Access&lt;/th&gt;&lt;/tr&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TdzT8br9nTI/AAAAAAAAA4w/lLzKt0RuTLg/Caerostris%205.jpg?imgmax=800" alt="Caerostris 5" border="0" width="101" height="101" style="float:right;" /&gt;&lt;/td&gt;&lt;td&gt;Darwin's Bark Spider&lt;/td&gt;&lt;td&gt;Kuntner, M. and I. Agnarsson. 2010. Web gigantism in Darwin's bark spider, a new species from Madagascar (Araneidae: Caerostris). The Journal of Arachnology 38(2):346-356&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1636/B09-113.1"&gt;10.1636/B09-113.1&lt;/a&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TdzT9PClZFI/AAAAAAAAA40/ySvDbBzxv2g/Mycena%202.jpg?imgmax=800" alt="Mycena 2" border="0" width="101" height="101" style="float:right;" /&gt;&lt;/td&gt;&lt;td&gt;Bioluminescent Mushroom&lt;/td&gt;&lt;td&gt;Desjardin, D.E., B.A. Perry, D.J. Lodge, C.V. Stevani, and E. Nagasawa. 2010. 
Luminescent Mycena: new and noteworthy species. Mycologia 102(2):459-477&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.3852/09-197"&gt;10.3852/09-197&lt;/a&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TdzT9iVYM0I/AAAAAAAAA44/awtlEGvSKxM/Halomonas.jpg?imgmax=800" alt="Halomonas" border="0" width="101" height="101" style="float:right;" /&gt;&lt;/td&gt;&lt;td&gt;Bacterium&lt;/td&gt;&lt;td&gt;Sanchez-Porro, C., B. Kaur, H. Mann and A. Ventosa. 2010. Halomonas titanicae sp. nov., a halophilic bacterium isolated from the RMS Titanic. International Journal of Systematic and Evolutionary Microbiology 60(12):2768-2774&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1099/ijs.0.020628-0"&gt;10.1099/ijs.0.020628-0&lt;/a&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TdzT-ZA_9kI/AAAAAAAAA48/NvYBYl_5wZ4/Varanus.jpg?imgmax=800" alt="Varanus" border="0" width="101" height="101" style="float:right;" /&gt;&lt;/td&gt;&lt;td&gt;Monitor Lizard&lt;/td&gt;&lt;td&gt;Welton, L.J., C.D. Siler, D. Bennett, A. Diesmos, M.R. Duya, R. Dugay, E.L.B. Rico, M. van Weerd and R.M. Brown. 2010. A spectacular new Philippine monitor lizard reveals a hidden biogeographic boundary and a novel flagship species for conservation. Biology Letters 6(5):654-658&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1098/rsbl.2010.0119"&gt;10.1098/rsbl.2010.0119&lt;/a&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TdzT-09kr6I/AAAAAAAAA5A/NY_bEx_lcPo/Glomeremus.jpg?imgmax=800" alt="Glomeremus" border="0" width="101" height="101" style="float:right;" /&gt;&lt;/td&gt;&lt;td&gt;Pollinating cricket&lt;/td&gt;&lt;td&gt;Hugel, S., C. Micheneau, J. Fournel, B.H. Warren, A. Gauvin-Bialecki, T. Pailler, M.W. Chase and D. Strasberg. 2010. 
Glomeremus species from the Mascarene islands (Orthoptera, Gryllacrididae) with the description of the pollinator of an endemic orchid from the island of Réunion. Zootaxa 2545:58-68&lt;/td&gt;&lt;td&gt;&lt;a href="http://www.mapress.com/zootaxa/2010/1/zt02545p068.pdf"&gt;PDF&lt;/a&gt;&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TdzT_e6JxnI/AAAAAAAAA5E/LK-z7HSupDs/Philantomba%202.png?imgmax=800" alt="Philantomba 2" border="0" width="101" height="101" style="float:right;" /&gt;&lt;/td&gt;&lt;td&gt;Duiker&lt;/td&gt;&lt;td&gt;Colyn, M., J. Hulselmans, G. Sonet, P. Oudé, J. de Winter, A. Natta, Z.T. Nagy and E. Verheyen. 2010. Discovery of a new duiker species (Bovidae: Cephalophinae) from the Dahomey Gap, West Africa. Zootaxa 2637:1-30&lt;/td&gt;&lt;td&gt;&lt;a href="http://www.mapress.com/zootaxa/2010/2/zt02637p030.pdf"&gt;PDF&lt;/a&gt;&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;&lt;tr valign="top" style="background-color:green;color:white;"&gt;&lt;td&gt;&lt;img src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TdzUAPKOYdI/AAAAAAAAA5I/6Bw-4ekgRTc/Tyrannobdella.jpg?imgmax=800" alt="Tyrannobdella" border="0" width="101" height="101" style="float:right;" /&gt;&lt;/td&gt;&lt;td&gt;Leech&lt;/td&gt;&lt;td&gt;Phillips, A.J., R. Arauco-Brown, A. Oceguera-Figueroa, G.P. Gomez, M. Beltran, Y.-T. Lai and M.E. Siddall. 2010. Tyrannobdella rex n. gen. n. sp. and the evolutionary origins of mucosal leech infestations. PLoS ONE 5(4):e10057&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1371/journal.pone.0010057"&gt;10.1371/journal.pone.0010057&lt;/a&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TdzUAgEtyiI/AAAAAAAAA5M/0nL4uvytSKY/Psathyrella.jpg?imgmax=800" alt="Psathyrella" border="0" width="101" height="101" style="float:right;" /&gt;&lt;/td&gt;&lt;td&gt;Underwater mushroom&lt;/td&gt;&lt;td&gt;Frank, J.L., R.A. Coffan and D. 
Southworth. 2010. Aquatic gilled mushrooms: Psathyrella fruiting in the Rogue River in southern Oregon. Mycologia 102(1):93-107&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.3852/07-190"&gt;10.3852/07-190&lt;/a&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;&lt;tr  valign="top" style="background-color:green;color:white"&gt;&lt;td&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TdzUBP5M9VI/AAAAAAAAA5Q/blOO2HGZ-ME/Saltoblattella.JPG?imgmax=800" alt="Saltoblattella" border="0" width="101" height="101" style="float:right;" /&gt;&lt;/td&gt;&lt;td&gt;Jumping cockroach&lt;/td&gt;&lt;td&gt;Bohn, H., M. Picker, K.-D. Klass and J. Colville. 2010. A jumping cockroach from South Africa, Saltoblattella montistabularis, gen. nov., spec. nov. (Blattodea: Blattellidae). Arthropod Systematics and Phylogeny 68(1):53-69&lt;/td&gt;&lt;td&gt;&lt;a href="http://www.arthropod-systematics.de/ASP_68_1/68_1_Bohn_53-69.pdf"&gt;PDF&lt;/a&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr valign="top"&gt;&lt;td&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TdzUBv-60xI/AAAAAAAAA5U/M5k25VF7meY/Halieutichthys.jpg?imgmax=800" alt="Halieutichthys" border="0" width="101" height="101" style="float:right;" /&gt;&lt;/td&gt;&lt;td&gt;Pancake Batfish&lt;/td&gt;&lt;td&gt;Ho, H.-C., P. Chakrabarty and J.S. Sparks. 2010. Review of the Halieutichthys aculeatus species complex (Lophiiformes: Ogcocephalidae), with descriptions of two new species. 
Journal of Fish Biology 77(4):841-869&lt;/td&gt;&lt;td&gt;&lt;a href="http://dx.doi.org/10.1111/j.1095-8649.2010.02716.x"&gt;10.1111/j.1095-8649.2010.02716.x&lt;/a&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-9024993522973653541?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/9024993522973653541'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/9024993522973653541'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/05/top-ten-new-species-described-in-2010.html' title='The top-ten new species described in 2010 and the failure of taxonomy to embrace Open Access publication'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_Gct8lVAxKqQ/TdzT8br9nTI/AAAAAAAAA4w/lLzKt0RuTLg/s72-c/Caerostris%205.jpg?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-5954361719340493769</id><published>2011-05-23T10:56:00.001+01:00</published><updated>2011-05-23T10:56:23.559+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='published'/><category scheme='http://www.blogger.com/atom/ns#' term='Google Scholar'/><category scheme='http://www.blogger.com/atom/ns#' term='BMC Bioinformatics'/><title type='text'>BioStor article published (finally)</title><content type='html'>&lt;img 
src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TdovRMjLHrI/AAAAAAAAA4o/eY9LNLtx6Q0/logo.gif?imgmax=800" alt="Logo" border="0" width="130" height="60" style="float:right;padding:8px;" /&gt;My article describing &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt;, "Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library", has finally seen the light of day in &lt;i&gt;BMC Bioinformatics&lt;/i&gt; (&lt;a href="http://dx.doi.org/10.1186/1471-2105-12-187"&gt;doi:10.1186/1471-2105-12-187&lt;/a&gt;; the DOI is not working at the moment, so give it a little while to go live. In the meantime &lt;a href="http://www.biomedcentral.com/1471-2105/12/187"&gt;you can access the article here&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Getting this article published was more work than I expected. There seems to be an inverse correlation between how important I think the work is and how easy it is to get published: the more straightforward I think the article is, the more work it is to convince the referees of its merits. Of course, it may be that my judgement of the article's merits influences how much effort I put into making the manuscript as rigorous and clear as possible. And perhaps having a blog has spoiled me: I really struggle with the notion that it takes months to publish a paper, especially as most of the intellectual debate involved (i.e., the refereeing process) is behind closed doors, compared to the open and immediate nature of commentary on a blog post.&lt;br /&gt;&lt;br /&gt;However, despite my frustrations with the refereeing process, there's no doubt that it did improve the manuscript (you can see the original version at Nature Precedings, &lt;a href="http://hdl.handle.net/10101/npre.2010.4928.1"&gt;hdl:10101/npre.2010.4928.1&lt;/a&gt;). 
&lt;br /&gt;&lt;br /&gt;With the publication of this article, and last week's &lt;a href="http://twitter.com/rdmpage/status/71261954852929536"&gt;conversation with Anurag Acharya and Darcy Dapra  about getting BioStor indexed by Google Scholar&lt;/a&gt;, it has been a good few days for &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-5954361719340493769?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5954361719340493769'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5954361719340493769'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/05/biostor-article-published-finally.html' title='BioStor article published (finally)'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_Gct8lVAxKqQ/TdovRMjLHrI/AAAAAAAAA4o/eY9LNLtx6Q0/s72-c/logo.gif?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2418123650000295703</id><published>2011-04-15T16:16:00.001+01:00</published><updated>2011-04-15T16:16:03.513+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='background'/><category scheme='http://www.blogger.com/atom/ns#' term='RTFM'/><category scheme='http://www.blogger.com/atom/ns#' term='DjVu'/><title 
type='text'>BHL, DjVu, and reading the f*cking manual</title><content type='html'>One of the biggest challenges I've faced with the &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; project, apart from dealing with messy metadata, has been handling page images. At present I get these from the &lt;a href="http://www.biodiversitylibrary.org"&gt;Biodiversity Heritage Library&lt;/a&gt;. They are big (typically 1 MB in size), and have the caramel colour of old paper. Nothing fills up a server quicker than thousands of images.&lt;br /&gt;&lt;br /&gt;A while ago I started playing with ImageMagick to resize the images, making them smaller, as well as looking at ways to remove the background colour, leaving just black text and lines on a white background.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;a href="http://www.flickr.com/photos/85456381@N00/5209041261/" title="Before and after converting BHL image by Roderic Page, on Flickr"&gt;&lt;img src="http://farm5.static.flickr.com/4145/5209041261_b50a80d1c4.jpg" width="425"  alt="Before and after converting BHL image"&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;I think this makes the page image clearer, as well as removing the impression that this is some ancient document, rather than a scientific article. Yes, it's the Biodiversity &lt;b&gt;Heritage&lt;/b&gt; Library, but the whole point of the taxonomic literature is that it lasts forever. 
Why not make it look as fresh as when it was first printed?&lt;br /&gt;&lt;br /&gt;Working out how to best remove the background colour takes some effort, and running ImageMagick on every image that's downloaded starts putting a lot of stress on the poor little Mac Mini that powers BioStor.&lt;br /&gt;&lt;br /&gt;Then there's the issue of having an &lt;a href="http://iphylo.blogspot.com/2010/09/bhl-and-ipad.html"&gt;iPad viewer for BHL&lt;/a&gt;, and making it &lt;a href="http://iphylo.blogspot.com/2010/10/towards-interactive-djvu-file-viewer.html"&gt;interactive&lt;/a&gt;. So, I started looking at the DjVu files generated by the Internet Archive, and thinking whether it would make more sense to download those and extract images from them, rather than go via the BHL API. I'll need the DjVu files for the text layout anyway (see &lt;a href="http://iphylo.blogspot.com/2010/10/towards-interactive-djvu-file-viewer.html"&gt;Towards an interactive DjVu file viewer for the BHL&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;I couldn't remember the command to extract images from DjVu, but I did remember that Google is my friend, which led me to this question on Stack Overflow: &lt;a href="http://stackoverflow.com/questions/4516901/using-the-djvu-tools-to-for-background-foreground-seperation"&gt;Using the DjVu tools to for background / foreground seperation?&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;OMG! DjVu tools can remove the background? A quick &lt;a href="http://djvu.sourceforge.net/doc/man/ddjvu.html"&gt;look at the documentation&lt;/a&gt; confirmed it. So I did a quick test. 
The page on the left is the default page image, the page on the right was extracted using &lt;code&gt;ddjvu&lt;/code&gt; with the option &lt;code&gt;-mode=foreground&lt;/code&gt;.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TaheW7IW6SI/AAAAAAAAA4g/dQDbK5-qEnc/507.png?imgmax=800" alt="507.png" border="0" width="425" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Much, much nicer. But why didn't I know this? Why did I waste time playing with ImageMagick when it's a trivial option in a DjVu tool? And why does BHL serve the discoloured page images when it could serve crisp, clean versions?&lt;br /&gt;&lt;br /&gt;So, I felt like an idiot. But the other good thing that's come out of this is that I've taken a closer look at the Internet Archive's BHL-related content, and I'm beginning to think that perhaps the more efficient way to build something like BioStor is not through downloading BHL data and using their API, but by going directly to the Internet Archive and downloading the DjVu and associated files. 
Maybe it's time to rethink everything about how BioStor is built...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2418123650000295703?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2418123650000295703'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2418123650000295703'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/04/bhl-djvu-and-reading-fcking-manual.html' title='BHL, DjVu, and reading the f*cking manual'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://farm5.static.flickr.com/4145/5209041261_b50a80d1c4_t.jpg' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7053013459869384357</id><published>2011-04-12T14:06:00.001+01:00</published><updated>2011-04-12T18:10:24.309+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Genbank'/><category scheme='http://www.blogger.com/atom/ns#' term='post-taxonomic'/><category scheme='http://www.blogger.com/atom/ns#' term='NCBI'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomy'/><category scheme='http://www.blogger.com/atom/ns#' term='dark taxa'/><category scheme='http://www.blogger.com/atom/ns#' term='DNA barcoding'/><title type='text'>Dark taxa: GenBank in a post-taxonomic world</title><content type='html'>In an earlier post (&lt;a href="http://iphylo.blogspot.com/2010/10/are-names-really-key-to-big-new-biology.html"&gt;Are names really the key to the big new biology?&lt;/a&gt;), I 
questioned Patterson et al.'s assertion in a recent TREE article (&lt;a href="http://dx.doi.org/10.1016/j.tree.2010.09.004"&gt;doi:10.1016/j.tree.2010.09.004&lt;/a&gt;) that names are key to the new biology.&lt;br /&gt;&lt;br /&gt;In this post I'm going to revisit this idea by doing a quick analysis of how many species in GenBank have "proper" scientific names, and whether the number of named species has changed over time. My definition of "proper" name is a little loose: anything that had two words, with the second one starting with a lower case letter, was treated as a proper name. Hence, a name like &lt;a href="http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=867837"&gt;Eptesicus sp. A JLE-2010&lt;/a&gt; is not a proper name, but &lt;i&gt;Eptesicus andersoni&lt;/i&gt; is.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Mammals&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Since GenBank started, every year has seen some 100-200 mammal species added to the database. &lt;br /&gt;&lt;img src="https://spreadsheets.google.com/oimg?key=0AuPC5KKdhYCQdEFuaWJ3LW96TUM3QXQ3RnpObEp2T1E&amp;oid=12&amp;zx=tkb6ncmtkrum" width="425" /&gt;&lt;br /&gt;&lt;br /&gt;Until around 2003 almost all of these species had proper binomial names, but since then an increasing percentage of species-level taxa haven't been identified to species. 
In 2010 three-quarters of new tax_ids for mammals weren't identified.&lt;br /&gt;&lt;br /&gt;&lt;img src="https://spreadsheets.google.com/oimg?key=0AuPC5KKdhYCQdEFuaWJ3LW96TUM3QXQ3RnpObEp2T1E&amp;oid=11&amp;zx=twmn209jqqky" width="425"  /&gt;&lt;b&gt;Invertebrates&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;For "invertebrates" 2010 saw an explosive growth in the number of new taxa sequenced, with nearly 71,000 new taxa added to GenBank.&lt;br /&gt;&lt;br /&gt;&lt;img src="https://spreadsheets.google.com/oimg?key=0AuPC5KKdhYCQdEFuaWJ3LW96TUM3QXQ3RnpObEp2T1E&amp;oid=6&amp;zx=yli94apaylpb" /&gt;&lt;br /&gt;&lt;br /&gt;This coincides with a spectacular drop in the number of properly-named taxa, but even before 2010 the proportion of named invertebrate species in GenBank was in decline: in 2009 just over half of the species added had binomials.&lt;br /&gt;&lt;br /&gt;&lt;img src="https://spreadsheets.google.com/oimg?key=0AuPC5KKdhYCQdEFuaWJ3LW96TUM3QXQ3RnpObEp2T1E&amp;oid=5&amp;zx=7hm3uo18odsj" /&gt;&lt;b&gt;Bacteria&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;To put this in perspective, here are the equivalent graphs for bacteria. &lt;br /&gt;Although at the outset most of the bacteria in GenBank had binomial names, pretty quickly the bulk of sequenced bacteria had informal names. In 2010 less than 1% of newly sequenced bacteria had been formally described.&lt;br /&gt;&lt;br /&gt;&lt;img src="https://spreadsheets.google.com/oimg?key=0AuPC5KKdhYCQdEFuaWJ3LW96TUM3QXQ3RnpObEp2T1E&amp;oid=8&amp;zx=lopfb2yyqhoq" /&gt;&lt;img src="https://spreadsheets.google.com/oimg?key=0AuPC5KKdhYCQdEFuaWJ3LW96TUM3QXQ3RnpObEp2T1E&amp;oid=7&amp;zx=3mstujfg79vp" /&gt;&lt;b&gt;Dark taxa&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;For bacteria the graphs are hardly surprising. To get a proper name a bacterium must be cultured, and the vast majority of bacteria haven't been (or can't be) cultured. 
Hence, microbiologists can gloat at the nomenclatural mess plant and animal taxonomists have to deal with only because microbiologists have a tiny number of names to deal with. &lt;br /&gt;&lt;br /&gt;For mammals and invertebrates there's a clear decline in the use of proper names. It would be tempting to suggest that this reflects a decline in the number of taxonomists - there might simply not be enough of them in enough groups to be able to identify and/or describe the taxa being sequenced.&lt;br /&gt;&lt;br /&gt;However, if we look at the recent peaks of unnamed animal species, we discover that many have names like &lt;a href="http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=736493"&gt;Lepidoptera sp. BOLD:AAD7075&lt;/a&gt;, indicating that they are DNA barcodes from the &lt;a href="http://www.boldsystems.org/"&gt;Barcode of Life Data Systems&lt;/a&gt;. Of the 62,365 unnamed invertebrates added last year, 54,546 are BOLD sequences that haven't been assigned to a known species. Of the 277 unnamed mammals, 218 are BOLD taxa. Hence, DNA barcoding is flooding GenBank with taxa that lack proper names (and typically are represented by a single DNA barcode sequence).&lt;br /&gt;&lt;br /&gt;There are various ways to interpret these graphs, but for me the message is clear. The bulk of newly added taxa in GenBank are what we might term "dark taxa", that is, taxa that aren't identified to a known species. This doesn't necessarily mean that they are species new to science: we may already have encountered these species before, they may be sitting in museum collections, and have descriptions already published. We simply don't know. 
As the output from DNA barcoding grows, the number of dark taxa will only increase, and macroscopic biology starts to look a lot like microbiology.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A post-taxonomic world&lt;/b&gt;&lt;br /&gt;If we look at the graphs for bacteria, we see that taxonomic names are virtually irrelevant, and yet microbiology seems to be doing fine as a discipline. So, perhaps it's time to think about a post-taxonomic world where taxonomic names, &lt;i&gt;contra&lt;/i&gt; Patterson et al., are not that important. We can discover a good deal about organismal biology from GenBank alone (see my post &lt;a href="http://iphylo.blogspot.com/2011/03/visualising-symbiome-hosts-parasites.html"&gt;Visualising the symbiome: hosts, parasites, and the Tree of Life&lt;/a&gt; for some examples, as well as Rougerie et al. 2010 &lt;a href="http://dx.doi.org/10.1111/j.1365-294X.2010.04918.x"&gt;doi:10.1111/j.1365-294X.2010.04918.x&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;This leaves us with two questions:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;How much biology can we do without taxonomic names?&lt;/li&gt;&lt;li&gt;If the lack of taxonomic names limits what we can do (and, playing devil's advocate, this is an open question) how can we speed up linking GenBank sequences to names?&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;I suspect that the answer to (1) is "quite a lot" (especially if we think like microbiologists). Question (2) is ultimately a question about how fast we can link literature, museum collections, sequences, and phylogenies. 
If progress to date is any indication, we need to rethink how we do this, and in a hurry, because dark taxa are accumulating at an accelerating rate.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How the analyses were done&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Although the NCBI makes a dump of its taxonomic database available via FTP (at &lt;a href="ftp://ftp.ncbi.nih.gov/pub/taxonomy/"&gt;ftp://ftp.ncbi.nih.gov/pub/taxonomy/&lt;/a&gt;), this dump doesn't have dates for when the taxa were added to the database. However, using the &lt;a href="http://eutils.ncbi.nlm.nih.gov/"&gt;Entrez EUtilities&lt;/a&gt; we can get the tax_ids that were published within a given date range. For example, to retrieve all the tax_ids added to the database in December 2010, we set the URL parameters &lt;code&gt;&amp;mindate=2010/12/01&lt;/code&gt; and &lt;code&gt;&amp;maxdate=2010/12/31&lt;/code&gt; to form this URL:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&amp;mindate=2010/12/01&amp;maxdate=2010/12/31&amp;retmax=1000000"&gt;http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&amp;mindate=2010/12/01&amp;maxdate=2010/12/31&amp;retmax=1000000&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I've set &lt;code&gt;&amp;retmax&lt;/code&gt; to a big number to ensure I get all the tax_ids for that month (in this case 23511). I then made a local copy of the NCBI database in MySQL (&lt;a href="http://linnaeus.zoology.gla.ac.uk/~rpage/tbmap/downloads/ncbi/"&gt;instructions here&lt;/a&gt;) and queried for all species-level taxa in GenBank. I used a rather crude regular expression &lt;code&gt;REGEXP '^[A-Z][a-z]+ [a-z][a-z]+$'&lt;/code&gt; to find just those species names that were likely to be proper scientific names (i.e., no "sp.", "aff.", museum or voucher codes, etc.). To group the species into major taxonomic groups I used the &lt;code&gt;division_id&lt;/code&gt;. 
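&lt;br /&gt;&lt;br /&gt;As a rough sketch of the first and third steps above (not the script actually used for this post), the monthly URL and the name filter could be written in Python along these lines. The function names &lt;code&gt;month_url&lt;/code&gt; and &lt;code&gt;looks_like_binomial&lt;/code&gt; are made up for illustration; the URL parameters and the regular expression are the ones quoted above, and the sketch only builds the URL rather than fetching it.&lt;br /&gt;&lt;br /&gt;
&lt;pre&gt;
```python
import calendar
import re
from urllib.parse import urlencode

# Hypothetical helper names; this is an illustration, not the original analysis code.
ESEARCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Same crude pattern as the MySQL REGEXP quoted in the post.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z][a-z]+$")


def month_url(year, month, retmax=1000000):
    """Build the esearch URL for tax_ids dated within one calendar month."""
    last_day = calendar.monthrange(year, month)[1]
    params = {
        "db": "taxonomy",
        "mindate": "%04d/%02d/01" % (year, month),
        "maxdate": "%04d/%02d/%02d" % (year, month, last_day),
        "retmax": retmax,
    }
    # safe="/" keeps the slashes in the dates unescaped, as in the post's URL.
    return ESEARCH + "?" + urlencode(params, safe="/")


def looks_like_binomial(name):
    """True if the name is a plain two-word 'Genus species' string."""
    return BINOMIAL.match(name) is not None
```
&lt;/pre&gt;
With these definitions, &lt;code&gt;month_url(2010, 12)&lt;/code&gt; gives the December 2010 URL shown above, and &lt;code&gt;looks_like_binomial("Lepidoptera sp. BOLD:AAD7075")&lt;/code&gt; is false, so that taxon would be counted as unnamed.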
&lt;br /&gt;&lt;br /&gt;Results are available in a &lt;a href="https://spreadsheets.google.com/ccc?key=0AuPC5KKdhYCQdEFuaWJ3LW96TUM3QXQ3RnpObEp2T1E&amp;hl=en&amp;authkey=COz7oqsC"&gt;Google Spreadsheet&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7053013459869384357?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7053013459869384357'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7053013459869384357'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/04/dark-taxa-genbank-in-post-taxonomic.html' title='Dark taxa: GenBank in a post-taxonomic world'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3890123895569655787</id><published>2011-04-01T17:03:00.001+01:00</published><updated>2011-04-01T17:03:13.528+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data'/><category scheme='http://www.blogger.com/atom/ns#' term='Dryad'/><category scheme='http://www.blogger.com/atom/ns#' term='citation'/><title type='text'>Data matters but do data sets?</title><content type='html'>Interest in archiving data and data publication is growing, as evidenced by projects such as &lt;a href="http://datadryad.org/"&gt;Dryad&lt;/a&gt;, and earlier tools such as &lt;a href="http://www.treebase.org"&gt;TreeBASE&lt;/a&gt;. But I can't help wondering whether this is a little misguided. 
I think the issues are granularity and reuse.&lt;br /&gt;&lt;br /&gt;Taking the second issue first, how much re-use do data sets get? I suspect the answer is "not much". I think there are two clear use cases: repeatability of a study, and benchmarks. Repeatability is a worthy goal, but difficult to achieve given the complexity of many analyses and the constant problem of "bit rot" as software becomes harder to run the older it gets. Furthermore, despite the growing availability of cheap cloud computing, it simply may not be feasible to repeat some analyses.&lt;br /&gt;&lt;br /&gt;Methodological fields often rely on benchmarks to evaluate new methods, and this is an obvious case where a dataset may get reused ("I ran my new method on your dataset, and my method is the business; yours, not so much").&lt;br /&gt;&lt;br /&gt;But I suspect the real issue here is granularity. Take DNA sequences, for example. New studies rarely reuse (or cite) previous data sets, such as a TreeBASE alignment or a GenBank Popset. Instead they cite individual sequences by accession number. I think in part this is because the rate of accumulation of new sequences is so great that any subsequent study would need to add these new sequences to be taken seriously. Similarly, in taxonomic work the citable data unit is often a single museum specimen, rather than a data set made up of specimens.&lt;br /&gt;&lt;br /&gt;To me, citing data sets makes almost as much sense as citing journal volumes - the level of granularity is wrong. Journal volumes are largely arbitrary collections of articles; it's the articles that are the typical unit of citation. Likewise I think sequences will be cited more often than alignments.&lt;br /&gt;&lt;br /&gt;It might be argued that there are disciplines where the dataset is the sensible unit, such as an ecological study of a particular species. Such a data set may lack obvious subsets, and hence it makes sense to be cited as a unit. 
But my expectation here is that such datasets will see limited re-use, for the very reason that they can't be easily partitioned and mashed up. Data sets, such as alignments, that are built from smaller, reusable units of data (i.e., sequences) can be recombined, trimmed, or merged, and hence can be readily re-used. Monolithic datasets with largely unique content can't be easily mashed up with other data.&lt;br /&gt;&lt;br /&gt;Hence, my suspicion is that many data sets in digital archives will gather digital dust, and anyone submitting a data set in the expectation that it will be cited may turn out to be disappointed.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3890123895569655787?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3890123895569655787'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3890123895569655787'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/04/data-matters-but-do-data-sets.html' title='Data matters but do data sets?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3605439083064691876</id><published>2011-04-01T11:18:00.001+01:00</published><updated>2011-04-01T11:18:10.960+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><category scheme='http://www.blogger.com/atom/ns#' term='Web Hooks'/><title type='text'>Mendeley and Web Hooks</title><content type='html'>Quick, poorly thought out idea. 
I've argued &lt;a href="http://iphylo.blogspot.com/2010/10/mendeley-bhl-and-of-life.html"&gt;before&lt;/a&gt; that Mendeley seems the obvious tool to build a "bibliography of life." It has pretty much all the features we need: nice editing tools, support for DOIs, PubMed identifiers, social networking, etc.&lt;br /&gt;&lt;br /&gt;But there's one thing it lacks. There's not an easy way to transmit updates from Mendeley to another database. There are RSS feeds for groups, such as &lt;a href="http://www.mendeley.com/groups/729951/museum-type-catalogues/feed/rss/"&gt;this one&lt;/a&gt; for the "Museum Type Catalogues" group, but that just lists recently added articles. What if I edit an article, say by correcting the authorship, or adding a DOI? How can I get those edits into databases downstream?&lt;br /&gt;&lt;br /&gt;One way would be if Mendeley provided RSS feeds &lt;b&gt;for each article&lt;/b&gt;, and these feeds would list the edits made to that article. But polling thousands of individual RSS feeds would be a hassle. Perhaps we could have a user-level RSS feed of edits made? &lt;br /&gt;&lt;br /&gt;But another way to do this would be with web hooks, which I &lt;a href="http://iphylo.blogspot.com/2011/02/web-hooks-and-openurl-making-databases.html"&gt;explored earlier&lt;/a&gt; in connection with updating literature within a taxonomic database. The idea is as follows:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;I have a taxonomic database that contains literature. 
It also has a web hook where I can tell the database that a record has been edited elsewhere.&lt;/li&gt;&lt;li&gt;I edit my Mendeley library using the desktop client.&lt;/li&gt;&lt;li&gt;When I've finished making my edits (e.g., DOIs added, etc.), the web hook is automatically called and the taxonomic database notified of the edits.&lt;/li&gt;&lt;li&gt;The taxonomic database processes the edits, and if it accepts them it updates its own records.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;Several things are needed to make this work. We need to be able to talk about the same record in the taxonomic database and in Mendeley, which means either the database stores the Mendeley identifier, or vice versa, or both. We also need a way to find all the recent edits made in Mendeley. Given that the Mendeley database is stored locally as a &lt;a href="http://www.sqlite.org/"&gt;SQLite database&lt;/a&gt;, one simple hack would be to write a script that was called at a set time, determined which records had been changed (records in the Mendeley SQLite database are timestamped) and sent those to the web hook. If we're clever, we may even be able to automate this by calling the script when Mendeley quits (depending on how scriptable the operating system and application are).&lt;br /&gt;&lt;br /&gt;Of course, what would be even better is if the Mendeley application had this feature built in. You supply one or more web hook URLs that Mendeley will call, say after any edits have been synchronised with your Mendeley database in the cloud. 
More and more I think we need to focus on how we join all these tools and databases together, and web hooks look like being the obvious candidate.&lt;br /&gt; &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3605439083064691876?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3605439083064691876'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3605439083064691876'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/04/mendeley-and-web-hooks.html' title='Mendeley and Web Hooks'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-8371682142960095781</id><published>2011-03-31T21:27:00.001+01:00</published><updated>2011-03-31T21:27:26.337+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='PLoS'/><category scheme='http://www.blogger.com/atom/ns#' term='published'/><category scheme='http://www.blogger.com/atom/ns#' term='PLoS Currents'/><category scheme='http://www.blogger.com/atom/ns#' term='NCBI'/><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><title type='text'>Paper on NCBI and Wikipedia published in PLoS Currents: Tree of Life</title><content type='html'>&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TZTf0VHfdsI/AAAAAAAAA4Y/Eh2-sZqWCcw/__logo__1.png?imgmax=800" alt="__logo__1.jpg" border="0" width="322" height="37" align="right" /&gt;&lt;br /&gt;My paper describing the mapping between NCBI and Wikipedia has been published in &lt;a 
href="http://knol.google.com/k/plos/plos-currents-tree-of-life/28qm4w0q65e4w/46"&gt;PLoS Currents: Tree of Life&lt;/a&gt;. You can see the paper &lt;a href="http://knol.google.com/k/roderic-d-m-page/linking-ncbi-to-wikipedia-a-wiki-based/16h5bb3g3ntlu/2"&gt;here&lt;/a&gt;. It's only just gone live, so it's yet to get a PubMed Central number (one of the nice features of PLoS Currents is that the articles get archived in PMC).&lt;br /&gt;&lt;br /&gt;Publishing in PLoS Currents: Tree of Life was a pleasant experience. The Google Knol editing environment was easy to use, and the reviewing process quick. It's obviously a new and rather experimental journal, and there are a few things that could be improved. Automatically looking up articles by PubMed identifier is nice, but it would also be great to do this for DOIs as well. Furthermore, the PubMed identifiers aren't displayed as clickable links, which rather defeats the point of having references on the web (I've added DOI links to the articles wherever possible). But, minor grumbles aside, as a way to get an Open Access article published for free, and have it archived in PubMed Central, PLoS Currents is hard to beat. What will be interesting is whether the article receives any comments. 
This seems to be one area online journals haven't really cracked — providing an environment where people want to engage in discussion.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-8371682142960095781?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8371682142960095781'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8371682142960095781'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/paper-on-ncbi-and-wikipedia-published.html' title='Paper on NCBI and Wikipedia published in PLoS Currents: Tree of Life'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_Gct8lVAxKqQ/TZTf0VHfdsI/AAAAAAAAA4Y/Eh2-sZqWCcw/s72-c/__logo__1.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3133819846024597702</id><published>2011-03-28T21:41:00.001+01:00</published><updated>2011-03-28T21:41:58.966+01:00</updated><title type='text'>Linking the NCBI taxonomy to BBC Wildlife Finder</title><content type='html'>&lt;br /&gt;&lt;br /&gt;&lt;a href='http://photo.blogpressapp.com/show_photo.php?p=11/03/28/2225.jpg'&gt;&lt;img src='http://photo.blogpressapp.com/photos/11/03/28/s_2225.jpg' border='0' width='200' height='200' align='right' style='margin:5px'&gt;&lt;/a&gt;&lt;br /&gt;A few weeks ago I spent some time mapping pages from the &lt;a target="_blank" href="http://www.bbc.co.uk/wildlifefinder/"&gt;BBC Wildlife Finder&lt;/a&gt; to the equivalent taxa in the 
NCBI taxonomy. This seemed a useful exercise because the Wildlife Finder pages have some wonderful picture, video, and audio content, as well as  other nice features, such as reusing Wikipedia page titles as "slugs" in the BBC page URLs. For example, the Wikipedia page for the Yacare Caiman (&lt;i&gt;Caiman yacare&lt;/i&gt;) has the URL &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Yacare_Caiman"&gt;http://en.wikipedia.org/wiki/Yacare_Caiman&lt;/a&gt;, and the BBC page has the URL &lt;a target="_blank" href="http://www.bbc.co.uk/nature/life/Yacare_Caiman"&gt;http://www.bbc.co.uk/nature/life/Yacare_Caiman&lt;/a&gt;. Both share the slug &lt;b&gt;Yacare_Caiman&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;After adding these links to &lt;a target="_blank" href="http://iphylo.org/linkout"&gt;iphylo.org/linkout&lt;/a&gt;, where you can find them listed on the &lt;a target="_blank" href="http://iphylo.org/linkout/Category:BBC"&gt;BBC category page&lt;/a&gt;, I've finally uploaded these to the NCBI, so now some 504 NCBI taxon pages have links to high quality multimedia from the BBC.&lt;br /&gt;&lt;br /&gt;- Posted using BlogPress from my iPad&lt;br /&gt;&lt;p class='blogpress_location'&gt;Location:&lt;a href='http://maps.google.com/maps?q=Schmiedestra%C3%9Fe,Wetter,Germany%4051.400881%2C7.342640&amp;z=10'&gt;Schmiedestraße,Wetter,Germany&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3133819846024597702?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3133819846024597702'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3133819846024597702'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/linking-ncbi-taxonomy-to-bbc-wildlife.html' title='Linking the NCBI taxonomy to BBC Wildlife 
Finder'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7411889768089725966</id><published>2011-03-25T17:19:00.001Z</published><updated>2011-03-26T08:00:45.004Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='visualisation'/><category scheme='http://www.blogger.com/atom/ns#' term='symbiome'/><title type='text'>Fun things about crustaceans</title><content type='html'>One side effect of playing with ways to visualise and integrate biology databases is that you stumble across the weird and wonderful stuff that living organisms get up to. My earliest papers were on crustacean taxonomy, so I thought I'd try my &lt;a href="http://iphylo.blogspot.com/2011/03/visualising-symbiome-hosts-parasites.html"&gt;latest toy&lt;/a&gt; on them.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What lives on crustaceans?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The "symbiome" graph for crustacea shows a range of associations, including marine bacteria (&lt;i&gt;Vibrio&lt;/i&gt;), fungi (microsporidians), and other organisms, including other crustacea (crustaceans are at the top of the circle, I'll work on labelling these diagrams a little better).&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TYzOn-0sAeI/AAAAAAAAA4A/wryA6JiYuwE/crusthost.png?imgmax=800" alt="Crusthost" border="0" width="400" height="316" /&gt;&lt;b&gt;What do crustaceans live on?&lt;/b&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TYzOm0kt2EI/AAAAAAAAA38/kh6Gx8EkEeA/crustpara.png?imgmax=800" alt="Crustpara" border="0" width="400"  /&gt;&lt;br /&gt;&lt;br /&gt;Crustacea (in addition to 
parasitising other crustacea) parasitise several vertebrate groups, including fish and whales. But they also occur in terrestrial vertebrates. For example, sequence &lt;a href="http://www.ncbi.nlm.nih.gov/nuccore/EF583871"&gt;EF583871&lt;/a&gt; is from the pentastomid worm &lt;a href="http://en.wikipedia.org/wiki/Porocephalus_crotali"&gt;&lt;i&gt;Porocephalus crotali&lt;/i&gt;&lt;/a&gt; from a dog. When people think of terrestrial crustacea they usually don't think of parasites. There's also a prominent line from crustaceans to what turns out to be corals, representing coral-living barnacles.&lt;br /&gt;&lt;br /&gt;It's instructive to compare this with insects, which similarly parasitise vertebrates. The striking difference is the association between insects and flowering plants.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TYzOoTV31zI/AAAAAAAAA4E/Xcsl4KJiGXI/insect.png?imgmax=800" alt="Insect" border="0" width="400" height="318" /&gt;&lt;br /&gt;&lt;br /&gt;I guess these really need to be made interactive, so we could click on them and discover more about the association represented by each line in the diagram.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7411889768089725966?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7411889768089725966'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7411889768089725966'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/fun-things-about-crustaceans.html' title='Fun things about crustaceans'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_Gct8lVAxKqQ/TYzOn-0sAeI/AAAAAAAAA4A/wryA6JiYuwE/s72-c/crusthost.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-5557080787765414996</id><published>2011-03-25T14:44:00.001Z</published><updated>2011-03-25T14:47:44.300Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Genbank'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='visualisation'/><category scheme='http://www.blogger.com/atom/ns#' term='symbiome'/><category scheme='http://www.blogger.com/atom/ns#' term='parasites'/><category scheme='http://www.blogger.com/atom/ns#' term='data mining'/><category scheme='http://www.blogger.com/atom/ns#' term='New Category'/><title type='text'>Visualising the symbiome: hosts, parasites, and the Tree of Life</title><content type='html'>Back in 2006 in a short post entitled &lt;a href="http://ispecies.blogspot.com/2006/03/building-encyclopedia-of-life.html"&gt;"Building the encyclopedia of life"&lt;/a&gt; I wrote that GenBank is a potentially rich source of information on host-parasite relationships. Often sequences of parasites will include information on the name of the host (the example I used was sequence &lt;a href="http://www.ncbi.nlm.nih.gov/nucleotide/7108724"&gt;AF131710&lt;/a&gt; from the platyhelminth &lt;i&gt;Ligophorus mugilinus&lt;/i&gt;, which records the host as the Flathead mullet &lt;a href="http://en.wikipedia.org/wiki/Flathead_mullet"&gt;&lt;i&gt;Mugil cephalus&lt;/i&gt;&lt;/a&gt;). 
&lt;br /&gt;&lt;br /&gt;I've always wanted to explore this idea a bit more, and have finally made a start, in part inspired by the recent &lt;a href="http://vizbi.org/"&gt;VIZBI 2011&lt;/a&gt; meeting. I've grabbed a large chunk of GenBank, mined the sequences for host records, and created some simple visualisations of what I'm terming (with tongue firmly in cheek) the "symbiome". Jonathan Eisen &lt;a href="http://phylogenomics.blogspot.com/search?q=bad+omics+word"&gt;will not be happy&lt;/a&gt;, but I need a word that describes the complete set of hosts, mutualists, symbionts with which an organism is associated, and "symbiome" seems appropriate.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Human symbiome&lt;/b&gt;&lt;br /&gt;To illustrate the idea, below is the human "symbiome". This diagram shows all the taxa in GenBank arranged in a circle, with lines connecting those organisms that have DNA sequences where humans are recorded as their host. &lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TYyqOWcwrpI/AAAAAAAAA3k/VzhEulakY1U/human.png?imgmax=800" alt="Human" border="0" width="400"  /&gt;&lt;br /&gt;&lt;br /&gt;At a glance, we have a lot of bacteria (the gray bar with &lt;b&gt;&lt;i&gt;E. 
coli&lt;/i&gt;&lt;/b&gt;) and fungi (blue bar with &lt;b&gt;Yeast&lt;/b&gt;), and a few nematodes and arthropods.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Fig tree symbiome&lt;/b&gt;&lt;br /&gt;Next up are organisms collected from fig trees (genus &lt;i&gt;Ficus&lt;/i&gt;).&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TYyqPRPPYeI/AAAAAAAAA3o/--9egoxhRC8/ficus.png?imgmax=800" alt="Ficus" border="0" width="400" /&gt;&lt;br /&gt;Fig trees have &lt;a href="http://en.wikipedia.org/wiki/Fig_wasp"&gt;wasp pollinators&lt;/a&gt; (the dark line landing near the honey bee &lt;i&gt;Apis&lt;/i&gt;), as well as nematodes (dark line landing near &lt;i&gt;Caenorhabditis elegans&lt;/i&gt;). There are also some associations with fungi and other arthropods.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Which taxa host insects?&lt;/b&gt;&lt;br /&gt;Next up is a plot of all associations involving insects and a host.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TYyqP20T2QI/AAAAAAAAA3s/FZtKjqnnf9U/insect.png?imgmax=800" alt="Insect" border="0" width="400" /&gt;&lt;br /&gt;The diagram is dominated by insect-flowering plant interactions, followed by insect-vertebrate associations (most likely bird and mammal lice).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Which taxa are hosted by insects?&lt;/b&gt;&lt;br /&gt;We can reverse the question and ask what organisms are hosted by insects:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TYyqQuTPt0I/AAAAAAAAA3w/ptmNCkC8EIk/insectashost.png?imgmax=800" alt="Insectashost" border="0" width="400" /&gt;&lt;br /&gt;Lots of associations between insects and fungi, as well as bacteria, and a few other organisms, such as nematodes, and &lt;i&gt;Plasmodium&lt;/i&gt; (the organism which causes malaria).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Frog 
symbiome&lt;/b&gt;&lt;br /&gt;Lastly, below is the symbiome of frogs. "Worms" feature prominently, as well as the fungus that causes &lt;a href="http://en.wikipedia.org/wiki/Chytridiomycosis"&gt;chytridiomycosis&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TYyqRNF7-2I/AAAAAAAAA30/Me3HoYOL6C8/frog.png?imgmax=800" alt="Frog" border="0" width="400" height="324" /&gt;&lt;b&gt;How the visualisation was made&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The symbiome visualisations were made as follows. Firstly DNA sequences were downloaded from EMBL and run through a script that extracted as much metadata as possible, including the contents of the &lt;code&gt;host&lt;/code&gt; field (where present). I then took the NCBI taxonomy and generated an ordered list of taxa by walking the tree in postorder, which determines where on the circumference of the circle the taxon lies. Pairs of taxa in an association are connected by a quadratic Bezier curve. The illustration was created using SVG.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Next steps&lt;/b&gt;&lt;br /&gt;There are several ways this visualisation could be improved. It's based only only a subset of data (I haven't run all of the sequence databases though the parser yet), and the matching of host taxa is based on exact string matching. All manner of weird and wonderful things get entered in the &lt;code&gt;host&lt;/code&gt; field, so we'll need some more sophisticated parsing (see "LINNAEUS: A species name identification system for biomedical literature" &lt;a href=http://dx.doi.org/10.1186/1471-2105-11-85"&gt;doi:10.1186/1471-2105-11-85&lt;/a&gt; for a more general discussion of this issue).&lt;br /&gt;&lt;br /&gt;The visualisation is fairly crude at this stage. 
Circle plots like this are fairly simple to create, and pop up in all sorts of situations (e.g., RNA secondary structure methods, which &lt;a href="http://taxonomy.zoology.gla.ac.uk/rod/circles/"&gt;I did some work on years ago&lt;/a&gt;). Of course, &lt;a href="http://mkweb.bcgsc.ca/circos/"&gt;Circos&lt;/a&gt; would be an obvious tool to use to create the visualisations, but the overhead of installing it and learning how to use it meant I took a shortcut and wrote some SVG from scratch.&lt;br /&gt;&lt;br /&gt;Although I've focussed on GenBank as a source of data, this visualisation could also be applied to other data. I briefly touched on this in &lt;a href="http://iphylo.blogspot.com/2009/11/tag-trees-displaying-taxonomy-of-names.html"&gt;Tag trees: displaying the taxonomy of names in BHL&lt;/a&gt; where a &lt;a href="http://www.biodiversitylibrary.org/page/2298491"&gt;page in the Biodiversity Heritage Library&lt;/a&gt; contains the names of a  flea and it's mammalian hosts. I think these circle plots would be a great way to highlight possible ecological associations mentioned in a text.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-5557080787765414996?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5557080787765414996'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/5557080787765414996'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/visualising-symbiome-hosts-parasites.html' title='Visualising the symbiome: hosts, parasites, and the Tree of Life'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_Gct8lVAxKqQ/TYyqOWcwrpI/AAAAAAAAA3k/VzhEulakY1U/s72-c/human.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7343108517487495547</id><published>2011-03-24T13:34:00.001Z</published><updated>2011-03-24T13:34:13.294Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='wiki'/><category scheme='http://www.blogger.com/atom/ns#' term='NCBI'/><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomy'/><category scheme='http://www.blogger.com/atom/ns#' term='mapping'/><category scheme='http://www.blogger.com/atom/ns#' term='TreeBASE'/><title type='text'>TreeBASE meets NCBI, again</title><content type='html'>Déjà vu is a scary thing. &lt;a href="http://iphylo.blogspot.com/2007/02/treebase-name-mapping.html"&gt;Four years ago&lt;/a&gt; I released a mapping between names in TreeBASE and other databases called &lt;a href="http://darwin.zoology.gla.ac.uk/~rpage/tbmap/"&gt;TBMap&lt;/a&gt; (described here: &lt;a href="http://dx.doi.org/10.1186/1471-2105-8-158"&gt;doi:10.1186/1471-2105-8-158&lt;/a&gt;). Today I find myself releasing yet another mapping, as part of my &lt;a href="http://iphylo.org/linkout"&gt;NCBI to Wikipedia&lt;/a&gt; project. By embedding the mapping in a wiki, it can be edited, so the kinds of problems I encountered with TBMap, recounted &lt;a href="http://iphylo.blogspot.com/2006/11/homonyms-and-ubios-data-model-yet-more.html"&gt;here&lt;/a&gt;, &lt;a href="http://iphylo.blogspot.com/2007/01/joys-of-mapping-names-in-treebase.html"&gt;here&lt;/a&gt;, and &lt;a href="http://iphylo.blogspot.com/2008/02/tbmap-errors.html"&gt;here&lt;/a&gt;, can be fixed. 
The mapping in and of itself isn't terribly exciting, but it's the starting point for some things I want to do regarding how to visualise the data in TreeBASE. &lt;br /&gt;&lt;br /&gt;Because TreeBASE 2 has issued new identifiers for its taxa (see &lt;a href="http://iphylo.blogspot.com/2010/05/treebase-ii-makes-me-pull-my-hair-out.html"&gt;TreeBASE II makes me pull my hair out&lt;/a&gt;), and now contains its own mapping to the NCBI taxonomy, as a first pass I've taken their mapping and added it to &lt;a href="http://iphylo.org/linkout"&gt;http://iphylo.org/linkout&lt;/a&gt;. I've also added some obvious mappings that TreeBASE has missed. There are a lot more taxa which could be added, but this is a start.&lt;br /&gt;&lt;br /&gt;The TreeBASE taxa that have a mapping each get their own page with a URL of the form &lt;code&gt;http://iphylo.org/linkout/&amp;lt;TreeBase taxon identifier&amp;gt;&lt;/code&gt;, e.g. &lt;a href="http://iphylo.org/linkout/TB2:Tl257333"&gt; http://iphylo.org/linkout/TB2:Tl257333&lt;/a&gt;. This page simply gives the name of the taxon in TreeBASE and the corresponding NCBI taxon id. It uses a Semantic Mediawiki template to generate a statement that the TreeBASE and NCBI taxa are a "close match". If you go to the corresponding page in the wiki for the NCBI taxon (e.g., &lt;a href="http://iphylo.org/linkout/Ncbi:448631"&gt;http://iphylo.org/linkout/Ncbi:448631&lt;/a&gt;) you will see any corresponding TreeBASE taxa listed there. If a mapping is erroneous, we simply need to edit the TreeBASE taxon page in the wiki to fix it. Nice and simple. &lt;br /&gt;&lt;br /&gt;At the time of writing the initial mapping is still being loaded (this can take a while). 
I'll update this post when the uploading has finished.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7343108517487495547?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7343108517487495547'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7343108517487495547'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/treebase-meets-ncbi-again.html' title='TreeBASE meets NCBI, again'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3182758991749180354</id><published>2011-03-21T17:04:00.001Z</published><updated>2011-03-23T10:17:22.911Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='visualisation'/><category scheme='http://www.blogger.com/atom/ns#' term='twitter'/><category scheme='http://www.blogger.com/atom/ns#' term='vizbi'/><title type='text'>Some VIZBI 2011 links</title><content type='html'>Given that the &lt;a href="https://twitter.com/#!/search/vizbi"&gt;Twitter stream tagged #vizbi&lt;/a&gt; will fade away soon, I've grabbed most of the links I tweeted during &lt;a href="http://vizbi.org/2011/"&gt;VIZBI 2011&lt;/a&gt; and have put them here. 
This isn't intended as a comprehensive list, merely the things which caught my eye, and didn't flash by faster than I could tweet.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Books&lt;/b&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://www.amazon.com/Visual-Complexity-Mapping-Patterns-Information/dp/1568989369" target="_new"&gt;Visual Complexity: Mapping Patterns of Information&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.amazon.com/Visual-Thinking-Kaufmann-Interactive-Technologies/dp/0123708966" target="_new"&gt;Visual Thinking for Design&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;Movies&lt;/b&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://www.wehi.edu.au/education/wehitv/molecular_visualisations_of_dna/" target="_new"&gt;DNA&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://molecularmovies.com/movies/berry_malaria_p1.m4v" target="_new"&gt;Malaria lifecycle Part 1&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://molecularmovies.com/movies/berry_malaria_p2.m4v" target="_new"&gt;Malaria lifecycle Part 2&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.wehi.edu.au/education/wehitv/" target="_new"&gt;WEHI.TV &lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;People&lt;/b&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://www.macfound.org/fellows/2010/berry" target="_new"&gt;Drew Berry&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.visualcomplexity.com/vc/contact.cfm" target="_new"&gt;Manuel Lima&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.cs.ubc.ca/~tmm/" target="_new"&gt;Tamara Munzner&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://fernandaviegas.com/" target="_new"&gt;Fernanda Viégas&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.bewitched.com/" target="_new"&gt;Martin Wattenberg&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;Software and websites&lt;/b&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://arena3d.org/" target="_new"&gt;Arena 3D&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.bioblender.net/" 
target="_new"&gt;BioBlender&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.mquter.qut.edu.au/bio/blastatlas_default.aspx" target="_new"&gt;BLAST Atlas&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.pathogenomics.ca/cerebral/" target="_new"&gt;Cerebral v.2.0&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.research.ibm.com/visual/projects/chromogram.html" target="_new"&gt;Chromograms&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://clovr.org/" target="_new"&gt;CloVR&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.cytoscape.org/" target="_new"&gt;Cytoscape&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.emouseatlas.org/emage/home.php" target="_new"&gt;EMAGE&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://epmv.scripps.edu/" target="_new"&gt;embedded Python Molecular Viewer (ePMV)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.bewitched.com/fleshmap.html" target="_new"&gt;Fleshmap&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://genomeview.org/" target="_new"&gt;GenomeView&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.research.ibm.com/visual/projects/history_flow/" target="_new"&gt;History Flow&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://mkweb.bcgsc.ca/linnet/" target="_new"&gt;HivePlots&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.sci.utah.edu/cibc/software/41-imagevis3d.html" target="_new"&gt;ImageVis3D&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.profilegrid.org"&gt;JProfileGrid&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.many-eyes.com/" target="_new"&gt;Many Eyes&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.molecularmovies.com/toolkit/" target="_new"&gt;Molecular Maya&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.molecularmovies.com/" target="_new"&gt;Molecular Movies&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.pathblast.org/" target="_new"&gt;PathBLAST&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://portnoy.iplantcollaborative.org/" 
target="_new"&gt;Phyloviewer&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.virtualpathology.leeds.ac.uk/research/HCI/Powerwall/virtual_reality_powerwall.php" target="_new"&gt;Powerwall&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://dev.gramene.org/mockups/dna.html" target="_new"&gt;Quartz Composition of DNA&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://chmille4.github.com/Scribl/" target="_new"&gt;Scribl&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://pages.cs.wisc.edu/~dalbers/" target="_new"&gt;Sequence Surveyor&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.bewitched.com/song.html" target="_new"&gt;Shape of Song&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://sybil.sourceforge.net/" target="_new"&gt;Sybil&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://topiaryexplorer.sourceforge.net/" target="_new"&gt;Topiary Explorer&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://caltech.wormbase.org/virtualworm/" target="_new"&gt;Virtual Worm&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://hint.fm/seer/" target="_new"&gt;Web Seer&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://wholebraincatalog.org/" target="_new"&gt;Whole Brain Catalog&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3182758991749180354?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3182758991749180354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3182758991749180354'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/some-vizbi-2011-links.html' title='Some VIZBI 2011 links'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-1312871620015503802</id><published>2011-03-20T11:55:00.001Z</published><updated>2011-03-20T12:28:22.297Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='visualisation'/><category scheme='http://www.blogger.com/atom/ns#' term='Broad Institute'/><category scheme='http://www.blogger.com/atom/ns#' term='twitter'/><category scheme='http://www.blogger.com/atom/ns#' term='vizbi'/><title type='text'>VIZBI 2011</title><content type='html'>&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TYXl-sf0kfI/AAAAAAAAA3c/RFYlQhw8q68/broad.jpg?imgmax=800" alt="broad.jpg" border="0" width="128"  align="right" /&gt;&lt;br /&gt;I've spent the last three days at VIZBI, a &lt;a href="http://vizbi.org/2011/"&gt;Workshop on Visualizing Biological Data&lt;/a&gt;, held at the &lt;a href="http://www.broadinstitute.org/"&gt;Broad Institute&lt;/a&gt; in Boston (note that "Broad" rhymes with "Code"). A great conference in a special venue that includes the &lt;a href="http://bang.clearscience.info/?p=663"&gt;DNAtrium&lt;/a&gt;. Videos of the talks will be online "real soon now"; look out for the keynotes, which were full of great ideas and visualisations. To get a flavour of the meeting search for the hashtag &lt;a href="https://twitter.com/#!/search/vizbi"&gt;#vizbi&lt;/a&gt; on Twitter (you can also see the tweet stream on the &lt;a href="http://vizbi.org/2011/"&gt;VIZBI home page&lt;/a&gt;). All the keynotes were great, but I personally found Tamara Munzner's the most enlightening. She drew on lots of research in visual perception to outline what works and what doesn't when presenting information visually. You can &lt;a href="http://www.cs.ubc.ca/~tmm/talks/vizbi11/vizbi11.pdf"&gt;grab a PDF of her presentation here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;One aspect of the meeting which worked really well was the poster presentations. 
Poster sessions were held during coffee breaks, and after the last talk of the session but before the audience broke for coffee, each author of a poster got 90 seconds to introduce their poster (there were typically around 10 posters per break). This meant the poster authors got a chance to introduce themselves and their work to the workshop audience, and the audience could discover what posters were being displayed. Neat idea.&lt;br /&gt;&lt;br /&gt;I gave a presentation on phylogenies, which I've put on &lt;a href="http://www.slideshare.net/rdmpage/phylogeny-vizbi-2011"&gt;slideshare&lt;/a&gt;. After explaining that I thought phylogeny visualisation was mostly a solved problem (as evidenced by the large number of tree viewers available), I continued the theme of &lt;a href="http://iphylo.blogspot.com/2011/02/why-3d-phylogeny-viewers-don-work.html"&gt;why I don't think 3D works for phylogeny&lt;/a&gt; (except for geophylogenies), made the pitch for &lt;a href="http://iphylo.blogspot.com/2010/01/why-i-want-ipad.html"&gt;building a phylogeny viewer on the iPad&lt;/a&gt;, and finished with my recent work on &lt;a href="http://iphylo.blogspot.com/2011/02/live-demo-of-zooming-large-tree.html"&gt;Google Maps-style viewing of very large trees&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;div style="width:425px" id="__ss_7307963"&gt; &lt;strong style="display:block;margin:12px 0 4px"&gt;&lt;a href="http://www.slideshare.net/rdmpage/phylogeny-vizbi-2011" title="Phylogeny VIZBI 2011"&gt;Phylogeny VIZBI 2011&lt;/a&gt;&lt;/strong&gt; &lt;object id="__sse7307963" width="425" height="355"&gt; &lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=vizbi2001-rdmp-110318104339-phpapp02&amp;stripped_title=phylogeny-vizbi-2011&amp;userName=rdmpage" /&gt; &lt;param name="allowFullScreen" value="true"/&gt; &lt;param name="allowScriptAccess" value="always"/&gt; &lt;embed name="__sse7307963" 
src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=vizbi2001-rdmp-110318104339-phpapp02&amp;stripped_title=phylogeny-vizbi-2011&amp;userName=rdmpage" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"&gt;&lt;/embed&gt; &lt;/object&gt; &lt;div style="padding:5px 0 12px"&gt; View more &lt;a href="http://www.slideshare.net/"&gt;presentations&lt;/a&gt; from &lt;a href="http://www.slideshare.net/rdmpage"&gt;Roderic Page&lt;/a&gt; &lt;/div&gt; &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-1312871620015503802?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/1312871620015503802'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/1312871620015503802'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/vizbi-2011.html' title='VIZBI 2011'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_Gct8lVAxKqQ/TYXl-sf0kfI/AAAAAAAAA3c/RFYlQhw8q68/s72-c/broad.jpg?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-154516535926939864</id><published>2011-03-11T13:28:00.001Z</published><updated>2011-03-11T13:30:38.638Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='map'/><category scheme='http://www.blogger.com/atom/ns#' term='visualisation'/><category scheme='http://www.blogger.com/atom/ns#' term='zoom'/><category scheme='http://www.blogger.com/atom/ns#' 
term='tree'/><title type='text'>Geography and genes: zoomable view of frog NCBI classification with linked map</title><content type='html'>More zoom viewer experiments (see &lt;a href="http://iphylo.blogspot.com/2011/03/zooming-large-tree-now-with-thumbnails.html"&gt;previous post&lt;/a&gt;), this time with a linked map that updates as you browse the tree (SVG-capable browser required). As you browse the frog classification the map updates to show the location of georeferenced sequences in GenBank from the taxa in the part of the tree you are looking at. The map is limited to not more than 200 localities, and many frog sequences aren't georeferenced, but it's a fun way to combine classification and geography. You can try it at:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/deeptree/7.html"&gt;http://iphylo.org/~rpage/deeptree/7.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;or watch the video:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;iframe src="http://player.vimeo.com/video/20917959?color=ffffff" width="400" height="225" frameborder="0"&gt;&lt;/iframe&gt;&lt;p&gt;&lt;a href="http://vimeo.com/20917959"&gt;Zoomable tree with linked map&lt;/a&gt; from &lt;a href="http://vimeo.com/rdmpage"&gt;Roderic Page&lt;/a&gt; on &lt;a href="http://vimeo.com"&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-154516535926939864?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/154516535926939864'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/154516535926939864'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/geography-and-genes-zoomable-view-of.html' title='Geography and genes: zoomable view of frog NCBI classification with linked 
map'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-284914176445635184</id><published>2011-03-08T10:57:00.001Z</published><updated>2011-03-08T10:57:57.063Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><category scheme='http://www.blogger.com/atom/ns#' term='API'/><category scheme='http://www.blogger.com/atom/ns#' term='Challenge'/><title type='text'>The Mendeley API Binary Battle - win $US 10,001</title><content type='html'>Now we'll &lt;a href="http://iphylo.blogspot.com/2010/08/mendeley-api-we-bring-awesome-if-you.html"&gt;bring the awesome&lt;/a&gt;. Mendeley have announced &lt;a href="http://dev.mendeley.com/api-binary-battle"&gt;The Mendeley API Binary Battle&lt;/a&gt;, with a first prize of $US 10,001, and some very high-profile judges (Juan Enriquez, Tim O'Reilly, James Powell, Werner Vogels, and John Wilbanks). Deadline for submission is August 31st 2011, with the results announced in October. &lt;br /&gt;&lt;br /&gt;The criteria for judging are:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;How active is your application? We’ll look at your API key usage.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;How viral is the app? We’ll look at the number of sign ups on Mendeley and/or your application, and we’ll also have an eye on Twitter.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Does the application increase collaboration and/or transparency? We’ll look at how much your application contributes to making science more open.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;How cool is your app? Does it make our jaws drop? Is it the most fun that you can have with your pants on? 
Is it making use of Facebook, Twitter, etc.?&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The Binary Battle is open to apps built previous to this announcement.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;Start your engines...&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-284914176445635184?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/284914176445635184'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/284914176445635184'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/mendeley-api-binary-battle-win-us-10001.html' title='The Mendeley API Binary Battle - win $US 10,001'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-4210269746981441489</id><published>2011-03-07T12:55:00.001Z</published><updated>2011-03-07T12:55:55.289Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='Nomenclator Zoologicus'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='names'/><category scheme='http://www.blogger.com/atom/ns#' term='microcitations'/><category scheme='http://www.blogger.com/atom/ns#' term='uBio'/><title type='text'>Nomenclator Zoologicus meets Biodiversity Heritage Library: linking names directly to literature</title><content type='html'>Following on from my &lt;a href="http://iphylo.blogspot.com/2011/03/microcitations-linking-nomenclators-to.html"&gt;previous 
post on microcitations&lt;/a&gt; I've blasted all the citations in &lt;a href="http://uio.mbl.edu/NomenclatorZoologicus/"&gt;Nomenclator Zoologicus&lt;/a&gt; through my microcitation service and created a simple web site where these results can be browsed.&lt;br /&gt;&lt;br /&gt;The web site is here: &lt;a href="http://iphylo.org/~rpage/nz/"&gt;http://iphylo.org/~rpage/nz/&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;To create it I've taken a file dump of Nomenclator Zoologicus provided by Dave Remsen and run all the citations through the microcitation service, storing the results in a simple database. You can search by genus name, author and year, or publication. The search is pretty crude, and in the case of publications can be a bit hit and miss. Citations in Nomenclator Zoologicus are stored as strings, so I've used some crude rules to try and extract the publication name from the rest of the details (such as page numbering).&lt;br /&gt;&lt;br /&gt;To get started, you can look at names published by &lt;a href="http://iphylo.org/~rpage/nz/?mode=author&amp;q=Distant+1910"&gt;Distant in 1910&lt;/a&gt;, which you can see below:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TXTV0yz94VI/AAAAAAAAA2w/ZAlncFRsIGA/nz1.png?imgmax=800" alt="Nz1" border="0" width="400" height="260" /&gt;&lt;br /&gt;&lt;br /&gt;If the citation has been found you can click on the &lt;img src="http://iphylo.org/~rpage/nz/images/picture_empty.png" align="absmiddle"/&gt; icon to view the page in a popup, like this:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TXTV2Bo5OiI/AAAAAAAAA20/ArPmd7QcubY/nz2.png?imgmax=800" alt="Nz2" border="0" width="400" height="295" /&gt;&lt;br /&gt;&lt;br /&gt;You can also click on the page number to be taken to that page in BHL.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I've also added some other links, 
such as to the name in the &lt;a href="http://www.organismnames.com"&gt;Index to Organism Names&lt;/a&gt;, as well as bibliographic identifiers such as DOIs, Handles, and links to JSTOR and CiNii.&lt;br /&gt;&lt;br /&gt;So far only 10% of Nomenclator Zoologicus records have a match in BHL, which is slightly depressing. Browsing through, there are some obvious gaps where my parser clearly failed, typically where multiple pages are included in the citation, or the citation has some additional comments. These could be fixed. There are also cases where the OCR text is so mangled that a match has been rejected because the genus name and text were too different.&lt;br /&gt;&lt;br /&gt;This has been hastily assembled, but it's one vision of a simple service where we can go from genus name to being able to see the original publication of that name. There are other things we could do with this mapping, such as enabling BHL to tell users that the reference they are looking at is the original source of a particular name, and enabling services that use BHL content (such as EOL and Atlas of Living Australia) to flag which reference in BHL is the one that matters in terms of nomenclature.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-4210269746981441489?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4210269746981441489'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4210269746981441489'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/nomenclator-zoologicus-meets.html' title='Nomenclator Zoologicus meets Biodiversity Heritage Library: linking names directly to literature'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_Gct8lVAxKqQ/TXTV0yz94VI/AAAAAAAAA2w/ZAlncFRsIGA/s72-c/nz1.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-6539433207877587725</id><published>2011-03-03T21:19:00.001Z</published><updated>2011-03-03T21:24:22.662Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Æ'/><category scheme='http://www.blogger.com/atom/ns#' term='nomenclators'/><category scheme='http://www.blogger.com/atom/ns#' term='OCR'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='microcitations'/><title type='text'>Microcitations: linking nomenclators to BHL</title><content type='html'>One of the challenges of linking databases of taxonomic names to the primary literature is the minimal citation style used by nomenclators (see my earlier post &lt;a href="http://iphylo.blogspot.com/2009/05/nomenclators-digitised-literature-fail.html"&gt;Nomenclators + digitised literature = fail&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;For example, consider Nomenclator Zoologicus. Volumes 1-10 of this list of generic names in zoology were digitised in 2004 and &lt;a href="http://uio.mbl.edu/NomenclatorZoologicus/"&gt;put online by uBio&lt;/a&gt; (for more details of this project see &lt;b&gt;Taxonomic informatics tools for the electronic Nomenclator Zoologicus&lt;/b&gt;, &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/16501061"&gt;pmid:16501061&lt;/a&gt;). 
In Nomenclator Zoologicus the &lt;a href="http://www.ubio.org/NZ/detail.php?uid=41&amp;d=1"&gt;citation for the genus &lt;i&gt;Abana&lt;/i&gt;&lt;/a&gt; is:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;Ann. Mag. nat. Hist., (8) 2, 72.&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;The challenge is to link this short citation to the digital version of the corresponding article. I've been sitting on a copy of the digitised Nomenclator Zoologicus kindly provided by Dave Remsen, and I've finally started to look at the problem of mining it for links to databases such as &lt;a href="http://www.biodiversitylibrary.org"&gt;BHL&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;You can see the first attempt at &lt;a href="http://biostor.org/microcitation.php"&gt;http://biostor.org/microcitation.php&lt;/a&gt;. This form takes a genus name and the short citation and attempts to locate the corresponding page in BHL. It then checks whether the name is present on that page. Locating a page in a journal can be a challenge given the often rather ropey metadata in BHL, but BioStor uses a combination of fuzzy string matching and crude kludges to find the best match. But a further complication is that OCR errors may mean the taxonomic name we are looking for might not be detected on the page. &lt;br /&gt;&lt;br /&gt;For example,  if we &lt;a href="http://tinyurl.com/6b8926a"&gt;search for the citation&lt;/a&gt; for the genus &lt;i&gt;Aethriscus&lt;/i&gt;, &lt;code&gt;Ann. Mag. nat. Hist., (7) 10, 329.&lt;/code&gt; we find two candidate pages in the journal &lt;i&gt;Ann. Mag. nat. Hist&lt;/i&gt;, but neither contains the string "Aethriscus". However, if we use approximate string matching we find the OCR text for one page has the string "thriscus". 
This differs by only two characters from "Aethriscus", and so is a possible match (shown in orange).&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TXACfe2i8dI/AAAAAAAAA2k/NyVo_Ksn97c/2.png?imgmax=800" alt="2.png" border="0" width="400" height="527" /&gt;&lt;/div&gt;&lt;br /&gt;Looking at the &lt;a href="http://biodiversitylibrary.org/page/19339134"&gt;scanned page&lt;/a&gt; we can see the likely source of the problem:&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TXADZvla2aI/AAAAAAAAA2o/bVPWbUPP2cE/3.png?imgmax=800" alt="3.png" border="0" width="290" height="38" /&gt;&lt;/div&gt;&lt;br /&gt;In the original publication the name Aethriscus  was written as &lt;span style="font-variant: small-caps;"&gt;Æthriscus&lt;/span&gt;. The ligature &lt;a href="http://en.wikipedia.org/wiki/Æ"&gt;Æ&lt;/a&gt; has been corrupted by the OCR engine, and in Nomenclator Zoologicus the name is written without the ligature, hence the failure to exactly match the name with the text. These are some of the challenges faced when trying to close the circle and link names to literature.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://biostor.org/microcitation.php"&gt;microcitation parser&lt;/a&gt; is still pretty crude, but usable. You can get results in either HTML or JSON, so the task of mapping microcitations to BHL pages can be automated. 
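&lt;br /&gt;&lt;br /&gt;To make the approximate matching concrete, here is a minimal sketch in Python. It assumes plain Levenshtein (edit) distance over whitespace-separated OCR tokens; BioStor's own matching may well differ, and the function names &lt;code&gt;edit_distance&lt;/code&gt; and &lt;code&gt;best_match&lt;/code&gt; are hypothetical.&lt;br /&gt;&lt;br /&gt;

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def best_match(name, ocr_text):
    """Return (token, distance) for the OCR token closest to name.

    Assumption: tokens are split on whitespace and stripped of
    surrounding punctuation; comparison ignores case.
    """
    tokens = [t.strip('.,;:()"') for t in ocr_text.split()]
    tokens = [t for t in tokens if t]
    scored = [(t, edit_distance(name.lower(), t.lower())) for t in tokens]
    return min(scored, key=lambda pair: pair[1])


# The example from this post: the ligature is lost by the OCR
print(best_match("Aethriscus", "thriscus, gen. nov."))
```

A caller would accept the closest token only if its distance falls under some threshold (two edits in the example above) and otherwise reject the page.&lt;br /&gt;&lt;br /&gt;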
At present the name matching assumes you are looking at a single word (e.g., a genus); I need to extend it to handle binomials.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-6539433207877587725?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6539433207877587725'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6539433207877587725'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/microcitations-linking-nomenclators-to.html' title='Microcitations: linking nomenclators to BHL'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_Gct8lVAxKqQ/TXACfe2i8dI/AAAAAAAAA2k/NyVo_Ksn97c/s72-c/2.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7207058537312230314</id><published>2011-03-03T20:22:00.001Z</published><updated>2011-03-03T20:27:34.345Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='twitter'/><title type='text'>BioStor updates on Twitter</title><content type='html'>&lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; has had a Twitter account &lt;a href="http://twitter.com/biostor_org"&gt;@biostor_org&lt;/a&gt; for a while, but it's not been active. 
I finally got around to hooking it up to BioStor, so that now every time an article is added to BioStor, the title of that article and its URL appear in the &lt;a href="http://twitter.com/biostor_org"&gt;@biostor_org&lt;/a&gt; Twitter feed.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;script src="http://widgets.twimg.com/j/2/widget.js"&gt;&lt;/script&gt;&lt;script&gt;new TWTR.Widget({  version: 2,  type: 'search',  search: 'biostor_org',  interval: 6000,  title: '',  subject: 'BioStor on Twitter',  width: 250,  height: 300,  theme: {    shell: {      background: '#8ec1da',      color: '#ffffff'    },    tweets: {      background: '#ffffff',      color: '#444444',      links: '#1985b5'    }  },  features: {    scrollbar: false,    loop: true,    live: true,    hashtags: true,    timestamp: true,    avatars: true,    toptweets: true,    behavior: 'default'  }}).render().start();&lt;/script&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Activity on this feed will be variable, depending on whether articles are being added manually, or in bulk. 
But it's a handy way to keep tabs on the growing number of articles being harvested from the &lt;a href="http://www.biodiversitylibrary.org"&gt;Biodiversity Heritage Library&lt;/a&gt;.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7207058537312230314?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7207058537312230314'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7207058537312230314'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/biostor-updates-on-twitter.html' title='BioStor updates on Twitter'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-6290039141244814731</id><published>2011-03-01T18:35:00.001Z</published><updated>2011-03-01T18:35:04.404Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='visualisation'/><category scheme='http://www.blogger.com/atom/ns#' term='zoom'/><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='tree'/><title type='text'>Zooming a large tree, now with thumbnails</title><content type='html'>Continuing experiments with a zoom viewer for large trees (see &lt;a href="http://iphylo.blogspot.com/2011/02/live-demo-of-zooming-large-tree.html"&gt;previous post&lt;/a&gt;), I've now made a demo where the labels are clickable. If the NCBI taxon has an equivalent page in Wikipedia, the demo displays a link to that page (and, if present, a thumbnail image). 
Give it a try at&lt;br /&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/deeptree/3.html"&gt;http://iphylo.org/~rpage/deeptree/3.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;or watch the short video clip below:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;iframe src="http://player.vimeo.com/video/20525164" width="400" height="300" frameborder="0"&gt;&lt;/iframe&gt;&lt;p&gt;&lt;a href="http://vimeo.com/20525164"&gt;Zoomable viewer with Wikipedia thumbnails&lt;/a&gt; from &lt;a href="http://vimeo.com/rdmpage"&gt;Roderic Page&lt;/a&gt; on &lt;a href="http://vimeo.com"&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-6290039141244814731?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6290039141244814731'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6290039141244814731'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/zooming-large-tree-now-with-thumbnails.html' title='Zooming a large tree, now with thumbnails'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7246546416526331985</id><published>2011-03-01T07:57:00.001Z</published><updated>2011-03-01T07:58:46.752Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenURL'/><title 
type='text'>Mendeley, OpenURL, BioStor, and BHL</title><content type='html'>Mendeley has added a feature which makes it easier to use Mendeley with repositories such as &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; and &lt;a href="http://www.biodiversitylibrary.org"&gt;BHL&lt;/a&gt;. As announced in &lt;a href="http://www.mendeley.com/blog/academic-features/get-full-text-mendeley-now-works-with-your-local-library-via-openurl/"&gt;Get Full Text: Mendeley now works with your local library via OpenURL&lt;/a&gt;, you can now add OpenURL resolvers to your Mendeley account: &lt;br /&gt;&lt;blockquote&gt;We’ve added a button to the catalog pages that will allow you to get the article from your library right in Mendeley. This feature will link you directly to the full text copy according to your institutional access rights.&lt;/blockquote&gt;Ironically, in the UK access to electronic articles from a University is pretty seamless via the &lt;a href="http://www.ukfederation.org.uk/"&gt;UK Access Management Federation&lt;/a&gt;, so I don't need to add an OpenURL resolver to get full text for an article. But this new feature does enable another way to access articles in my &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; repository. By adding the &lt;a href="http://biostor.org/openurl"&gt;BioStor OpenURL&lt;/a&gt; to your Mendeley account, you can search for articles from your Mendeley library in BioStor.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://www.mendeley.com/blog/academic-features/get-full-text-mendeley-now-works-with-your-local-library-via-openurl/"&gt;Mendeley blog post&lt;/a&gt; explains how to set up an OpenURL resolver. 
Go to your Mendeley account and click on the &lt;b&gt;My Account&lt;/b&gt; button in the upper right corner of the page, then select &lt;b&gt;Account Details&lt;/b&gt;, then the &lt;b&gt;Sharing/Importing&lt;/b&gt; tab, or just click &lt;a href="https://www.mendeley.com/account/import/"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TWye6X8X3fI/AAAAAAAAA2I/FPpotuxLs_w/openurl_settings.jpg?imgmax=800" alt="openurl_settings.jpg" border="0" width="400"  /&gt;&lt;/div&gt;&lt;br /&gt;Click on &lt;b&gt;Add library manually&lt;/b&gt;, then enter the name of the resolver (e.g., "BioStor") and the URL &lt;code&gt;http://biostor.org/openurl&lt;/code&gt;:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TWyiUrjZrjI/AAAAAAAAA2U/Waf2qOFNXGo/Snapshot%202011-03-01%2007-37-20.png?imgmax=800" alt="Snapshot 2011-03-01 07-37-20.png" border="0" width="400"  /&gt;&lt;/div&gt;&lt;br /&gt;If you view a reference in Mendeley, you will now see something like this:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TWyjD4BlQCI/AAAAAAAAA2Y/BayAzAx80aQ/Snapshot%202011-03-01%2007-40-04.png?imgmax=800" alt="Snapshot 2011-03-01 07-40-04.png" border="0" width="263" height="161" /&gt;&lt;/div&gt;&lt;br /&gt;In addition to the DOI and the URL, &lt;a href="http://www.mendeley.com/research/description-austrochaperina-new-genus-engystomatidae-north-australia/"&gt;this reference&lt;/a&gt; now displays a &lt;b&gt;Find this paper at&lt;/b&gt; menu. 
Clicking on it shows the default services, together with any OpenURL resolvers you've added (in this case, BioStor):&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TWyjkT4PsiI/AAAAAAAAA2c/3d5ng_YKZIs/Snapshot%202011-03-01%2007-42-50.png?imgmax=800" alt="Snapshot 2011-03-01 07-42-50.png" border="0" width="309" height="308" /&gt;&lt;/div&gt;&lt;br /&gt;You can add multiple resolvers, so we could add the &lt;a href="http://www.biodiversitylibrary.org/openurlhelp.aspx"&gt;BHL OpenURL resolver&lt;/a&gt; &lt;code&gt;http://www.biodiversitylibrary.org/openurl&lt;/code&gt;, although finding articles isn't BHL OpenURL resolver's strong point.&lt;br /&gt;&lt;br /&gt;Now, what would be very handy is if Mendeley were to complete the circle by providing their own OpenURL resolver, so that people could find articles in Mendeley from metadata such as article title, journal, volume, and starting page. The &lt;a href="http://dev.mendeley.com/"&gt;Mendeley API&lt;/a&gt; might be a way to implement this, although its search features lack the granularity needed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7246546416526331985?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7246546416526331985'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7246546416526331985'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/03/mendeley-openurl-biostor-and-bhl.html' title='Mendeley, OpenURL, BioStor, and BHL'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_Gct8lVAxKqQ/TWye6X8X3fI/AAAAAAAAA2I/FPpotuxLs_w/s72-c/openurl_settings.jpg?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2286786526533372899</id><published>2011-02-28T18:06:00.001Z</published><updated>2011-02-28T18:06:13.292Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='visulaisation'/><category scheme='http://www.blogger.com/atom/ns#' term='trees'/><category scheme='http://www.blogger.com/atom/ns#' term='deep zoom'/><title type='text'>Live demo of zooming a large tree</title><content type='html'>After the teaser on Friday (see &lt;a href="http://iphylo.blogspot.com/2011/02/deep-zooming-large-2d-tree.html"&gt;Deep zooming a large 2D tree&lt;/a&gt;) I've put a live demo of my experiments with viewing a large tree online at:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/deeptree/"&gt;http://iphylo.org/~rpage/deeptree/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The first example (&lt;a href="http://iphylo.org/~rpage/deeptree/1.html"&gt;Experiment 1&lt;/a&gt;) is the NCBI classification for frogs: &lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;iframe src="http://player.vimeo.com/video/20477986" width="400" height="300" frameborder="0"&gt;&lt;/iframe&gt;&lt;p&gt;&lt;a href="http://vimeo.com/20477986"&gt;Simple deep tree viewer&lt;/a&gt; from &lt;a href="http://vimeo.com/rdmpage"&gt;Roderic Page&lt;/a&gt; on &lt;a href="http://vimeo.com"&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;br /&gt;This version displays internal node labels, leaf labels (as many as can be displayed at a given zoom level), and works in Safari, Firefox, and Internet Explorer 8. Obviously this is all pretty rough, but &lt;a href="http://iphylo.org/~rpage/deeptree/1.html"&gt;take it for a spin&lt;/a&gt;, I'd welcome any feedback. 
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2286786526533372899?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2286786526533372899'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2286786526533372899'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/02/live-demo-of-zooming-large-tree.html' title='Live demo of zooming a large tree'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-623728045593215867</id><published>2011-02-25T18:47:00.001Z</published><updated>2011-02-25T18:47:16.066Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='tiles'/><category scheme='http://www.blogger.com/atom/ns#' term='zoomify'/><category scheme='http://www.blogger.com/atom/ns#' term='visualisation'/><category scheme='http://www.blogger.com/atom/ns#' term='zoom'/><category scheme='http://www.blogger.com/atom/ns#' term='deep zoom'/><category scheme='http://www.blogger.com/atom/ns#' term='screencast'/><category scheme='http://www.blogger.com/atom/ns#' term='tree'/><category scheme='http://www.blogger.com/atom/ns#' term='Google Maps'/><title type='text'>Deep zooming a large 2D tree</title><content type='html'>Here's a quick demo of a 2D large tree viewer that I'm working on. The aim is to provide a simple way to view and navigate very large trees (such as the NCBI classification) in a web browser using just HTML and Javascript. 
At the moment this is simply a viewer, but the goal is to add the ability to show "tracks" like a genome browser. For example, you could imagine columns appearing to the right of the tree showing you whether there are phylogenies available for these taxa in TreeBASE, images from Wikipedia, sparklines for sequencing activity over time, etc. I'll blog some more on the implementation details when I get the chance, but it's pretty straightforward. Image tiles are generated from SVG images of the tree using ImageMagick, labelling is applied on the fly using GIS-style queries to a MySQL database that holds the "world coordinates" of the nodes in the tree (see &lt;a href="http://code.google.com/apis/maps/documentation/javascript/maptypes.html#WorldCoordinates"&gt;discussion of world coordinates on Google's Map API pages&lt;/a&gt;), and the zooming and tile fetching is based on Michal Migurski's &lt;a href="http://mike.teczno.com/giant/pan/"&gt;Giant-Ass Image Viewer&lt;/a&gt;. Once I've tidied up a few things I'll put up a live demo so people can play with it.&lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;iframe src="http://player.vimeo.com/video/20379533" width="400" height="300" frameborder="0"&gt;&lt;/iframe&gt;&lt;p&gt;&lt;a href="http://vimeo.com/20379533"&gt;Deep tree zooming&lt;/a&gt; from &lt;a href="http://vimeo.com/rdmpage"&gt;Roderic Page&lt;/a&gt; on &lt;a href="http://vimeo.com"&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-623728045593215867?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/623728045593215867'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/623728045593215867'/><link rel='alternate' type='text/html' 
href='http://iphylo.blogspot.com/2011/02/deep-zooming-large-2d-tree.html' title='Deep zooming a large 2D tree'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-8822195509102808382</id><published>2011-02-24T11:48:00.001Z</published><updated>2011-02-24T11:52:50.901Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='3D'/><category scheme='http://www.blogger.com/atom/ns#' term='imagination'/><category scheme='http://www.blogger.com/atom/ns#' term='phylogeny'/><category scheme='http://www.blogger.com/atom/ns#' term='visualisation'/><category scheme='http://www.blogger.com/atom/ns#' term='trees'/><category scheme='http://www.blogger.com/atom/ns#' term='interface'/><title type='text'>Why 3D phylogeny viewers don't work</title><content type='html'>Matt Yoder (&lt;a href="http://twitter.com/mjyoder"&gt;@mjyoder&lt;/a&gt;) and I had a Twitter conversation yesterday about phylogeny viewers, prompted by my tweeting about my latest displacement activity, a 2D tree browser using the tiling approach made popular by Google Maps.&lt;br /&gt;&lt;br /&gt;As part of that conversation, &lt;a href="http://twitter.com/mjyoder/status/40438156381794304"&gt;Matt tweeted&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;RT @rdmpage: @mjyoder - I think 3D is the worse thing we could do, there's no natural mapping to 3D. &lt;- meh, where's the imagination?&lt;/blockquote&gt;&lt;br /&gt;Well, Matt's imagination has gone into overdrive, and he's &lt;a href="http://cyphy.blogspot.com/2011/02/you-are-in-maze-of-twisty-little.html"&gt;blogged about his ideas&lt;/a&gt;. 
&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TWZCOrRP_iI/AAAAAAAAA18/vWgbdJNT0ec/3d_tree_browsing.jpg?imgmax=800" alt="3d_tree_browsing.jpg" border="0" width="400" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This issue deserves more exploration, but here are some quick thoughts. 3D has been used in a number of phylogeny browsers, such as Mike Sanderson's &lt;a href="http://loco.biosci.arizona.edu/paloverde/paloverde.html"&gt;Paloverde&lt;/a&gt;, &lt;a href="http://dx.doi.org/10.1186/1471-2105-5-48"&gt;Walrus&lt;/a&gt;, and the &lt;a href="http://iphylo.blogspot.com/2009/02/thoughts-on-wellcome-interactive-tree.html"&gt;Wellcome Trust's Tree of Life&lt;/a&gt;. I don't find any terribly successful, pretty as they may be. I think there are several problems with trees in general, and 3D versions in particular.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Trees aren't real&lt;/b&gt;&lt;br /&gt;Trees aren't real in the same way that the physical world is (or even imagined physical worlds). Trees are conceptual structures. The history of web interfaces is littered with attempts to visualise conceptual space, for example to summarise search results. These have been failures, a simple top ten list as used by Google wins. I don't think this is because Google's designers lack imagination, it's because it works. Furthermore, this is actually a very successful visualisation:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;iframe title="YouTube video player" width="400" height="330" src="http://www.youtube.com/embed/ijLDxgALc2c" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;br /&gt;I think elaborate attempts to depict conceptual spaces on screens are mostly going to fail.  &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Trees are empty&lt;/b&gt;&lt;br /&gt;Compared to, say, a geographic map, trees are largely empty space. In a map every pixel counts, in that it potentially represents something. 
Think of the satellite view in Google Maps. Each pixel on the screen has information. Trees are largely empty, hence much of the display space is wasted. Moving trees to 3D just gives us more space to waste.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Trees don't have a natural ordering&lt;/b&gt;&lt;br /&gt;Even if we accept that trees are useful visualisations, they have problems. Given the tree &lt;code&gt;((1,2),(3,4));&lt;/code&gt; we have a lot of (perhaps too much) freedom in how we can depict that tree. For example, both diagrams below depict this tree. In the &lt;i&gt;x&lt;/i&gt;-axis there is a partial order of internal nodes (the ancestor of {1,2} must be to the right of the ancestor of {1,2,3,4}), but the tree &lt;code&gt;((1,2),(3,4));&lt;/code&gt; says nothing about the relative ordering of {1,2} versus {3,4}. We are free to choose. A natural linear ordering would be divergence time, but estimates of those times can be contested, or unavailable.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TWY_XlWO8iI/AAAAAAAAA10/GLUuuL94HKg/order.png?imgmax=800" alt="order.png" border="0" width="400" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Phylogenies are &lt;i&gt;unordered trees&lt;/i&gt; in the sense that I can rotate any node about its ancestor and still have the same tree (compare the two trees above). Phylogenies are like mobiles:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;iframe title="YouTube video player" width="400" height="330" src="http://www.youtube.com/embed/_areh4qbaNE" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;br /&gt;The practical consequence of this is that different tree viewers can render the same tree in very different ways, making navigation across viewers unpredictable. Compare this to maps. Even if I use different projections, the maps remain recognisably similar, and most maps retain similar relationships between areas. 
If I look at a map of Glasgow and move left I will end up in the Atlantic Ocean, no matter if I use Google Maps or Microsoft Maps. Furthermore, trees grow in a way that maps don't (at least, not much). If I add nodes to a tree it may radically change shape, destroying navigation cues that I may have relied on before. Typically maps change by the addition of layers, not by moving bits around (paleogeographic maps excepted).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Trees aren't 3D&lt;/b&gt;&lt;br /&gt;There's nothing intrinsically 3D about trees, which means any mapping to 3D space is going to be arbitrary. Indeed, most 3D viewers simply avoid any mapping and show a 2D tree in 3D space, which seems rather pointless.&lt;br /&gt;&lt;br /&gt;Perhaps it's because I don't play computer games much (went through an Angry Birds phase, and occasionally pick up an X-Box controller, only to be mercilessly slaughtered by my son), but I'm not inspired by the analogy with computer games. I'm not denying that there are useful things to learn from games (I'm sure the controls in Google Earth owe something to games). But games also rely on a visceral connection with the play, and an understanding of the visual vocabulary (how to unlock treasure, etc.). Matt's 3D model requires users to learn a whole visual vocabulary, much of which (e.g., "Fruit on your tree? Someone has left comment(s) or feedback. ") seems forced.&lt;br /&gt;&lt;br /&gt;My sense is that the most successful interfaces make the minimal demands on users, don't fight their intuition, and don't force them to accept a particular visualisation of their own cognitive space. &lt;br /&gt;&lt;br /&gt;I'll write more about this once I get my 2D tree viewer into shape where it can be shown. 
It will be a lot less imaginative than Matt's vision, all I'm shooting for is that it is usable.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-8822195509102808382?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8822195509102808382'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8822195509102808382'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/02/why-3d-phylogeny-viewers-don-work.html' title='Why 3D phylogeny viewers don&amp;#39;t work'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_Gct8lVAxKqQ/TWZCOrRP_iI/AAAAAAAAA18/vWgbdJNT0ec/s72-c/3d_tree_browsing.jpg?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-6206130378517017911</id><published>2011-02-18T12:13:00.001Z</published><updated>2011-02-18T12:15:57.577Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='citation mutation'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='Australian Faunal Directory'/><title type='text'>Why metadata matters</title><content type='html'>Quick note to express the frustration I experience sometimes when dealing with taxonomic literature. 
As part of a frankly &lt;a href="http://en.wikipedia.org/wiki/Quixotism"&gt;Quixotic&lt;/a&gt; desire to link every article cited in the Australian Faunal Directory (AFD) to the equivalent online resource (for example, in the Biodiversity Heritage Library using BioStor, or to a publisher web site using a DOI) I sometimes come across references that I should be able to find yet can't. Often it turns out that the metadata for the article is incorrect. For example, take this reference:&lt;br /&gt;&lt;blockquote&gt;Report upon the Stomatopod crustaceans obtained by P.W. Basset-Smith Esq., surgeon R.N. during the cruise, in the Australia and China Sea, of H.M.S. "Penguin", commander W.V. Moore. Ann. Mag. Nat. Hist. Vol. 6 pp. 473-479 pl. 20B&lt;/blockquote&gt; which is in the Australian Faunal Directory (&lt;a href="http://lsid.tdwg.org/summary/urn:lsid:biodiversity.org.au:afd.publication:087892ae-2134-4bb4-83ae-8b8cbd15b299"&gt;urn:lsid:biodiversity.org.au:afd.publication:087892ae-2134-4bb4-83ae-8b8cbd15b299&lt;/a&gt;). Using my &lt;a href="http://biostor.org/openurl"&gt;OpenURL resolver in BioStor&lt;/a&gt; I failed to locate this article. Sometimes this is because the code I used to parse references from AFD mangles the reference, but not in this case. 
So, I &lt;a href="http://www.google.co.uk/search?q=Report+upon+the+Stomatopod+crustaceans+obtained+by+P.W.+Basset-Smith+Esq.,+surgeon+R.N.+during+the+cruise,+in+the+Australia+and+China+Sea,+of+H.M.S.+Penguin,+commander+W.V.+Moore&amp;ie=UTF-8&amp;oe=UTF-8"&gt;Google the title&lt;/a&gt; and find a page in the Zoological catalogue of Australia: Aplacophora, Polyplacophora, Scaphopoda:&lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;iframe frameborder="0" scrolling="no" style="border:0px" src="http://books.google.co.uk/books?id=Sc7i6AL-GewC&amp;lpg=PA60&amp;ots=AShJOhTrZ0&amp;dq=Report%20upon%20the%20Stomatopod%20crustaceans%20obtained%20by%20P.W.%20Basset-Smith%20Esq.%2C%20surgeon%20R.N.%20during%20the%20cruise%2C%20in%20the%20Australia%20and%20China%20Sea%2C%20of%20H.M.S.%20Penguin%2C%20commander%20W.V.%20Moore&amp;pg=PA60&amp;output=embed" width=400 height=500&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Here's the relevant part of this page:&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TV5ia7-sr1I/AAAAAAAAA1s/fUNOFPOEJ8Y/zoocat.png?imgmax=800" alt="Zoocat" border="0" width="313" height="120" /&gt;&lt;br /&gt;Same as AFD, &lt;i&gt;Ann. Mag. Nat. Hist.&lt;/i&gt; volume 6,  pages 473-479, 1893.&lt;br /&gt;&lt;br /&gt;In despair I looked at the BHL page for &lt;a href="http://www.biodiversitylibrary.org/bibliography/15774"&gt;&lt;i&gt;The Annals and Magazine of Natural History&lt;/i&gt;&lt;/a&gt; and discover that &lt;b&gt;there is no volume 6 published in 1893&lt;/b&gt;. There is, however, &lt;b&gt;series 6&lt;/b&gt;. Oops! Browsing the BHL content I discover the start of the article I'm looking for on &lt;a href="http://biodiversitylibrary.org/page/27734740"&gt;BHL page 27734740 &lt;/a&gt;, &lt;b&gt;volume 11&lt;/b&gt; of series 6 of &lt;i&gt;The Annals and Magazine of Natural History&lt;/i&gt;. Gotcha! 
So, I can now &lt;a href="http://iphylo.org/~rpage/afd/id/087892ae-2134-4bb4-83ae-8b8cbd15b299"&gt;link AFD to BHL like this&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I should stress that in general AFD is a great resource for someone like me trying to link names to literature and, to be fair, with its reuse of volume numbers across series &lt;i&gt;The Annals and Magazine of Natural History&lt;/i&gt; can be a challenge to cite. Usually the bibliographic details in AFD are accurate enough to locate articles in BHL or CrossRef, but every so often references get mangled, misinterpreted, or someone couldn't resist adding a few "helpful" notes to a field in the database, resulting in my parser failing. What is slightly alarming is how often when I Google for the reference I find the same, erroneous metadata repeated across several articles. This, coupled with the inevitable &lt;a href="http://www.the-scientist.com/news/display/57698/"&gt;citation mutations&lt;/a&gt; can make life a little tricky. The bulk of the links I'm making are constructed automatically, but there are a few cases where one is led on a wild goose chase to find the actual reference. &lt;br /&gt;&lt;br /&gt;Although this is an example of why it matters to have accurate metadata, it can also be seen as an argument for using identifiers rather than metadata. If these references had stable, persistent identifiers (such as DOIs) that taxonomic databases cited, then we wouldn't need detailed metadata, and we could avoid the pain of rummaging around in digital archives trying to make sense of what the author meant to cite. Until taxonomic databases routinely use identifiers for literature, names and literature will be as ships that pass in the night. 
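&lt;br /&gt;&lt;br /&gt;To make the volume versus series mix-up concrete, here is a minimal sketch of the two lookups written as OpenURL queries. The resolver address is the BioStor one mentioned above, and the bibliographic values come from the reference itself; the helper name and the choice of keys (the usual OpenURL 0.1 names genre, title, volume, spage and date) are assumptions, and a given resolver may expect something slightly different.&lt;br /&gt;&lt;pre&gt;
```python
from urllib.parse import urlencode

# Resolver address quoted in the post.
BIOSTOR_OPENURL = "http://biostor.org/openurl"


def article_openurl(journal, volume, start_page, year, base=BIOSTOR_OPENURL):
    """Build an OpenURL-style query for a journal article.

    The key names follow common OpenURL 0.1 usage; treating them as
    what the resolver accepts is an assumption of this sketch.
    """
    params = {
        "genre": "article",
        "title": journal,
        "volume": volume,
        "spage": start_page,
        "date": year,
    }
    return base + "?" + urlencode(params)


# Metadata as cited: "Vol. 6" is really the series number.
as_cited = article_openurl("Ann. Mag. Nat. Hist.", 6, 473, 1893)

# Metadata after checking BHL: the article is in volume 11 of series 6.
corrected = article_openurl("Ann. Mag. Nat. Hist.", 11, 473, 1893)

print(as_cited)
print(corrected)
```
&lt;/pre&gt;The two URLs differ only in the volume value, which is the whole of the error in the cited metadata.&lt;br /&gt;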
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-6206130378517017911?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6206130378517017911'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/6206130378517017911'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/02/why-metadata-matters.html' title='Why metadata matters'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_Gct8lVAxKqQ/TV5ia7-sr1I/AAAAAAAAA1s/fUNOFPOEJ8Y/s72-c/zoocat.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-1296499321246254658</id><published>2011-02-06T15:18:00.001Z</published><updated>2011-02-06T15:18:32.189Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Atlas of Living Australia'/><category scheme='http://www.blogger.com/atom/ns#' term='Australian Faunal Directory'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='pagerank'/><title type='text'>Why is the Atlas of Living Australia is invisible to Google?</title><content type='html'>Jeff Atwood, one of the co-founders of &lt;a href="http://stackoverflow.com"&gt;Stack Overflow&lt;/a&gt; recently wrote a blog post &lt;a href="http://www.codinghorror.com/blog/2011/01/trouble-in-the-house-of-google.html"&gt;Trouble In the House of Google&lt;/a&gt;, where he noted that 
several sites that scrape Stack Overflow content (which Stack Overflow's CC-BY-SA license permits) appear &lt;b&gt;higher in Google's search rankings than the original Stack Overflow pages&lt;/b&gt;. When Stack Overflow chose the CC-BY-SA license they made the assumption that:&lt;br /&gt;&lt;blockquote&gt;...that we, as the canonical source for the original questions and answers, would always rank first...That's why Joel Spolsky and I were confident in sharing content back to the community with almost no reservations – because Google mercilessly penalizes sites that attempt to game the system by unfairly profiting on copied content.&lt;/blockquote&gt;Jeff Atwood's post goes on to argue that something is wrong with the way Google is ranking sites that derive content from other sites.&lt;br /&gt;&lt;br /&gt;I was reminded of this post when I started to notice that searches for fairly obscure Australian animals would often return my own web site &lt;a href="http://iphylo.org/~rpage/afd/"&gt;Australian Faunal Directory on CouchDB&lt;/a&gt; as the first hit. In one sense this is personally gratifying, but it can also be frustrating because when I Google these obscure taxa it's usually because I'm trying to find data that isn't already in one of my projects.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TU6p-A22XPI/AAAAAAAAA1M/uvBaBv2bpac/unotata.pic1.JPG?imgmax=800" alt="unotata.pic1.JPG" border="0" width="200"  align="right" /&gt;But what I've also noticed is that the site that I obtained the data from, &lt;a href="http://www.environment.gov.au/biodiversity/abrs/online-resources/fauna/afd/home"&gt;Australian Faunal Directory&lt;/a&gt; (AFD), rarely appears in the Google search results. In fact, there are taxa for which Google doesn't find the corresponding page in AFD. 
For example, if you &lt;a href="http://www.google.com/search?sourceid=chrome&amp;ie=UTF-8&amp;q=Uxantis+notata"&gt;search for &lt;i&gt;Uxantis notata&lt;/i&gt;&lt;/a&gt; (shown here in an image from the &lt;a href="http://www1.dpi.nsw.gov.au/keys/fulgor/species/unotata.htm"&gt;Key to the planthoppers of Australia and New Zealand&lt;/a&gt;) the first hit(s) are from &lt;a href="http://iphylo.org/~rpage/afd/id/b87cdae6-371a-46d2-9ca8-5bdcb0d5fad5"&gt;my version of AFD&lt;/a&gt;:&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TU6rCAnip0I/AAAAAAAAA1U/4xXYomq5PwA/Snapshot%202011-02-06%2014-05-44.png?imgmax=800" alt="Snapshot 2011-02-06 14-05-44.png" border="0" width="400"  /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Neither the original AFD, nor the &lt;a href="http://www.ala.org.au/"&gt;Atlas of Living Australia&lt;/a&gt; (ALA), which also builds on AFD, appears in the top 10 hits. &lt;br /&gt;&lt;br /&gt;Initially I thought this was probably an artefact. This is a pretty obscure taxon, so maybe things like rounding error in computing &lt;a href="http://en.wikipedia.org/wiki/PageRank"&gt;PageRank&lt;/a&gt; are going to affect search rankings more than anything else. 
However, if I explicitly tell Google to search for &lt;a href="http://www.google.com/search?q=Uxantis+notata+site:environment.gov.au&amp;hl=en&amp;num=10&amp;lr=&amp;ft=i&amp;cr=&amp;safe=images&amp;tbs="&gt;&lt;i&gt;Uxantis notata&lt;/i&gt; in the domain &lt;code&gt;environment.gov.au&lt;/code&gt;&lt;/a&gt; &lt;b&gt;I get no hits whatsoever&lt;/b&gt;:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TU6sC6rYp9I/AAAAAAAAA1c/gvjOcw6l3A8/Snapshot%202011-02-06%2014-10-32.png?imgmax=800" alt="Snapshot 2011-02-06 14-10-32.png" border="0" width="400"  /&gt;&lt;/div&gt;&lt;br /&gt;Likewise, &lt;a href="http://www.google.co.uk/search?sourceid=chrome&amp;ie=UTF-8&amp;q=Uxantis+notata+site:ala.org.au"&gt;the same search restricted to &lt;code&gt;ala.org.au&lt;/code&gt;&lt;/a&gt; &lt;b&gt;finds nothing, nothing at all&lt;/b&gt;. Both AFD and Atlas of Living Australia have pages for this taxon, &lt;a href="http://www.environment.gov.au/biodiversity/abrs/online-resources/fauna/afd/taxa/b87cdae6-371a-46d2-9ca8-5bdcb0d5fad5"&gt;here&lt;/a&gt;, and &lt;a href="http://bie.ala.org.au/species/urn:lsid:biodiversity.org.au:afd.taxon:b87cdae6-371a-46d2-9ca8-5bdcb0d5fad5"&gt;here&lt;/a&gt;, so clearly something is deeply wrong. &lt;br /&gt;&lt;br /&gt;Why are the original providers of the data not appearing in Google search results at all? For someone like me who argues that sharing data is a good thing, and sites that aggregate and repurpose data will ultimately benefit the original data providers (for example by sending traffic and Google Juice) this is somewhat worrying. It seems to reinforce the fear that many data providers have: "if I share my data someone will make a better web site than mine and people will go to that web site, rather than the one I've created with my hard-won data." 
It may well be that data aggregators will score higher than data providers in Google searches, but I hadn't expected data providers to be virtually invisible.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TU6uoAUYT3I/AAAAAAAAA1k/kY-eNL03gQ0/atlasaustraliasm.png?imgmax=800" alt="atlasaustraliasm.gif" border="0" width="48" height="47" align="right" /&gt;&lt;b&gt;Google isn't the problem&lt;/b&gt;&lt;br /&gt;If a web site that I hacked together in a few days does better in Google searches than the rather richer pages published by sites such as ALA (with a &lt;a href="http://www.csiro.au/news/Funds-for-Atlas-of-Living-Australia.html"&gt;budget of over $AU 30 million&lt;/a&gt;), something is wrong. Unlike the Stack Overflow example discussed above, I don't think the problem here is with Google. &lt;br /&gt;If we search in Google for an "iconic" Australian taxon by name, say the Koala &lt;i&gt;Phascolarctos cinereus&lt;/i&gt;, Wikipedia is the first hit (which should be &lt;a href="http://iphylo.blogspot.com/2009/09/google-wikipedia-and-eol.html"&gt;no surprise&lt;/a&gt;). ALA doesn't appear in the top ten. If we tell Google to just &lt;a href="http://www.google.com/search?client=safari&amp;rls=en&amp;q=Phascolarctos+cinereus+site:ala.org.au&amp;ie=UTF-8&amp;oe=UTF-8"&gt;search the domain &lt;code&gt;ala.org.au&lt;/code&gt;&lt;/a&gt; we get lots of pages from ALA, but not the actual species page for &lt;a href="http://bie.ala.org.au/species/urn:lsid:biodiversity.org.au:afd.taxon:05d49ac2-0d88-45e7-8ae4-9164a274014d"&gt;&lt;i&gt;Phascolarctos cinereus&lt;/i&gt;&lt;/a&gt;. This suggests that there is something about the way ALA's website works that prevents Google indexing it properly. 
I'm also a little worried that a major biodiversity project which has as its aim &lt;br /&gt;&lt;blockquote&gt;...to improve access to essential information on Australia’s biodiversity&lt;/blockquote&gt; is effectively invisible to Google.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-1296499321246254658?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/1296499321246254658'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/1296499321246254658'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/02/why-is-atlas-of-living-australia-is.html' title='Why is the Atlas of Living Australia is invisible to Google?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_Gct8lVAxKqQ/TU6p-A22XPI/AAAAAAAAA1M/uvBaBv2bpac/s72-c/unotata.pic1.JPG?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-4589053503317301865</id><published>2011-02-04T13:50:00.001Z</published><updated>2011-02-04T13:50:28.635Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='video'/><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='Web Hooks'/><category scheme='http://www.blogger.com/atom/ns#' term='screencast'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenURL'/><title type='text'>Web Hooks and OpenURL: the screencast</title><content 
type='html'>Yesterday I posted notes on &lt;a href="http://iphylo.blogspot.com/2011/02/web-hooks-and-openurl-making-databases.html"&gt;Web Hooks and OpenURL&lt;/a&gt;. That post was written when I was already late (you know, when you say to yourself "yeah, I've got time, it'll just take 5 minutes to finish this..."). The Web Hooks + OpenURL project is still very much a work in progress, but I thought a screencast would help explain why I think this is going to make my life a lot easier. It shows an example where I look at a bibliographic record in one database (AFD, the &lt;a href="http://iphylo.org/~rpage/afd/"&gt;Australian Faunal Directory on CouchDB&lt;/a&gt;), click on a link that takes me to &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; (where I can find the reference in BHL), then simply click on a button on the BioStor page to "automagically" update the AFD database. The "magic" is the Web Hook. The link I click on in the AFD database contains the identifier for that entry in the AFD, as well as a URL BioStor can call when it's found the reference (that URL is the "web hook").&lt;br /&gt;&lt;br /&gt;&lt;iframe src="http://player.vimeo.com/video/19563196" width="400" height="300" frameborder="0"&gt;&lt;/iframe&gt;&lt;p&gt;&lt;a href="http://vimeo.com/19563196"&gt;Using Web Hooks and OpenURL&lt;/a&gt; from &lt;a href="http://vimeo.com/rdmpage"&gt;Roderic Page&lt;/a&gt; on &lt;a href="http://vimeo.com"&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-4589053503317301865?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4589053503317301865'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4589053503317301865'/><link rel='alternate' type='text/html' 
href='http://iphylo.blogspot.com/2011/02/web-hooks-and-openurl-screencast.html' title='Web Hooks and OpenURL: the screencast'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-1922849057445060538</id><published>2011-02-03T18:46:00.001Z</published><updated>2011-02-03T18:46:18.709Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Web Hooks'/><category scheme='http://www.blogger.com/atom/ns#' term='programming'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenURL'/><title type='text'>Web Hooks and OpenURL: making databases editable</title><content type='html'>For me one of the most frustrating things about online databases is that they often can't be edited. For example, I've recently created a version of the &lt;a href="http://iphylo.org/~rpage/afd/"&gt;Australian Faunal Directory on CouchDB&lt;/a&gt;, which contains a list of all animals in Australia, and a fairly comprehensive bibliography of taxonomic publications on those animals. What I'd like to do is locate those publications online. Using various scripts I've found DOIs for some 2,500 articles, and located nearly 4,900 articles in BHL, and added these to the database, but browsing the database (using, say, the &lt;a href="http://iphylo.blogspot.com/2011/01/quantum-treemaps-meet-bhl-and.html"&gt;quantum treemap interface&lt;/a&gt;) makes it clear there are lots of publications that I've missed.&lt;br /&gt;&lt;br /&gt;It would be great if I could go to the Australian Faunal Directory on CouchDB and edit these on that site, but that would require making the data editable, and that means adding a user interface. And that's potentially a lot of work. 
Then, if I go to another database (say, my &lt;a href="http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html"&gt;CouchDB version of the Catalogue of Life&lt;/a&gt;) and want to make that editable then I have to add an interface to that database as well. I could switch to using a wiki, which I've done for some projects (such as the &lt;a href="http://iphylo.blogspot.com/2010/06/linking-ncbi-to-wikipedia.html"&gt;NCBI to Wikipedia mapping&lt;/a&gt;), but wikis have their own issues (in particular, they don't easily support the kinds of queries I want to do).&lt;br /&gt;&lt;br /&gt;There is, as they say, a third way: &lt;a href="http://www.webhooks.org/"&gt;web hooks&lt;/a&gt;. I first came across web hooks when I discovered &lt;a href="http://code.google.com/p/support/wiki/PostCommitWebHooks"&gt;Post-Commit Web Hooks&lt;/a&gt; in Google Code. The idea is you can create a web service that gets called every time you commit code to the Google Code repository. For example, each time you commit code you can call a web hook that uses the Twitter API to tweet details of what you just committed (I tried this for a while, until some of my Twitter followers got seriously pissed off by the volume of tweets this was generating).&lt;br /&gt;&lt;br /&gt;What has this to do with making databases editable? Well, imagine the following scenario. A web page displays a publication, but no DOI. However, the web page embeds an OpenURL in the form of a &lt;a href="http://ocoins.info/"&gt;COinS&lt;/a&gt; (in other words, a URL with key-value pairs describing the publication). If you use a tool such as the &lt;a href="https://addons.mozilla.org/en-US/firefox/addon/4150"&gt;OpenURL Referrer&lt;/a&gt; in Firefox you can use an OpenURL resolver to find that publication. Examples of OpenURL resolvers include &lt;a href="http://bioguid.info"&gt;bioGUID&lt;/a&gt; and &lt;a href="http://biostor.org/openurl"&gt;BioStor&lt;/a&gt;. 
Let's say you find the publication, and it has a DOI. How do you tell the database about this? Well, you can try and find an email address of someone running the database so you can send them the information, but this is a hassle. What if the OpenURL resolver that you used to find the DOI could automatically tell the source database that it's found the DOI? That's the idea behind web hooks.&lt;br /&gt;&lt;br /&gt;I've started to experiment with this, and have most of the pieces working. Publication pages in &lt;a href="http://iphylo.org/~rpage/afd/"&gt;Australian Faunal Directory on CouchDB&lt;/a&gt; have COinS that include two additional pieces of information: (1) the database identifier for the publication (in this case a UUID; in the hideously complex jargon of OpenURL this is the "Referring Entity Identifier"), and (2) the URL of the web hook. The idea is that an OpenURL resolver can take the OpenURL and try and locate the article. If it succeeds it will call the web hook URL supplied by the database, and tell it "hey, I've found this DOI for the publication with this database identifier". The database can then update its data, so the next time a user visits the page for that publication in the database, the user will see the DOI. This has the huge advantage of persistence over tools that just modify the web page on the fly, such as &lt;a href="http://ispiders.blogspot.com/2010/08/reference-parser-revived.html"&gt;David Shorthouse's reference parser&lt;/a&gt;: the database itself is updated, not just the web page.&lt;br /&gt;&lt;br /&gt;In order to make this work, all the database needs to do is have a web hook, namely a URL that accepts POST requests. The heavy lifting of searching for the publication, or enabling users to correct and edit the data, can be devolved to a single place, namely the OpenURL resolver. 
As a first step I'm building an OpenURL resolver that displays a form in which the user can edit bibliographic details, and launch searches in CrossRef (and soon BioStor). When the user is done they can close the form, which is when it calls the web hook with the edited data. The database can then choose to accept or reject the update.&lt;br /&gt;&lt;br /&gt;Given that it's easy to create the web hook, and trivial to get a database to output an OpenURL with its internal identifier and the URL of the web hook, this seems like a light-weight way of making databases editable.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-1922849057445060538?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/1922849057445060538'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/1922849057445060538'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/02/web-hooks-and-openurl-making-databases.html' title='Web Hooks and OpenURL: making databases editable'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7360800832475343538</id><published>2011-01-18T18:20:00.001Z</published><updated>2011-01-18T18:20:46.164Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='treemap'/><category scheme='http://www.blogger.com/atom/ns#' term='Internet Explorer'/><category scheme='http://www.blogger.com/atom/ns#' term='Australian Faunal 
Directory'/><category scheme='http://www.blogger.com/atom/ns#' term='interface'/><category scheme='http://www.blogger.com/atom/ns#' term='quantum treemap'/><category scheme='http://www.blogger.com/atom/ns#' term='CSS'/><title type='text'>Quantum treemaps meet BHL and the Australian Faunal Directory</title><content type='html'>One of the things I'm enjoying about the &lt;a href="http://iphylo.org/~rpage/afd/"&gt;Australian Faunal Directory on CouchDB&lt;/a&gt; is the chance to play with some ideas without worrying about breaking lots of code or, indeed, upsetting any users ('cos, let's face it, there aren't any). As a result, I can start to play with ideas that may one day find their way into other projects.&lt;br /&gt;&lt;br /&gt;One of these ideas is to use quantum treemaps to display an author's publications. For example, below is a treemap showing publications by &lt;a href="http://iphylo.org/~rpage/afd/author/Boulenger,G+A"&gt;G A Boulenger&lt;/a&gt; in my &lt;a href="http://iphylo.org/~rpage/afd/"&gt;Australian Faunal Directory on CouchDB&lt;/a&gt; project. The publications are clustered by journal. If a publication has been found in &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; the treemap displays a thumbnail of that publication, otherwise it shows a white rectangle. At a glance we can see where the gaps are. 
You can view a publication's details simply by clicking on it.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TTXZ9q0TqZI/AAAAAAAAA08/_26luKj2XmM/boulenger.png?imgmax=800" alt="boulenger.png" border="0" width="318" height="600" /&gt;&lt;br /&gt;&lt;br /&gt;The entomologist &lt;a href="http://iphylo.org/~rpage/afd/author/Distant,W+L"&gt;W L Distant&lt;/a&gt; has a more impressive treemap, and clearly I need to find quite a few of his publications.&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TTXZ-38zW3I/AAAAAAAAA1A/Xk7J_Fns4BE/distant.png?imgmax=800" alt="distant.png" border="0" width="311" height="600" /&gt;&lt;br /&gt;I quite like the look of these, so may think about adding this display to BioStor. I may also think about using treemaps in my ongoing iPad projects. If you want to see where I'm going with this then take a look at Good et al. &lt;a href="http://www.mendeley.com/research/fluid-treemap-interface-personal-digital-libraries/"&gt;A fluid treemap interface for personal digital libraries&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Notes&lt;/b&gt;&lt;br /&gt;The quantum treemap is computed using some rather ugly PHP I wrote, based on this &lt;a href="http://www.cs.umd.edu/hcil/photomesa/download/layout-algorithms.shtml"&gt;Java code&lt;/a&gt;. I've not implemented all the refinements of the original Java code, so the quantum treemaps I create are sometimes suboptimal. To avoid too much visual clutter I haven't drawn a border around each cell; instead I use CSS gradients to indicate the area of the cell (if you're using Internet Explorer the gradient will be vertical rather than going from top left to bottom right). 
The journal name is overlain on the cell contents, but if you are using a decent browser (i.e., &lt;b&gt;not&lt;/b&gt; Internet Explorer) you can still click through this text to the underlying thumbnail because the text uses the CSS property &lt;br /&gt;&lt;code&gt;.overlay { pointer-events: none; }&lt;/code&gt;&lt;br /&gt;I learnt this trick from the Stack Overflow question &lt;a href="http://stackoverflow.com/questions/3793204/click-through-div-with-an-alpha-channel"&gt;Click through div with an alpha channel&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7360800832475343538?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7360800832475343538'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7360800832475343538'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/01/quantum-treemaps-meet-bhl-and.html' title='Quantum treemaps meet BHL and the Australian Faunal Directory'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_Gct8lVAxKqQ/TTXZ9q0TqZI/AAAAAAAAA08/_26luKj2XmM/s72-c/boulenger.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-43107762417323476</id><published>2011-01-14T08:14:00.001Z</published><updated>2011-01-14T09:02:38.815Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='identifiers'/><category scheme='http://www.blogger.com/atom/ns#' term='DOI'/><category 
scheme='http://www.blogger.com/atom/ns#' term='CrossRef'/><category scheme='http://www.blogger.com/atom/ns#' term='Phthiraptera'/><category scheme='http://www.blogger.com/atom/ns#' term='Cool URIs'/><category scheme='http://www.blogger.com/atom/ns#' term='domain names'/><title type='text'>The demise of phthiraptera.org and the perils of using Internet domain names as identifiers</title><content type='html'>&lt;blockquote&gt;When otherwise sensible technorati refer to "owning" a domain name, it makes me want to stick forks in my eyeballs. We do not "own" domain names. At best, we only lease them and there are manifold ways in which we could lose control of a domain name - through litigation, through forgetfulness, through poverty, through voluntary transfer, etc. Once you don't control a domain name anymore, then you can't control your domain-name-based persistent identifiers either. - Geoffrey Bilder &lt;a href="http://blogs.nature.com/mfenner/2009/02/17/interview-with-geoffrey-bilder"&gt;interviewed by Martin Fenner&lt;/a&gt;&lt;/blockquote&gt;Geoffrey Bilder's comments about the unsuitability of URLs as long term identifiers (as opposed, say, to DOIs) came to mind when I discovered that the domain &lt;a href="http://www.phthiraptera.org/"&gt;phthiraptera.org&lt;/a&gt; is up for sale: &lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TS__0Q_HLvI/AAAAAAAAA00/z-04lOzSN3c/Snapshot%202011-01-14%2007-47-39.png?imgmax=800" alt="Snapshot 2011-01-14 07-47-39.png" border="0" width="400"  /&gt;&lt;/div&gt;&lt;br /&gt;This domain used to be home to a wealth of resources on lice (order Phthiraptera). I discovered that ownership of the domain had expired when a bunch of links to PDFs returned by an &lt;a href="http://ispecies.org/?q=Collodennyus"&gt;iSpecies search for &lt;i&gt;Collodennyus&lt;/i&gt;&lt;/a&gt; all bounced to the holding page above. 
Phthiraptera.org was owned by the late &lt;a href="http://vsmith.info/Bob-Dalgleish"&gt;Bob Dalgleish&lt;/a&gt;. After his death, ownership of the domain lapsed, and it's now up for sale. Although much of the content of Phthiraptera.org has been moved to &lt;a href="http://phthiraptera.info/"&gt;phthiraptera.info&lt;/a&gt;, URLs containing phthiraptera.org still turn up in search results, especially ones that have been cached (for example, in iSpecies). Given that much of the content is still available the loss isn't total, but anyone relying on links containing phthiraptera.org to point to content (such as a PDF), or to identify that content (such as a publication) will find themselves in trouble. Although ideally &lt;a href="http://www.w3.org/Provider/Style/URI"&gt;Cool URIs don't change&lt;/a&gt;, in practice they do, and with alarming frequency. Furthermore, in this case, because ownership of phthiraptera.org has lapsed, there's no opportunity to create redirects from URLs with phthiraptera.org to the equivalent content in phthiraptera.info (leaving aside the issue that phthiraptera.info is not a mirror of phthiraptera.org, so exactly what the redirects would point to is unclear).&lt;br /&gt;&lt;br /&gt;Identifiers based on domain names, such as URLs and LSIDs are attractive because the DNS helps ensure global uniqueness, and HTTP provides a way to resolve the identifier, but all this is contingent on the domain itself persisting. 
For more on this topic I recommend reading &lt;a href="http://blogs.nature.com/mfenner/2009/02/17/interview-with-geoffrey-bilder"&gt;Martin Fenner's interview of CrossRef's Geoffrey Bilder&lt;/a&gt;, from which I took the opening quote.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-43107762417323476?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/43107762417323476'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/43107762417323476'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/01/demise-of-phthirapteraorg-and-perils-of.html' title='The demise of phthiraptera.org and the perils of using Internet domain names as identifiers'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_Gct8lVAxKqQ/TS__0Q_HLvI/AAAAAAAAA00/z-04lOzSN3c/s72-c/Snapshot%202011-01-14%2007-47-39.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7897395647806955118</id><published>2011-01-11T18:34:00.001Z</published><updated>2011-01-11T18:34:55.576Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='The Plant List'/><category scheme='http://www.blogger.com/atom/ns#' term='Creative Commons'/><title type='text'>Why won't The Plant List won't let me do this?</title><content type='html'>In my &lt;a href="http://iphylo.blogspot.com/2010/12/plant-list-nice-data-shame-it-not-open.html"&gt;last post&lt;/a&gt; I discussed why I thought the 
decision of &lt;a href="http://www.theplantlist.org"&gt;The Plant List&lt;/a&gt; to use a restrictive license (CC-BY-NC-ND) was such a poor choice. CC-BY-NC-ND states that &lt;br /&gt;&lt;blockquote&gt;You may not alter, transform, or build upon this work.&lt;/blockquote&gt;To make this point more concrete, I've created this site:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/theplantlist/"&gt;Experiments with The Plant List&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;to show the kinds of things that The Plant List's choice of license prevents the taxonomic community from doing. As a first step I'm exploring linking the names in the list to the primary scientific literature, as this video demonstrates:&lt;br /&gt;&lt;br /&gt;&lt;iframe src="http://player.vimeo.com/video/18671689" width="400" height="300" frameborder="0"&gt;&lt;/iframe&gt;&lt;p&gt;&lt;a href="http://vimeo.com/18671689"&gt;The Plant List&lt;/a&gt; from &lt;a href="http://vimeo.com/rdmpage"&gt;Roderic Page&lt;/a&gt; on &lt;a href="http://vimeo.com"&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;&lt;br /&gt;For example, we can take a name like &lt;i&gt;Begonia zhengyiana&lt;/i&gt; Y.M.Shui, parse the bibliographic citation provided by The Plant List (via IPNI), and locate the actual paper online, in this case it's freely available as a PDF:&lt;br /&gt;&lt;br /&gt;&lt;iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fwww.plantsystematics.com%2Fqikan%2Fmanage%2Fwenzhang%2Ff010097.pdf&amp;amp;embedded=true" width="400" height="500" style="border: none;"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;br /&gt;Now we can see a drawing of the plant, and instead of simply trusting that the compilers of The Plant List have correctly interpreted this paper, we can see for ourselves. Down the track, we could imagine mining this paper for details about the plant, such as its morphology and geographic distribution. This requires the link to the original literature, which The Plant List lacks. 
&lt;br /&gt;&lt;br /&gt;A good chunk of the recent plant taxonomic literature has DOIs, for example journals such as the &lt;i&gt;Kew Bulletin&lt;/i&gt; and &lt;i&gt;Novon&lt;/i&gt;. Playing with some scripts I've managed to associate nearly 9000 accepted names with a DOI, and that's by looking at only a few journals. There are lots more DOIs to be found, but because of the way botanical nomenclators record references (see my post &lt;a href="http://iphylo.blogspot.com/2009/05/nomenclators-digitised-literature-fail.html"&gt;Nomenclators + digitised literature = fail&lt;/a&gt;) it can be something of a challenge to find them. This task isn't helped by the fairly lax way some publishers enter data in CrossRef (Cambridge University Press I'm looking at you). The other obvious source of digitised literature is, of course, &lt;a href="http://www.biodiversitylibrary.org"&gt;BHL&lt;/a&gt;, and that's next on the list of resources to play with.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/theplantlist/"&gt;Experiments with The Plant List&lt;/a&gt; is very crude, and I've barely scratched the surface of linking names to primary literature. That said, given that there are exactly zero links between names and digital literature in The Plant List, I'd argue that my site adds value to the data in The Plant List. And that's my point: by making data available for others to play with, you enable others to add value to that data. 
By choosing a CC-BY-NC-ND license, The Plant List has killed that possibility.&lt;br /&gt;&lt;br /&gt;So, my question for The Plant List is "why did you do that?"&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7897395647806955118?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7897395647806955118'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7897395647806955118'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2011/01/why-won-plant-list-won-let-me-do-this.html' title='Why won&amp;#39;t The Plant List won&amp;#39;t let me do this?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-4672679377173148213</id><published>2010-12-29T19:22:00.001Z</published><updated>2010-12-29T19:40:08.730Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='data'/><category scheme='http://www.blogger.com/atom/ns#' term='Plant List'/><category scheme='http://www.blogger.com/atom/ns#' term='license'/><category scheme='http://www.blogger.com/atom/ns#' term='open data'/><category scheme='http://www.blogger.com/atom/ns#' term='Creative Commons'/><category scheme='http://www.blogger.com/atom/ns#' term='MOBOT'/><category scheme='http://www.blogger.com/atom/ns#' term='Kew'/><title type='text'>The Plant List: nice data, shame it's not open</title><content type='html'>&lt;img src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TRt_2SQxrwI/AAAAAAAAA0s/bQHZhz44vB4/nd.large.png?imgmax=800" alt="nd.large.png" border="0" 
width="200" height="200" align="right" /&gt;The Plant List (&lt;a href="http://www.theplantlist.org/"&gt;http://www.theplantlist.org/&lt;/a&gt;) has been released today, complete with &lt;a href="http://www.mobot.org/events/Assets/10195ThePlantList.pdf"&gt;glowing press releases&lt;/a&gt;. The list includes some 1,040,426 names. I eagerly looked for the &lt;b&gt;Download&lt;/b&gt; button, but none is to be found. You can download individual search results (say, at family level), but not the whole data set.&lt;br /&gt;&lt;br /&gt;OK, so that makes getting the complete data set a little tedious (there are 620 plant families in the data set), but we can still do it without too much hassle (in fact, I've grabbed the complete data set while writing this blog post). Then I see that the data is licensed under a &lt;a href="http://creativecommons.org/licenses/by-nc-nd/3.0/"&gt;Creative Commons Attribution-NonCommercial-NoDerivs&lt;/a&gt; (CC BY-NC-ND) license. Creative Commons is good, right? In this case, not so much. The CC BY-NC-ND license includes the clause:&lt;br /&gt;&lt;blockquote&gt;You may not alter, transform, or build upon this work.&lt;/blockquote&gt;So, you can look but not touch. You can't take this data (properly attributed, of course) and build your own list, for example with references linked to DOIs, or to the &lt;a href="http://biodiversitylibrary.org"&gt;Biodiversity Heritage Library&lt;/a&gt; (which is, of course, exactly what I plan to do). That's a derivative work, and the creators of the Plant List don't want you to do that. 
Despite this, the Plant List want us to use the data:&lt;br /&gt;&lt;blockquote&gt;Use of the content (such as the classification, synonymised species checklist, and scientific names) for publications and databases by individuals and organizations for not-for-profit usage is encouraged, on condition that full and precise credit is given to The Plant List and the conditions of the Creative Commons Licence are observed.&lt;/blockquote&gt;Great, but you've pretty much killed that by using BY-NC-ND. Then there's this:&lt;br /&gt;&lt;blockquote&gt;If you wish to use the content on a public portal or webpage you are required to contact The Plant List editors at editors@theplantlist.org to request written permission and to ensure that credits are properly made.&lt;/blockquote&gt;Really? The whole point of Creative Commons is that the permissions are explicit in the license. So, actually I &lt;b&gt;don't&lt;/b&gt; need your permission to use the data on a public portal; CC BY-NC-ND gives me permission (but with the crippling limitation that I can't make a derivative work).&lt;br /&gt;&lt;br /&gt;So, instead of writing a post congratulating the Royal Botanic Gardens, Kew and Missouri Botanical Garden (MOBOT) for releasing this data, I'm left spluttering in disbelief that they would hamstring its use through such a poor choice of license. Kew and MOBOT could have made the Plant List available as open data using one of the licenses listed on the &lt;a href="http://www.opendefinition.org/licenses/"&gt;Open Definition&lt;/a&gt; web site, such as putting the data in the public domain (for example, by using a &lt;a href="http://creativecommons.org/publicdomain/zero/1.0/"&gt;Creative Commons CC0 license&lt;/a&gt;). Instead, they've chosen a restrictive license which makes the data closed, effectively killing the possibility for people to build upon the effort they've put into creating the list. 
Why do biodiversity data providers seem determined to cling to data for dear life, rather than open it up and let people realise its potential? &lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-4672679377173148213?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4672679377173148213'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/4672679377173148213'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2010/12/plant-list-nice-data-shame-it-not-open.html' title='The Plant List: nice data, shame it&amp;#39;s not open'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_Gct8lVAxKqQ/TRt_2SQxrwI/AAAAAAAAA0s/bQHZhz44vB4/s72-c/nd.large.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-96694687187465513</id><published>2010-12-23T17:11:00.001Z</published><updated>2010-12-23T17:16:36.811Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='OCR'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='names'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='spelling correction'/><category scheme='http://www.blogger.com/atom/ns#' term='Peter Norvig'/><title type='text'>BHL and OCR</title><content type='html'>Some quick notes on OCR. 
Revisiting my &lt;a href="http://iphylo.blogspot.com/2010/10/towards-interactive-djvu-file-viewer.html"&gt;DjVu viewer experiments&lt;/a&gt; it really struck me how "dirty" the OCR text is. It's readable, but if we were to display the OCR text rather than the images, it would be a little offputting. For example, in the paper &lt;b&gt;A new fat little frog (Leptodactylidae: &lt;i&gt;Eleutherodactylus&lt;/i&gt;) from lofty Andean grasslands of southern Ecuador&lt;/b&gt; (&lt;a href="http://biostor.org/reference/229"&gt;http://biostor.org/reference/229&lt;/a&gt;) there are 15 different variations of the frog genus &lt;i&gt;Eleutherodactylus&lt;/i&gt;:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Eleutherodactylus&lt;/li&gt;&lt;li&gt;Eleutheroclactylus&lt;/li&gt;&lt;li&gt;Eleuthewdactyliis&lt;/li&gt;&lt;li&gt;Eleiitherodactylus&lt;/li&gt;&lt;li&gt;Eleuthewdactylus&lt;/li&gt;&lt;li&gt;Eleuthewdactylus&lt;/li&gt;&lt;li&gt;Eleutherodactyliis&lt;/li&gt;&lt;li&gt;Eleutherockictylus&lt;/li&gt;&lt;li&gt;Eleutlierodactylus&lt;/li&gt;&lt;li&gt;Eleuthewdactyhts&lt;/li&gt;&lt;li&gt;Eleiithewdactylus&lt;/li&gt;&lt;li&gt;Eleutherodactyhis&lt;/li&gt;&lt;li&gt;Eleiithemdactylus&lt;/li&gt;&lt;li&gt;Eleuthemdactylus&lt;/li&gt;&lt;li&gt;Eleuthewdactyhis&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Of course, this is a recognised problem. Wei et al. &lt;b&gt;Name Matters: Taxonomic Name Recognition (TNR) in Biodiversity Heritage Library (BHL)&lt;/b&gt; (&lt;a href="http://hdl.handle.net/2142/14919"&gt;hdl:2142/14919&lt;/a&gt;) found that 35% of names in BHL OCR contained at least one wrong character. They compared the performance of two taxonomic name finding tools on BHL OCR (uBio's &lt;a href="http://www.ubio.org/index.php?pagename=soap_methods/taxonFinder"&gt;taxonFinder&lt;/a&gt; and &lt;a href="https://journals.ku.edu/index.php/jbi/article/viewArticle/34"&gt;FAT&lt;/a&gt;), neither of which did terribly well. Wei et al. 
found that different page types can influence the success of these algorithms, and suggested that automatically classifying pages into different categories would improve performance.&lt;br /&gt;&lt;br /&gt;Personally, it seems to me that this is not the way forward. It's pretty obvious looking at the versions of "Eleutherodactylus" above that there are recognisable patterns in the OCR errors (e.g., "u" becoming "ii", "ro" becoming "w", etc.). After reading Peter Norvig's elegant little essay &lt;a href="http://norvig.com/spell-correct.html"&gt;How to Write a Spelling Corrector&lt;/a&gt;, I suspect the way to improve the finding of taxonomic names is to build a "spelling corrector" for names. Central to this would be building a probabilistic model of the different OCR errors (such as "u" → "ii"), and use that to create a set of candidate taxonomic names the OCR string might actually be (the equivalent of Google's "did you mean", which is the subject of Norvig's essay). I had hoped to avoid doing this by using an existing tool, such as Tony Rees' &lt;a href="http://www.cmar.csiro.au/datacentre/taxamatch.htm"&gt;TAXAMATCH&lt;/a&gt;, but it's a website not a service, and it is just too slow. &lt;br /&gt;&lt;br /&gt;I've started doing some background reading on the topic of spelling correction and OCR, and I've created a group on Mendeley called &lt;a href="http://www.mendeley.com/groups/752871/ocr-optical-character-recognition/"&gt;OCR - Optical Character Recognition&lt;/a&gt; to bring these papers together. 
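&lt;br /&gt;&lt;br /&gt;To make the "spelling corrector" idea concrete, here is a minimal sketch in Python of the candidate-generation step. It is an illustration under stated assumptions rather than a finished tool: the table REVERSE_RULES and the functions single_fixes and candidates are hypothetical names, the confusion pairs are simply read off the misspellings listed above, and a real model would attach a probability to each pair rather than treating them all as equally likely.

```python
# Hypothetical reverse confusion table: (OCR fragment, intended fragment).
# The pairs are read off the misspellings listed in the post; a real model
# would attach probabilities learned from aligned text.
REVERSE_RULES = [("ii", "u"), ("w", "ro"), ("cl", "d"), ("li", "h"), ("m", "ro")]


def single_fixes(text):
    """Return every string made by undoing one OCR confusion at one position."""
    fixes = set()
    for ocr, true in REVERSE_RULES:
        start = text.find(ocr)
        while start != -1:
            fixes.add(text[:start] + true + text[start + len(ocr):])
            start = text.find(ocr, start + 1)
    return fixes


def candidates(ocr_string, known_names, max_fixes=2):
    """Known names reachable from ocr_string by undoing up to max_fixes errors."""
    frontier = {ocr_string}
    seen = {ocr_string}
    for _ in range(max_fixes):
        # Expand every string in the frontier by one more fix, skipping repeats.
        frontier = {fix for text in frontier for fix in single_fixes(text)} - seen
        seen |= frontier
    return sorted(seen.intersection(known_names))
```

With a set of known names containing "Eleutherodactylus", the OCR string "Eleuthewdactyliis" is recovered by undoing two errors ("w" to "ro" and "ii" to "u"), whereas "Eleutherockictylus" yields no candidate because none of the listed rules covers it.&lt;br /&gt;&lt;br /&gt;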
I'm also fussing with some simple code to find misspellings of a given taxonomic name in BHL text, use the &lt;a href="http://en.wikipedia.org/wiki/Needleman–Wunsch_algorithm"&gt;Needleman–Wunsch sequence alignment algorithm&lt;/a&gt; to align those misspellings to the correct name, and then extract the various OCR errors, building a matrix of the probabilities of the various transformations of the original text into OCR text.&lt;br /&gt;&lt;br /&gt;One use for this spelling correction would be in an interactive BHL viewer. In addition to showing the taxonomic names that uBio's taxonFinder has located in the text, we could flag strings that could be misspelt taxonomic names (such as "Eleutherockictylus") and provide an easy way for the user to either accept or reject that name. If we are going to invite people to help clean up BHL text, it would be nice to provide hints as to what the correct answer might be.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-96694687187465513?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/96694687187465513'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/96694687187465513'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2010/12/bhl-and-ocr.html' title='BHL and OCR'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3893488574012619038</id><published>2010-12-20T15:00:00.001Z</published><updated>2010-12-20T15:00:59.118Z</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><title type='text'>BioStor one year on: has it been a success?</title><content type='html'>One year ago I &lt;a href="http://iphylo.blogspot.com/2009/12/biostor.html"&gt;released BioStor&lt;/a&gt;, which scratched my itch regarding finding articles in the &lt;a href="http://www.biodiversitylibrary.org"&gt;Biodiversity Heritage Library&lt;/a&gt;. This anniversary seems to be a good time to think about where next with this project, but also to ask whether it's been successful. Of course, this rather hinges on what I mean by "success." I've certainly found &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; to be useful, both the experience of developing it, and actually using it. But it's time to be a little more hard-headed and look at some stats. So I'm going to share the Google Analytics stats for BioStor. Below is the report for Dec 20, 2009 to Dec 19, 2010, as a PDF. &lt;br /&gt;&lt;br /&gt;&lt;iframe width=100% height=560px frameborder=0 src=https://docs.google.com/viewer?a=v&amp;pid=explorer&amp;chrome=false&amp;embedded=true&amp;srcid=0B-PC5KKdhYCQNGZiODQxYTgtN2E2YS00OTk2LTgzMTEtNmE2MjYzMWY5OTk3&amp;hl=en&gt;&lt;/iframe&gt;&lt;b&gt;Visits&lt;/b&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TQ9vo2G0g0I/AAAAAAAAA0c/eoEWSmRlI1s/visits.png?imgmax=800" alt="visits.png" border="0" width="400" height="78" /&gt;&lt;br /&gt;&lt;br /&gt;BioStor had 63,824 visits over the year, and 197,076 pageviews. After an initial flurry of visits on its launch the number of visitors dropped off, then slowly grew. Numbers dipped during the middle of the year, then started to climb again.&lt;br /&gt;&lt;br /&gt;In order to discover whether these numbers are a little or a lot, it would be helpful to compare them with data from other biodiversity sites. Unfortunately, nobody seems to be making this information readily available. 
There is a slide in a &lt;a href="http://www.slideshare.net/chrisfreeland/bhl-tech-report-3526639"&gt;BHL presentation&lt;/a&gt; that shows BHL having had more than 1 million visits since January 2008, and in March 2010 it was receiving around 3000 visits per day, which is an order of magnitude greater than the traffic BioStor is currently getting. For another comparison, I looked at &lt;a href="http://scratchpads.eu/"&gt;Scratchpads&lt;/a&gt;, which currently comprise 193 sites. In November 2007 Scratchpads had 43,379 pageviews altogether; in November 2010 BioStor had 17,484 pageviews. For the period May-October 2009 &lt;a href="http://scratchpads.eu/about"&gt;Scratchpads had 74,109 visitors&lt;/a&gt;; for the equivalent period in 2010 BioStor had 28,110. So, BioStor is getting a little over a third of the traffic of the entire Scratchpads project.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Bounce rate&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;One of the more interesting charts is "Bounce rate", &lt;a href="http://www.google.com/support/analytics/bin/answer.py?hl=en&amp;answer=81986"&gt;defined by Google&lt;/a&gt; as&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Bounce rate is the percentage of single-page visits or visits in which the person left your site from the entrance (landing) page.&lt;/blockquote&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TQ9voAJk5bI/AAAAAAAAA0Y/geHLQZOwfcg/bouce.png?imgmax=800" alt="bouce.png" border="0" width="400" height="76" /&gt;&lt;br /&gt;The bounce rate for BioStor is pretty constant at around 65%, except for two periods in March and June, when it plummeted to around 20%. This corresponds to when I &lt;a href="http://iphylo.blogspot.com/2010/03/setting-up-local-wikisource.html"&gt;set up a Wikisource installation&lt;/a&gt; for BioStor so that the OCR text from BHL could be corrected. 
&lt;a href="http://phylo.bio.ku.edu/content/mark-t-holder"&gt;Mark Holder&lt;/a&gt; ran a student project that used the &lt;a href="http://biostor.org/wiki/"&gt;BioStor wiki&lt;/a&gt;, so I'm assuming that the drop in bounce rate reflects Mark's students spending time on the wiki. BHL OCR text would benefit from cleaning, but I'm not sure Wikisource is the way to do it as it feels a little clunky. Ideally I'd like to build upon the &lt;a href="http://iphylo.blogspot.com/2010/10/towards-interactive-djvu-file-viewer.html"&gt;interactive DjVu experiments&lt;/a&gt; to develop a user-friendly way to edit the underlying OCR text.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Is it just my itch?&lt;/b&gt;&lt;blockquote&gt;Every good work of software starts by scratching a developer's personal itch - Eric S. Raymond, &lt;a href="http://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar"&gt;The Cathedral and the Bazaar&lt;/a&gt;&lt;/blockquote&gt;&lt;br /&gt;Looking at traffic by city, Glasgow (where I'm based) is the single largest source of traffic. This is hardly surprising, given that I wrote BioStor to solve a problem I was interested in, and the bulk of its content has been added by me using various scripts. This raises the possibility that BioStor has an active user community of *cough* one. However, looking at traffic by country, the UK is prominent (due to traffic primarily from Glasgow and London), but more visits come from the US. It seems I didn't end up making this site just for me.&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TQ9vp6BgDiI/AAAAAAAAA0g/-fE47xxl4j4/map.png?imgmax=800" alt="map.png" border="0" width="400" /&gt;&lt;b&gt;Google search&lt;/b&gt;&lt;br /&gt;Another measure of success is Google search rankings, which I've &lt;a href="http://iphylo.blogspot.com/2009/09/google-wikipedia-and-eol.html"&gt;used elsewhere to compare the impact of Wikipedia and EOL pages&lt;/a&gt;. 
As a quick experiment I Googled the top ten journals in BioStor and recorded where in the search results BioStor appeared. For all but the &lt;i&gt;Biological Bulletin&lt;/i&gt;, BioStor appeared in the top ten (i.e., on the first page of results):&lt;br /&gt;&lt;br /&gt;&lt;table style="border:1px solid black;"&gt;&lt;tr&gt;&lt;th&gt;Journal&lt;/th&gt;&lt;th&gt;Google rank of BioStor page&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Biological Bulletin&lt;/td&gt;&lt;td&gt;12&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Bulletin of Zoological Nomenclature&lt;/td&gt;&lt;td&gt;6&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Proceedings of the Entomological Society, Washington&lt;/td&gt;&lt;td&gt;6&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Proc. Linn. Soc. New South Wales&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Annals of the Missouri Botanical Garden&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Tijdschr. Ent.&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Transactions of The Royal Entomological Society of London&lt;/td&gt;&lt;td&gt;6&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Ann. Mag. nat. Hist&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Notes from the Leyden Museum&lt;/td&gt;&lt;td&gt;5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Proceedings of the United States National Museum&lt;/td&gt;&lt;td&gt;4&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;This suggests that BioStor's content is at least findable.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Where next?&lt;/b&gt;&lt;br /&gt;The sense I'm getting from these stats is that BioStor is being used, and it seems to be a reasonably successful, small-scale project. It would be nice to play with the Google Analytics output a bit more, and also explore usage patterns more closely. 
For example, I invested some effort in adding the ability to &lt;a href="http://iphylo.blogspot.com/2010/04/biostor-gets-pdfs-with-xmp-metadata.html"&gt;create PDFs for BioStor articles&lt;/a&gt;, but I've no stats on how many PDFs have been downloaded.  Metadata in BioStor is editable, and edits are logged, but I've not explored the extent to which the content is being edited. If a serious effort is going to be made to clean up BHL content using crowd sourcing, I'll need to think of ways to engage users. The wiki experiments were a step in this direction, but I suspect that building a network around this task might prove difficult. Perhaps a better way is to build the network elsewhere, then try to engage it with this task (OCR correction). This was one reason behind my adopting Mendeley's OAuth API to provide a sign in facility for BioStor (see &lt;a href="http://iphylo.blogspot.com/2010/09/mendeley-connect.html"&gt;Mendeley connect&lt;/a&gt;). Again, I've no stats on the extent to which this feature of BioStor has been used. 
Time to give some serious thought to what else I can learn about how BioStor is being used.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3893488574012619038?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3893488574012619038'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3893488574012619038'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2010/12/biostor-one-year-on-has-it-been-success.html' title='BioStor one year on: has it been a success?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_Gct8lVAxKqQ/TQ9vo2G0g0I/AAAAAAAAA0c/eoEWSmRlI1s/s72-c/visits.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3146915426980631017</id><published>2010-12-15T11:54:00.001Z</published><updated>2010-12-15T20:20:29.787Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='phylogeny'/><category scheme='http://www.blogger.com/atom/ns#' term='visualisation'/><category scheme='http://www.blogger.com/atom/ns#' term='NCBI'/><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='TreeBASE'/><title type='text'>TreeBASE, again</title><content type='html'>My views on TreeBASE are &lt;a href="http://iphylo.blogspot.com/2006/06/treebase-rocks.html"&gt;pretty&lt;/a&gt; &lt;a href="http://iphylo.blogspot.com/2010/05/treebase-ii-makes-me-pull-my-hair-out.html"&gt;well&lt;/a&gt; &lt;a 
href="http://iphylo.blogspot.com/2010/07/show-me-trees-playing-with-treebase-api.html"&gt;known&lt;/a&gt;. Lately I've been thinking a lot about how to "fix" TreeBASE, or indeed, move beyond it. I've made a couple of baby steps in this direction.&lt;br /&gt;&lt;br /&gt;The first step is that I've created a &lt;a href="http://www.mendeley.com/groups/734351/treebase/"&gt;group for TreeBASE papers on Mendeley&lt;/a&gt;. I've uploaded all the studies in TreeBASE as of December 13 (2010). Having these in Mendeley makes it easier to tidy up the bibliographic metadata, add missing identifiers (such as DOIs and PubMed IDs), and correct citations to non-existent papers (which can occur if, at the time the authors uploaded their data, they planned to submit their paper to one journal, but it ended up being accepted in another). If you've a Mendeley account, feel free to join the group. If you've contributed to TreeBASE, you should find your papers already there.&lt;br /&gt;&lt;br /&gt;The second step is playing with CouchDB (this year's new hotness), exploring ways to build a database of phylogenies that has nothing much to do with either a relational database or a triple store. CouchDB is a document store, and I'm playing with taking &lt;a href="http://www.nexml.org/"&gt;NeXML&lt;/a&gt; files from TreeBASE, converting them to something vaguely usable (i.e., JSON), and adding them to CouchDB. For fun, I'm using my &lt;a href="http://iphylo.blogspot.com/2010/06/linking-ncbi-to-wikipedia.html"&gt;NCBI to Wikipedia mapping&lt;/a&gt; to get images for taxa, so if TreeBASE has mapped a taxon to the NCBI taxonomy, and that taxon has a page in Wikipedia with an image, we get an image for that taxon. The reason for this is I'd really like a phylogeny database that was visually interesting. 
To give you some examples, here are trees from TreeBASE (displayed using SVG), together with thumbnails of images from Wikipedia:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TQinMUGsUDI/AAAAAAAAAz4/bcBU8Gk62M8/myzo.png?imgmax=800" alt="myzo.png" border="0" width="400" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TQinUknij-I/AAAAAAAAAz8/JKAz6WWdPQ8/troidini.png?imgmax=800" alt="troidini.png" border="0" width="400" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TQincpVO72I/AAAAAAAAA0A/Mdi6THsCPeo/protea.png?imgmax=800" alt="protea.png" border="0" width="400" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TQinmPNjziI/AAAAAAAAA0E/smppRTTyV1o/Snapshot%202010-12-15%2010-38-02.png?imgmax=800" alt="Snapshot 2010-12-15 10-38-02.png" border="0" width="400"  /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Everything (tree and images) is stored within a single document in CouchDB, making the display pretty trivial to construct. Obviously this isn't a proper interface, and there's things I'd need to do, such as order the images in such a way that they matched the placement of the taxa on the tree, but at a glance you can see what the tree is about. 
We could then envisage making the images clickable so you could find out more about that taxon (e.g., text from Wikipedia, lists of other trees in the database, etc.).&lt;br /&gt;&lt;br /&gt;We could expand this further by extracting geographical information (say, from the sequences included in the study) and make a map, or eventually a &lt;a href="http://iphylo.blogspot.com/2007/06/earth-not-flat-official.html"&gt;phylogeny on Google Earth&lt;/a&gt; (see David Kidd's recent "Geophylogenies and the Map of Life" for a manifesto, &lt;a href="http://dx.doi.org/10.1093/sysbio/syq043"&gt;doi:10.1093/sysbio/syq043&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;One of the big things missing from databases like TreeBASE is a sense of "fun", or serendipity. It's hard to find stuff, hard to discover new things, make new connections, or put things in context. And that's tragic. Try a Google image search for &lt;a href="http://www.google.com/images?client=safari&amp;rls=en&amp;q=treebase+phylogeny"&gt;treebase+phylogeny&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TQirooYS1NI/AAAAAAAAA0M/8jnqXXw2Res/treebasephylogeny.png?imgmax=800" alt="treebasephylogeny.png" border="0" width="400" /&gt;&lt;/div&gt;&lt;br /&gt;Call me crazy, but I looked at that and thought "Wow! This phylogeny stuff is cool!" 
Wouldn't it be great if that's the reaction people had when they looked at a database of evolutionary trees?&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3146915426980631017?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3146915426980631017'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3146915426980631017'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2010/12/treebase-again.html' title='TreeBASE, again'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_Gct8lVAxKqQ/TQinMUGsUDI/AAAAAAAAAz4/bcBU8Gk62M8/s72-c/myzo.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-8285026936685449427</id><published>2010-12-13T12:37:00.001Z</published><updated>2010-12-13T12:38:53.376Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Open Acess'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><title type='text'>How do I know if an article is Open Access?</title><content type='html'>&lt;img src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TQYTcG6g5KI/AAAAAAAAAzs/X5LcDbnmaIY/open-access-logo.jpg.png?imgmax=800" alt="open-access-logo.jpg.png" border="0" width="128"  style="float:right;" /&gt;&lt;br /&gt;One of my pet projects is to build a &lt;a href="http://iphylo.blogspot.com/2010/08/viewing-scientific-articles-on-ipad.html"&gt;"Universal Article Reader"&lt;/a&gt; for the iPad (or similar mobile device), so that a 
reader can seamlessly move between articles from different publishers, follow up citations, and get more information on entities mentioned in those articles (e.g., species, molecules, localities). I've made various toys towards this, the latest being an &lt;a href="http://iphylo.blogspot.com/2010/12/viewing-scientific-articles-on-ipad.html"&gt;HTML5 clone of Nature's iPhone app&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;One impediment to this is knowing whether an article is Open Access, and if so, what representations are available (i.e., PDF, HTML, XML). Ideally, the "Universal Article Reader" would be able to look at the web page for an article, determine whether it can extract and redisplay the text (i.e., is the article Open Access) and, if so, whether it can, for example, grab the article in XML and reformat it.&lt;br /&gt;&lt;br /&gt;Some journals are entirely Open Access, so for these journals the first problem (is it Open Access?) is trivial, but a large number of journals have a mixed publishing model: some articles are Open Access, some aren't. One helpful thing publishers could do would be to specify the access status of an article in a consistent manner. Here's a quick survey of how things stand at the moment.&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;Journal&lt;/th&gt;&lt;th&gt;Rights&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;PLoS ONE&lt;/td&gt;&lt;td&gt;Embedded RDF, e.g. 
&amp;lt;license rdf:resource="http://creativecommons.org/licenses/by/2.5/" /&amp;gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Nature Communications&lt;/td&gt;&lt;td&gt;&amp;lt;meta name="access" content="Yes" /&amp;gt; for open, &amp;lt;meta name="access" content="No" /&amp;gt; for closed&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Systematic Biology&lt;/td&gt;&lt;td&gt;&amp;lt;meta name="citation_access" content="all" /&amp;gt; for open, this tag missing if closed&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;BioOne&lt;/td&gt;&lt;td&gt;Nothing for article, Open Access icon next to open access articles in table of contents&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;BMC Evolutionary Biology&lt;/td&gt;&lt;td&gt;&amp;lt;meta name ="dc.rights" content="http://creativecommons.org/licenses/by/2.0/" /&amp;gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Philosophical Transactions of the Royal Society&lt;/td&gt;&lt;td&gt;&amp;lt;meta name="citation_access" content="all" /&amp;gt; for open access&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Microbial Ecology&lt;/td&gt;&lt;td&gt;No metadata (links and images in HTML)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Human Genomics and Proteomics&lt;/td&gt;&lt;td&gt;&amp;lt;meta name ="dc.rights" content="http://creativecommons.org/licenses/by/2.0/" /&amp;gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;A bit of a mess. Some publishers embed this information in &lt;code&gt;&amp;lt;meta&amp;gt;&lt;/code&gt; tags (which is good), some (such as PLoS) embed RDF (good, if a little more hassle), and some leave us in the dark, or give visual clues such as logos (which mean nothing to a computer). In some ways this parallels the variety of ways journals have implemented RSS feeds, which has led to some explicit &lt;a href="http://oxford.crossref.org/best_practice/rss/"&gt;Recommendations on RSS Feeds for Scholarly Publishers&lt;/a&gt;. 
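&lt;br /&gt;&lt;br /&gt;As a rough illustration of what a reader app has to do today, here is a minimal Python sketch that checks the meta tags from the table above. The names AccessSniffer and looks_open are made up for this example, and the rule that only a plain CC BY URL in dc.rights counts as open is an assumption of the sketch, not a standard. Embedded RDF (as used by PLoS) is not handled, and a page with none of these tags is reported as unknown rather than closed.

```python
from html.parser import HTMLParser


class AccessSniffer(HTMLParser):
    """Collect named meta tags so their content can be inspected afterwards."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing meta tags also arrive here via the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        fields = dict(attrs)
        name = (fields.get("name") or "").strip().lower()
        if name:
            self.meta[name] = (fields.get("content") or "").strip()


def looks_open(html):
    """Return True (open), False (closed) or None (unknown) for article HTML."""
    sniffer = AccessSniffer()
    sniffer.feed(html)
    meta = sniffer.meta
    if meta.get("citation_access") == "all":
        return True
    if "access" in meta:
        return meta["access"].lower() == "yes"
    # Assumption: only an unrestricted CC BY license URL is treated as open.
    if "creativecommons.org/licenses/by/" in meta.get("dc.rights", ""):
        return True
    return None
```

Three different conventions need three different checks, and the answer is still "unknown" for publishers that signal access only with an icon.&lt;br /&gt;&lt;br /&gt;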
Perhaps the time is right to develop equivalent recommendations for article metadata, so that apps to read the scientific literature can correctly determine whether they can display an article or not.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-8285026936685449427?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8285026936685449427'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/8285026936685449427'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2010/12/how-do-i-know-if-article-is-open-access.html' title='How do I know if an article is Open Access?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_Gct8lVAxKqQ/TQYTcG6g5KI/AAAAAAAAAzs/X5LcDbnmaIY/s72-c/open-access-logo.jpg.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7779171623148444448</id><published>2010-12-09T12:30:00.001Z</published><updated>2011-11-18T12:52:29.140Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='jQueryMobile'/><category scheme='http://www.blogger.com/atom/ns#' term='javascript'/><category scheme='http://www.blogger.com/atom/ns#' term='iPhone'/><category scheme='http://www.blogger.com/atom/ns#' term='iPad'/><category scheme='http://www.blogger.com/atom/ns#' term='Nature'/><category scheme='http://www.blogger.com/atom/ns#' term='fonts'/><category 
scheme='http://www.blogger.com/atom/ns#' term='demo'/><category scheme='http://www.blogger.com/atom/ns#' term='Android'/><category scheme='http://www.blogger.com/atom/ns#' term='ePub'/><category scheme='http://www.blogger.com/atom/ns#' term='article 2.0'/><title type='text'>Viewing scientific articles on the iPad: cloning the Nature.com iPhone app using jQuery Mobile</title><content type='html'>Over the last few months I've been exploring different ways to view scientific articles on the iPad, summarised &lt;a href="http://iphylo.blogspot.com/2010/09/viewing-scientific-articles-on-ipad.html"&gt;here&lt;/a&gt;. I've also made a few prototypes, either from scratch (such as my &lt;a href="http://iphylo.blogspot.com/2010/06/plos-doesn-ipad-or-web.html"&gt;response to the PLoS iPad app&lt;/a&gt;) or using &lt;a href="http://www.sencha.com/products/touch/"&gt;Sencha Touch&lt;/a&gt; (see &lt;a href="http://iphylo.blogspot.com/2010/09/touching-citations-on-ipad.html"&gt;Touching citations on the iPad&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Today, it's time for something a little different. The Sencha Touch framework I used earlier is huge and wasn't easy to get my head around. I was resigning myself to trying to get to grips with it when &lt;a href="http://jquerymobile.com"&gt;jQuery Mobile&lt;/a&gt; came along. Still in alpha, jQuery Mobile is very simple and elegant, and writing an app is basically a case of writing HTML (with a little JavaScript here and there if needed). It has a few rough edges, but it's possible to create something usable very quickly. And, it's actually fun.&lt;br /&gt;&lt;br /&gt;So, to learn a bit more about how to use it, I decided to see if I could write a "clone" of Nature.com's iPhone app (which I &lt;a href="http://iphylo.blogspot.com/2010/08/viewing-scientific-articles-on-ipad.html"&gt;reviewed earlier&lt;/a&gt;). 
Nature's app is in many ways the most interesting iOS app for articles because it doesn't treat the article as a monolithic PDF, but instead uses the ePub format. As a result, you can view figures, tables, and references separately.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The clone&lt;/b&gt;&lt;br /&gt;&lt;a href="http://iphylo.org/~rpage/ipad/nature/"&gt;You can see the clone here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TQDKnlA2n7I/AAAAAAAAAzg/smFsn6PMkzA/photo.PNG?imgmax=800" alt="photo.PNG" border="0" width="200" /&gt;&lt;/td&gt;&lt;td&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TQDK4xYdlKI/AAAAAAAAAzk/C0aeQvSfd8c/photo.PNG?imgmax=800" alt="photo.PNG" border="0" width="200"  /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;I've tried to mimic the basic functionality of the Nature.com app in terms of transitions between pages, display of figures, references, etc. In making this clone I've focussed on just the article display. &lt;br /&gt;&lt;br /&gt;A web app is going to lack the speed and functionality of a native app, but is probably a lot faster to develop. It also works on a wider range of platforms. jQuery Mobile is committed to &lt;a href="http://jquerymobile.com/gbs/"&gt;supporting a wide range of platforms&lt;/a&gt;, so this clone should work on platforms other than the iPad.&lt;br /&gt;&lt;br /&gt;The Nature.com app has a lot of additional functionality apart from just displaying articles, such as listing the latest articles from Nature.com journals, managing a user's bookmarks, and enabling the user to buy subscriptions. Some of this functionality would be pretty easy to add to this clone, for example by consuming RSS feeds to get article lists. 
With a little effort one could have a simple, Web-based app to browse Nature content across a range of mobile devices.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Technical stuff&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Nature's app uses the ePub format, but Nature's web site doesn't provide an option to download articles in ePub format. However, if you use an HTTP debugging proxy (such as &lt;a href="http://www.charlesproxy.com/"&gt;Charles Proxy&lt;/a&gt;) when using Nature's app you can see the URLs needed to fetch the ePub file. &lt;br /&gt;&lt;br /&gt;I grabbed a couple of ePub files for articles in &lt;a href="http://www.nature.com/ncomms/index.html"&gt;&lt;i&gt;Nature Communications&lt;/i&gt;&lt;/a&gt; and unzipped them (.epub files are zip files). The clone is a single HTML file that uses some Ajax calls to populate the different views. One Ajax call takes the &lt;code&gt;index.html&lt;/code&gt; that has the article text and replaces the internal and external links with calls to JavaScript functions. An article's references, figure captions, and tables are stored in separate XML files, so I have some simple PHP scripts that read the XML and extract the relevant bits. Internal links (such as to figures and references) are handled by jQuery Mobile. External links are displayed within an iframe.&lt;br /&gt;&lt;br /&gt;There are some intellectual property issues to address. &lt;i&gt;Nature&lt;/i&gt; isn't an Open Access journal, but some articles in &lt;i&gt;Nature Communications&lt;/i&gt; are (under the &lt;a href="http://creativecommons.org/licenses/by-nc-sa/3.0/"&gt;Creative Commons Attribution-NonCommercial-Share Alike 3.0 Unported License&lt;/a&gt;), so I've used two of these as examples. When it displays an article, Nature's app uses &lt;a href="http://www.droidfonts.com/"&gt;Droid fonts&lt;/a&gt; for the article heading. These fonts are supplied as an SVG file contained within the ePub file. 
Droid fonts are available under an Apache License as TrueType fonts as part of the Android SDK. I couldn't find SVG versions of the fonts in the Android SDK, so I use the TrueType fonts (see Jeffrey Zeldman's &lt;a href="http://www.zeldman.com/2010/11/26/web-type-news-iphone-and-ipad-now-support-truetype-font-embedding-this-is-huge/"&gt;Web type news: iPhone and iPad now support TrueType font embedding. This is huge.&lt;/a&gt;). Oh, and I "borrowed" some of the CSS from the &lt;code&gt;style.css&lt;/code&gt; file that comes with each ePub file.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-7779171623148444448?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7779171623148444448'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/7779171623148444448'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2010/12/viewing-scientific-articles-on-ipad.html' title='Viewing scientific articles on the iPad: cloning the Nature.com iPhone app using jQuery Mobile'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_Gct8lVAxKqQ/TQDKnlA2n7I/AAAAAAAAAzg/smFsn6PMkzA/s72-c/photo.PNG?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-3897548025225067888</id><published>2010-12-08T06:24:00.001Z</published><updated>2010-12-08T06:24:13.465Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='BIOONE'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Zotero'/><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><category scheme='http://www.blogger.com/atom/ns#' term='CiteBank'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='JSTOR'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL-Europe'/><category scheme='http://www.blogger.com/atom/ns#' term='Drupal'/><title type='text'>First thoughts on CiteBank and BHL-Europe</title><content type='html'>This week saw the release of two tools from the Biodiversity Heritage Library, &lt;a href="http://citebank.org/"&gt;CiteBank&lt;/a&gt; and the &lt;a href="http://prototype.bhle.eu/"&gt;BHL-Europe portal&lt;/a&gt;. Both have actually been quietly around for a while, but were only publicly announced last week.&lt;br /&gt;&lt;br /&gt;In developing a new tool there are several questions to ask. Does something already exist that meets my needs? If it doesn't exist, can I build it using an existing framework, or do I need to start from scratch? As a developer it's awfully tempting sometimes to build something from scratch (I'm certainly guilty of this). Sometimes a more sensible approach is to build on something that already exists, particularly if what you are building upon is well supported. This is one of the attractions of &lt;a href="http://drupal.org"&gt;Drupal&lt;/a&gt;, which underlies CiteBank and Scratchpads. In my own work I've used Semantic Mediawiki to support editable, versioned databases, rather than roll my own. Perhaps the more difficult question for a developer is whether they need to build anything at all. 
What if there are tools already out there that, if not &lt;i&gt;exactly&lt;/i&gt; what you want, are close enough (or most likely will be by the time you finish your own tool)?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;CiteBank&lt;/b&gt;&lt;br /&gt;&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TP8UnImUYbI/AAAAAAAAAzQ/yVscmVVG7Gc/bhlsquare_reasonably_small.png?imgmax=800" alt="bhlsquare_reasonably_small.png" border="0" width="128" height="128" align="right" /&gt;&lt;br /&gt;&lt;blockquote&gt;CiteBank is an open access platform to aggregate citations for biodiversity publications and deliver access to biodiversity related articles. CiteBank aggregates links to content from digital libraries, publishers, and other bibliographic systems in order to provide a single point of access to the world’s biodiversity literature, including content created by its community of users. CiteBank is a project of the Biodiversity Heritage Library (BHL).&lt;/blockquote&gt;&lt;br /&gt;I have two reactions to CiteBank. Firstly, Drupal's bibliographic tools really suck, and secondly, why do we need this? As I've argued earlier (see &lt;a href="http://iphylo.blogspot.com/2010/10/mendeley-bhl-and-of-life.html"&gt;Mendeley, BHL, and the "Bibliography of Life"&lt;/a&gt;), I can't see the rationale for having CiteBank separate from an existing bibliographic database such as &lt;a href="http://www.mendeley.com"&gt;Mendeley&lt;/a&gt; or &lt;a href="http://www.zotero.org"&gt;Zotero&lt;/a&gt;. These tools are more mature, better supported, and address user needs beyond simply building lists of papers (e.g., citing papers when writing manuscripts).&lt;br /&gt;&lt;br /&gt;For me, one of BHL's goals should be integrating the literature they have scanned into mainstream scientific literature, which means finding articles, assigning DOIs, and becoming in effect a digital publishing platform (like &lt;a href="http://www.bioone.org"&gt;BioOne&lt;/a&gt; or &lt;a href="http://www.jstor.org"&gt;JSTOR&lt;/a&gt;). 
Getting to this point will require managing and cleaning metadata for many thousands of articles and books. It seems to me that you want to gather this metadata from as many sources as possible, and expose it to as many eyes (and algorithms) as possible to help tidy it up. I think this is a clear case of it being better to use an existing tool (such as Mendeley), rather than build a new one. If a good fraction of the world's taxonomists shared their personal bibliographies on Mendeley we'd pretty much have the world's taxonomic literature in one place, without really trying.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;BHL-Europe&lt;/b&gt;&lt;br /&gt;&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TP8O5qEeyCI/AAAAAAAAAzI/kbRRwiRdbKg/logo.jpg?imgmax=800" alt="logo.jpg" border="0" width="200" align="right" /&gt;&lt;br /&gt;It's early days for BHL-Europe, and they've taken the "let's use an existing framework" approach, basing the BHL-Europe portal on &lt;a href="http://www.dismarc.org/"&gt;DISMARC&lt;/a&gt;, the latter being an EU-funded project to "encourage and support the interoperability of music related data".&lt;br /&gt;&lt;br /&gt;BHL-Europe is the kind of web site only its developers could love. It's spectacularly ugly, and a classic example of what digital libraries came up with while Google was quietly eating their lunch. Here's the web site showing search results for "Zonosaurus":&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TP8ZVEnxFcI/AAAAAAAAAzY/nHZ5zCLU0Pc/bhleu.png?imgmax=800" alt="bhleu.png" border="0" width="400"  /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Yuck! &lt;a href="http://www.slideshare.net/andraz/semtech2010-do-semanticwebuserinterfaceshavetobeugly"&gt;Why do these things have to be so ugly?&lt;/a&gt; DISMARC was designed to store metadata about digital objects, specifically music. 
Look at commercial music interfaces such as iTunes, &lt;a href="http://spotify.com"&gt;Spotify&lt;/a&gt;, and &lt;a href="http://last.fm"&gt;Last.fm&lt;/a&gt;. Or even academic projects such as &lt;a href="http://research.mspace.fm/"&gt;mSpace&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;To be useful BHL-Europe really needs to provide an interface that reflects what its users care about, for example taxonomic names, classification, and geography. It can't treat scientific literature as a bunch of lifeless metadata objects (but then again, DISMARC managed to do this for music). &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Where next?&lt;/b&gt;&lt;br /&gt;CiteBank and BHL-Europe seem to be further additions to the worthy but ultimately deeply unsatisfying attempts to improve access to biodiversity literature. To date our field has failed to get to grips with aggregating metadata (outside of the library setting), creating social networks around that aggregation, and providing intuitive interfaces that enable users to search and browse productively. These are big challenges. I'd like to see the resources that we have put to better use, rather than being used to build tools where suitable alternatives already exist (CiteBank), or used to shoehorn data into generic tools that are unspeakably ugly (BHL-Europe portal) and not fit for purpose. 
Let's not reinvent the wheel, and let's not try and convince ourselves that squares make perfectly good wheels.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-3897548025225067888?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3897548025225067888'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/3897548025225067888'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2010/12/first-thoughts-on-citebank-and-bhl.html' title='First thoughts on CiteBank and BHL-Europe'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_Gct8lVAxKqQ/TP8UnImUYbI/AAAAAAAAAzQ/yVscmVVG7Gc/s72-c/bhlsquare_reasonably_small.png?imgmax=800' height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-2138689566783818707</id><published>2010-12-02T09:46:00.001Z</published><updated>2010-12-02T09:46:35.640Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='Atlas of Living Australia'/><category scheme='http://www.blogger.com/atom/ns#' term='BHL'/><category scheme='http://www.blogger.com/atom/ns#' term='Australian Faunal Directory'/><category scheme='http://www.blogger.com/atom/ns#' term='CouchDB'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenURL'/><title type='text'>Linking taxonomic databases to the primary literature: BHL and the Australian Faunal Directory</title><content type='html'>Continuing my hobby 
horse of linking taxonomic databases to digitised literature, I've been working for the last couple of weeks on linking names in the &lt;a href="http://www.environment.gov.au/biodiversity/abrs/online-resources/fauna/afd/home"&gt;Australian Faunal Directory&lt;/a&gt; (AFD) to articles in the &lt;a href="http://www.biodiversitylibrary.org/"&gt;Biodiversity Heritage Library (BHL)&lt;/a&gt;. AFD is a list of all animals known to occur in Australia, and it provides much of the data for the recently released &lt;a href="http://www.ala.org.au/"&gt;Atlas of Living Australia&lt;/a&gt;. The data is available as a &lt;a href="http://www.environment.gov.au/biodiversity/abrs/online-resources/fauna/index.html"&gt;series of CSV files&lt;/a&gt;, and these contain quite detailed bibliographic references. My initial interest was in using these to populate &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt; with articles, but it seemed worthwhile to try and link the names and articles together. The Atlas of Living Australia links to BHL, but only via a name search showing BHL items that contain the name string. This wastes valuable information. AFD has citations to individual books and articles that relate to the taxonomy of Australian animals; we should treat that as first class data.&lt;br /&gt;&lt;br /&gt;So, I took the CSV files, cobbled together some scripts to extract references, ran these through the BioStor and bioGUID OpenURL resolvers, and dumped the whole thing in a CouchDB database. You can see the results at &lt;a href="http://iphylo.org/~rpage/afd/"&gt;Australian Faunal Directory on CouchDB&lt;/a&gt;. 
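&lt;br /&gt;&lt;br /&gt;The resolver step can be sketched roughly as follows. This is a minimal illustration rather than the scripts used for the site: the endpoint &lt;code&gt;http://biostor.org/openurl&lt;/code&gt;, the function name &lt;code&gt;build_openurl&lt;/code&gt;, the input field names, and the sample reference are all assumptions made for the example. The query keys (&lt;code&gt;genre&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;atitle&lt;/code&gt;, &lt;code&gt;volume&lt;/code&gt;, &lt;code&gt;spage&lt;/code&gt;, &lt;code&gt;date&lt;/code&gt;) are the usual OpenURL 0.1 ones.&lt;br /&gt;&lt;pre&gt;
```python
from urllib.parse import urlencode

# Assumed resolver endpoint; check the resolver's documentation before use.
RESOLVER = "http://biostor.org/openurl"


def build_openurl(reference, resolver=RESOLVER):
    """Turn a parsed reference (a dict) into an OpenURL-style query URL."""
    # Map illustrative (hypothetical) reference fields onto OpenURL 0.1 keys.
    field_map = [
        ("journal", "title"),
        ("article_title", "atitle"),
        ("volume", "volume"),
        ("start_page", "spage"),
        ("year", "date"),
    ]
    params = [("genre", "article")]
    for source_key, openurl_key in field_map:
        value = reference.get(source_key)
        if value:  # skip fields the reference parser could not extract
            params.append((openurl_key, str(value)))
    return resolver + "?" + urlencode(params)


# Hypothetical reference, shaped as a reference parser might emit it.
example = {
    "journal": "Records of the Australian Museum",
    "volume": 12,
    "start_page": 345,
    "year": 1918,
}
print(build_openurl(example))
```
&lt;/pre&gt;Each URL built this way could then be fetched and any match stored alongside the taxon name; fields that failed to parse are simply left out of the query, which is one reason some references will not resolve.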
&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TPdovmPBVbI/AAAAAAAAAzA/Qh2sC6R8NQA/afd.png?imgmax=800" alt="afd.png" border="0" width="400"  /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The site is modelled on my earlier experiment with putting the &lt;a href="http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html"&gt;Catalogue of Life on CouchDB&lt;/a&gt;. It's still rather crude, and there's a lot of stuff I need to work on, but it should illustrate the basic idea. You can browse the taxonomic hierarchy, view alternative names for each taxon, and see a list of publications related to those names. If a publication has been found in BioStor then the site displays a thumbnail of the first page, and if you click on the reference you see a simple article viewer I wrote in JavaScript.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh3.ggpht.com/_Gct8lVAxKqQ/TPdmqPxf8wI/AAAAAAAAAy0/tpPYPP73sNo/v1.png?imgmax=800" alt="v1.png" border="0" width="400"  /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;For PDFs I'm experimenting with using Google's PDF viewer (the inspiration for the viewer above):&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh4.ggpht.com/_Gct8lVAxKqQ/TPdnFq8qFBI/AAAAAAAAAy4/gpX4UJjRUVA/v2.png?imgmax=800" alt="v2.png" border="0" width="400" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How it was made&lt;/b&gt;&lt;br /&gt;Although in principle linking AFD to BHL via BioStor was fairly straightforward, there are lots of little wrinkles, such as errors in bibliographic metadata, and failure to parse some reference strings. To help address this I created a &lt;a href="http://www.mendeley.com/groups/679491/australian-faunal-directory/"&gt;public group on Mendeley&lt;/a&gt; where all the references I've extracted are stored. 
This makes it easy to correct errors, add identifiers such as DOIs and ISSNs, and upload PDFs. For each article a reference to the original record in AFD is maintained by storing the AFD identifier (a &lt;a href="http://en.wikipedia.org/wiki/Uuid"&gt;UUID&lt;/a&gt;) as a keyword.&lt;br /&gt;&lt;br /&gt;The taxonomy and the mapping to literature are stored in a &lt;a href="http://couchdb.apache.org/"&gt;CouchDB&lt;/a&gt; database, which makes a lot of things (such as uploading new versions of documents) a breeze.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;It's about the links&lt;/b&gt;&lt;br /&gt;The underlying motivation is that we are awash in biodiversity data and digitisation projects, but these are rarely linked together. And it's more than just linking, it's bringing the data together so that we can compute over it. That's when things will start to get interesting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16081779-2138689566783818707?l=iphylo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2138689566783818707'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16081779/posts/default/2138689566783818707'/><link rel='alternate' type='text/html' href='http://iphylo.blogspot.com/2010/12/linking-taxonomic-databases-to-primary.html' title='Linking taxonomic databases to the primary literature: BHL and the Australian Faunal Directory'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_Gct8lVAxKqQ/TPdovmPBVbI/AAAAAAAAAzA/Qh2sC6R8NQA/s72-c/afd.png?imgmax=800' 
height='72' width='72'/></entry><entry><id>tag:blogger.com,1999:blog-16081779.post-7489249467015388833</id><published>2010-11-10T11:09:00.001Z</published><updated>2010-11-10T11:13:52.902Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='BioStor'/><category scheme='http://www.blogger.com/atom/ns#' term='duplicates'/><category scheme='http://www.blogger.com/atom/ns#' term='data cleaning'/><category scheme='http://www.blogger.com/atom/ns#' term='Mendeley'/><category scheme='http://www.blogger.com/atom/ns#' term='Challenge'/><title type='text'>Mendeley mangles my references: phantom documents and the problem of duplicate references</title><content type='html'>One issue I'm running into with Mendeley is that it can create spurious documents, mangling my references in the process. This appears to be due to some over-zealous attempts to de-duplicate documents. Duplicate documents are the &lt;a href="http://feedback.mendeley.com/forums/4941-mendeley-feedback?lang=en"&gt;number one problem&lt;/a&gt; faced by Mendeley, and have been discussed in some detail by Duncan Hull in his post &lt;a href="http://duncan.hull.name/2010/09/01/mendeley/"&gt;How many unique papers are there in Mendeley?&lt;/a&gt;. Duncan focussed on the case where the same article may appear multiple times in Mendeley's database, which will inflate estimates of how many distinct references the database contains. It also has implications for metrics derived from Mendeley, such as those displayed by &lt;a href="http://readermeter.org"&gt;ReaderMeter&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;In this post I discuss the reverse problem, combining two or more distinct references into one. I've been uploading large collections of references based on harvesting metadata for journal articles. Although the metadata isn't perfect, it's usually pretty good, and in many cases linked to Open Access content in &lt;a href="http://biostor.org"&gt;BioStor&lt;/a&gt;. 
References that I upload appear in public groups listed on my profile, such as the group &lt;a href="http://www.mendeley.com/groups/637571/proceedings-of-the-entomological-society-of-washington/"&gt;Proceedings of the Entomological Society of Washington&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Reverse engineering Mendeley&lt;/b&gt;&lt;br /&gt;In the absence of a good description by Mendeley of how their tools work, we have to try and figure it out ourselves. If you click on a reference that has been recently added to Mendeley you get a URL that looks like this: &lt;a href="http://www.mendeley.com/c/3708087012/g/584201/magalhaes-2008-a-new-species-of-kingsleya-from-the-yanomami-indians-area-in-the-upper-rio-orinoco-venezuela-crustacea-decapoda-brachyura-pseudothelphusidae/"&gt;http://www.mendeley.com/c/3708087012/g/584201/magalhaes-2008-a-new-species-of-kingsleya-from-the-yanomami-indians-area-in-the-upper-rio-orinoco-venezuela-crustacea-decapoda-brachyura-pseudothelphusidae/&lt;/a&gt; where &lt;b&gt;584201&lt;/b&gt; is the group id, &lt;b&gt;3708087012&lt;/b&gt; is the "remoteId" of the document (this is what it's called in the SQLite database that underlies the desktop client), and the rest of the URL is the article title, minus &lt;a href="http://en.wikipedia.org/wiki/Stop_words"&gt;stop words&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;After a while (perhaps a day or so) Mendeley gets around to trying to merge the references I've added with those it already knows about, and the URLs lose the group and remoteId and look like this: &lt;a href="http://www.mendeley.com/research/review-genus-saemundssonia-timmerman-phthiraptera-philopteridae-alcidae-aves-charadriiformes-including-new-species-new-host/"&gt;http://www.mendeley.com/research/review-genus-saemundssonia-timmerman-phthiraptera-philopteridae-alcidae-aves-charadriiformes-including-new-species-new-host/&lt;/a&gt;. 
Let's call this document the "canonical document"  (this document also has a UUID, which is what the Mendeley API uses to retrieve the document). Once the document gets one of these URLs Mendeley will also display how many people are "reading" that document, and whether anyone has tagged it.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;But that's not my paper!&lt;/b&gt;&lt;br /&gt;The problem is that sometimes (and more often than I'd like) the canonical document bears little relation to the document I uploaded. For example, here is a paper that I uploaded to the group &lt;a href="http://www.mendeley.com/groups/637571/proceedings-of-the-entomological-society-of-washington/"&gt;Proceedings of the Entomological Society of Washington&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;img src="http://lh5.ggpht.com/_Gct8lVAxKqQ/TNpxD7JjI0I/AAAAAAAAAyg/6gsmdUbn4uk/16212462.gif?imgmax=800" alt="16212462.gif" border="0" width="100" height="154" align="left" /&gt;&lt;/td&gt;&lt;td valign="top"&gt;&lt;b&gt;Review of the genus Saemundssonia Timmermann (Phthiraptera: Philopteridae) from the Alcidae (Aves: Charadriiformes), including a new species and new host records&lt;/b&gt; by Roger D Price, Ricardo L Palma, Dale H Clayton, &lt;i&gt;Proceedings of the Entomological Society of Washington&lt;/i&gt;, 105(4):915-924 (2003).&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;You can see the actual paper in BioStor: &lt;a href="http://biostor.org/reference/57185"&gt;http://biostor.org/reference/57185&lt;/a&gt;. 
To see the paper in the Mendeley group, browse it using the tag  &lt;a href="http://www.mendeley.com/groups/637571/proceedings-of-the-entomological-society-of-washington/papers/title/0/tag/Phthiraptera/"&gt;Phthiraptera&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh6.ggpht.com/_Gct8lVAxKqQ/TNpv9V6ioAI/AAAAAAAAAyY/xBLU_3UC9P4/group.png?imgmax=800" alt="group.png" border="0" width="400" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Note the &lt;b&gt;2&lt;/b&gt;, indicating that two people (including myself) have this paper in their library. The URL for this paper is &lt;a href="http://www.mendeley.com/research/review-genus-saemundssonia-timmerman-phthiraptera-philopteridae-alcidae-aves-charadriiformes-including-new-species-new-host/"&gt;http://www.mendeley.com/research/review-genus-saemundssonia-timmerman-phthiraptera-philopteridae-alcidae-aves-charadr
