Friday, January 14, 2011

The demise of and the perils of using Internet domain names as identifiers

When otherwise sensible technorati refer to "owning" a domain name, it makes me want to stick forks in my eyeballs. We do not "own" domain names. At best, we only lease them and there are manifold ways in which we could lose control of a domain name - through litigation, through forgetfulness, through poverty, through voluntary transfer, etc. Once you don't control a domain name anymore, then you can't control your domain-name-based persistent identifiers either. - Geoffrey Bilder interviewed by Martin Fenner
Geoffrey Bilder's comments about the unsuitability of URLs as long-term identifiers (as opposed, say, to DOIs) came to mind when I discovered that the domain is up for sale:

Snapshot 2011-01-14 07-47-39.png

This domain used to be home to a wealth of resources on lice (order Phthiraptera). I discovered that ownership of the domain had expired when a bunch of links to PDFs returned by an iSpecies search for Collodennyus all bounced to the holding page above. The domain was owned by the late Bob Dalgleish. After his death, ownership of the domain lapsed, and it's now up for sale. Although much of the content has been moved to another site, URLs containing the old domain still turn up in search results, especially ones that have been cached (for example, in iSpecies). Given that much of the content is still available, the loss isn't total, but anyone relying on links containing the old domain to point to content (such as a PDF), or to identify that content (such as a publication), will find themselves in trouble. Although ideally Cool URIs don't change, in practice they do, and with alarming frequency. Furthermore, in this case, because ownership of the old domain has lapsed, there's no opportunity to create redirects from the old URLs to the equivalent content on the new site (leaving aside the issue that the new site is not a mirror of the old one, so exactly what the redirects would point to is unclear).

Identifiers based on domain names, such as URLs and LSIDs, are attractive because the DNS helps ensure global uniqueness and HTTP provides a way to resolve the identifier, but all of this is contingent on the domain itself persisting. For more on this topic I recommend reading Martin Fenner's interview of CrossRef's Geoffrey Bilder, from which I took the opening quote.
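
To make that dependence concrete, here is a rough sketch (not anything iSpecies actually does) that pulls out the domain name a URL or LSID hangs off and lists which identifiers would be stranded if that domain lapsed. The function names, the example.org identifiers and the DOI are all made up for illustration. The bare DOI comes back with no domain because the identifier string itself doesn't embed one.

```python
from urllib.parse import urlsplit


def controlling_domain(identifier):
    """Return the DNS name embedded in an identifier, or None if there is none."""
    ident = identifier.strip()
    lowered = ident.lower()
    if lowered.startswith("urn:lsid:"):
        # LSID layout: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
        parts = ident.split(":")
        if len(parts) >= 5 and parts[2]:
            return parts[2].lower()
        return None
    if lowered.startswith(("http://", "https://")):
        # urlsplit lowercases the hostname and drops any port
        return urlsplit(ident).hostname
    return None


def orphaned(identifiers, lapsed_domain):
    """List the identifiers that depend on lapsed_domain or one of its subdomains."""
    lapsed = lapsed_domain.lower()
    stranded = []
    for ident in identifiers:
        domain = controlling_domain(ident)
        if domain is None:
            continue
        if domain == lapsed or domain.endswith("." + lapsed):
            stranded.append(ident)
    return stranded


# Hypothetical identifiers, using the reserved example.org domain
identifiers = [
    "http://www.example.org/pdf/paper.pdf",
    "urn:lsid:example.org:names:12345",
    "doi:10.1234/abcd",
]

for ident in orphaned(identifiers, "example.org"):
    print("stranded:", ident)
```

Running this flags the URL and the LSID but not the DOI. That is only a statement about what is written into the identifier: a DOI still needs someone to keep its resolver running, but that responsibility doesn't rest on one person remembering to renew a domain.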