Thursday, April 16, 2009

Short URLs

Short URLs have been a topic of discussion recently, perhaps sparked by the article "URL Shorteners: Which Shortening Service Should You Use?" Many will have encountered short URLs in tweets. Leigh Dodds (@ldodds) asked:
Remind me: why do we need short urls at all, rather than a better solution? Removing arbitrary limits (or better impl. thereof) seems better
I guess Leigh's talking about the need for short URLs in tweets, but I wonder about the more general question of why we need URL shorteners at all. Reading the Guardian (physical copy in a coffee shop) I keep coming across URLs in the text, such as bit.ly/seth52, that are short, have no "http://" prefix, and are labelled in a human-readable form (the short URLs in Seth Finkelstein's column are all of the form bit.ly/seth[n]).

It occurs to me that these URLs are almost like tags: the names have locally significant meaning and are memorable. In a sense the URL shortening service acts as a new namespace. Imagine you can't get a desired domain name, but can get a customised URL with that name. The tyranny of the DNS as the sole naming authority is weakened a little. In some ways this mirrors how many people use the web. Instead of typing in full domain names, they enter a search term into Google and go to the site they want (often the top hit). Imagine if Google provided a URL shortening service (in a sense their search engine is already a slightly clunky one).

The other reason I'm interested in this is ugly identifiers such as urn:lsid:zoobank.org:act:6FFAFC2C-D46B-4959-BA03-C38477B9DFF1. This version, bit.ly/polina, is a bit nicer. Plus, I get usage statistics on the short version (meaning I don't need to implement this myself). If we take the Guardian as an example, perhaps journal publishers using LSIDs such as urn:lsid:zoobank.org:act:6FFAFC2C-D46B-4959-BA03-C38477B9DFF1 would prefer to use custom, shortened URLs to make the text more readable, and collect usage statistics as well.
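
To make the "shortener as namespace" idea concrete, here is a rough Python sketch of a toy in-memory registry that hands out memorable names and counts how often each is resolved. The class name ShortNamespace and its methods are made up for illustration, and this says nothing about how bit.ly itself works. A real service would redirect to a web address; here the target is simply stored as a string.

```python
from collections import Counter


class ShortNamespace:
    """Toy registry mapping memorable short names to long identifiers (hypothetical)."""

    def __init__(self, host):
        self.host = host        # e.g. "bit.ly"; used only to build the printed form
        self.targets = {}       # short name -> long identifier
        self.hits = Counter()   # short name -> number of successful resolutions

    def register(self, name, target):
        # Names are first come, first served, as in any namespace.
        if name in self.targets:
            raise ValueError(f"{name!r} is already taken")
        self.targets[name] = target
        return f"{self.host}/{name}"

    def resolve(self, name):
        # Look up first, so an unknown name raises KeyError without being counted.
        target = self.targets[name]
        self.hits[name] += 1
        return target


ns = ShortNamespace("bit.ly")
lsid = "urn:lsid:zoobank.org:act:6FFAFC2C-D46B-4959-BA03-C38477B9DFF1"
print(ns.register("polina", lsid))   # bit.ly/polina
print(ns.resolve("polina") == lsid)  # True
print(ns.hits["polina"])             # 1
```

The usage counter is the part a publisher would get "for free" from a hosted service; the name-to-target table is the new namespace.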

2 comments:

Tony Hammond said...

Hi Rod:

So, how do you know those "bit.ly" strings are indeed URLs? I guess generally context is everything. But specifically, is it because they contain a "/" character?

Properly, URIs need a scheme component, although conventionally a "www." prefix on a DNS name has marked a string as being a webserver (or webserver document). I'm intrigued to know exactly what sets apart some strings as URLs and others not. (Btw, the bit.ly strings in Seth's column that you linked to are themselves not linked. Whatever that means.)

Cheers,

Tony

Roderic Page said...

I guess the "/" is the giveaway. It's interesting that the Guardian dispensed with "http://" in the printed version; they assume the reader is savvy enough to know that it's an HTTP URI. If we assume all identifiers are HTTP URIs, then "http://" becomes redundant, in the same way that nobody writes "mailto:".

I too noticed that the "bit.ly" strings aren't links. Odd...
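
As a rough illustration of the "/" heuristic, here is a Python sketch that treats a scheme-less string of the form host.tld/path as an HTTP URI and prepends "http://". The pattern BARE_URL and the function to_http_uri are hypothetical names, and the rule is only a guess at what a reader does in their head, not a proper URI parser.

```python
import re

# Hypothetical heuristic: dotted host name, then "/", then a non-empty path.
BARE_URL = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+/\S+$")


def to_http_uri(text):
    """Return an HTTP URI for a string that looks like a URL, else None."""
    # Strings that already carry an HTTP(S) scheme are returned unchanged.
    if text.lower().startswith(("http://", "https://")):
        return text
    # Assumption: anything shaped like host.tld/path is an HTTP URI.
    if BARE_URL.match(text):
        return "http://" + text
    return None


print(to_http_uri("bit.ly/seth52"))         # http://bit.ly/seth52
print(to_http_uri("http://bit.ly/seth52"))  # http://bit.ly/seth52
print(to_http_uri("seth52"))                # None
print(to_http_uri("urn:lsid:zoobank.org:act:6FFAFC2C-D46B-4959-BA03-C38477B9DFF1"))  # None
```

Note that the LSID is rejected: it has a scheme ("urn"), but not one this heuristic recognises, and no "/" to give it away.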