URI Usability

URI Thoughts

Ever since I read Matthew Thomas’ outline of an ultimate weblogging system I’ve been thinking about URIs.

It is useful to be able to visit bbc.co.uk/football or bbc.co.uk/cricket and be redirected to the appropriate section in the BBC site (currently http://news.bbc.co.uk/sport1/hi/football/default.stm in the case of football).

Even more useful is the new URI scheme in place at eBay. my.ebay.co.uk, search.ebay.co.uk/ipod and search-completed.ebay.co.uk/microwave all do what you would expect them to (take you to “My eBay”, search current auctions for iPods, and search completed auctions for microwaves), saving time and increasing clarity and usability.

The question then is, what is the best URI scheme?

Of course, this depends on the site. One thing that I think is undisputed across sites is that URLs should not include extensions that give away technology choices. There is no advantage to the URL:

http://example.com/products/default.asp

over:

http://example.com/products/

and the former makes migrating away from ASP (even to ASP.NET) more problematic than it needs to be.

Another no-brainer is that URIs should be short. Short URIs are easier to remember and URIs over 78 characters long will wrap in some emails.

Jakob Nielsen (in URL as UI) found that mixed case and non-alphanumerics confused users. So let’s ditch those too. That means that a query string (?x=1&y=8394) makes a URI harder to read and less usable. Links with query strings (or sometimes only complex query strings) are also ignored by some search engines. There is a good discussion of this in Towards Next Generation URLs.
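To make those rules concrete, here is a rough Python sketch of a checker that flags a URI breaking them: longer than 78 characters, carrying a query string, ending in a technology-revealing extension, or containing path segments that are not plain lowercase letters and digits. The function name and the list of extensions are my own illustrative assumptions, and the host name is deliberately not checked.

```python
import re
from urllib.parse import urlsplit

MAX_LENGTH = 78  # beyond this, some email clients wrap the link
SEGMENT = re.compile(r"[a-z0-9]+")  # lowercase letters and digits only
# Hypothetical, non-exhaustive list of extensions that give away the technology.
TECH_EXTENSIONS = (".asp", ".aspx", ".php", ".jsp", ".cgi", ".stm")


def usability_problems(uri: str) -> list[str]:
    """Return the guidelines this URI breaks; an empty list means none."""
    problems = []
    parts = urlsplit(uri)
    if len(uri) > MAX_LENGTH:
        problems.append("longer than 78 characters")
    if parts.query:
        problems.append("has a query string")
    if parts.path.lower().endswith(TECH_EXTENSIONS):
        problems.append("exposes a technology extension")
    # Only the path is checked; the host naturally contains dots.
    segments = [s for s in parts.path.split("/") if s]
    if any(not SEGMENT.fullmatch(s) for s in segments):
        problems.append("mixed case or non-alphanumeric path segment")
    return problems
```

Run against this site’s current post URI, http://bluebones.net/news/default.asp?action=view_story&story_id=97, it reports the query string, the .asp extension and the non-alphanumeric segment, while http://bluebones.net/posts/internetboggle comes back clean.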

Without producing a static version of a site every time you make a change (which may actually be a workable solution in some cases), you can use URI rewriting to let your users have usable URIs while giving the webserver the query string it needs to serve up the right content. This can be done with ISAPI Rewrite on IIS (the Lite version is free, if not Free) and mod_rewrite on Apache.

With URI rewriting it does not matter what URI the underlying technology needs; you just need to decide on the appropriate scheme and write clever enough regular expressions to implement it.
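As a minimal sketch of what such a rule does (written in Python, not in mod_rewrite or ISAPI Rewrite syntax), the hypothetical rewrite() function below matches the public path against a regular expression and expands the captured post number into the query string the current ASP page expects. The rule table and function name are illustrative assumptions; a real server would apply the rule internally, so the visitor never sees the rewritten URI.

```python
import re

# Hypothetical rule table: a pattern for the public path and a template for
# the internal URI, based on this site's current
# /news/default.asp?action=view_story&story_id=... links.
REWRITE_RULES = [
    (re.compile(r"^/posts/(\d+)/?$"), r"/news/default.asp?action=view_story&story_id=\1"),
]


def rewrite(path: str) -> str:
    """Return the internal URI for a public path, or the path unchanged if no rule matches."""
    for pattern, template in REWRITE_RULES:
        match = pattern.match(path)
        if match:
            # expand() substitutes \1 with the first captured group (the post number).
            return match.expand(template)
    return path
```

With this rule, a request for /posts/97 is served by /news/default.asp?action=view_story&story_id=97, and any path no rule matches passes through untouched. Rules are tried in order, so more specific patterns belong first.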

It would be possible to run a site where every URI is of the form:

http://example.com/7
http://example.com/23
http://example.com/274383

but that fails the user in that it provides no information about the page and prevents users from using URLs to navigate the site. It seems that some “directory structure” is required (although it may not reflect the actual directory structure of the site) and then a name for the page of content should be appended to that.
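Part of the value of a directory structure is that visitors can navigate by chopping the end off a URI. This Python sketch (parent_paths is a hypothetical name) lists the ancestors a visitor could try, nearest first; it assumes the site serves an index page at each level, which the rewriting rules would have to provide.

```python
def parent_paths(path: str) -> list[str]:
    """Return every ancestor "directory" of a path, nearest first, ending with the site root."""
    segments = [s for s in path.split("/") if s]
    parents = []
    # Drop one trailing segment at a time: /a/b/c -> /a/b/ -> /a/ -> /
    for i in range(len(segments) - 1, 0, -1):
        parents.append("/" + "/".join(segments[:i]) + "/")
    parents.append("/")
    return parents
```

For /posts/2001/06/14/internetboggle it yields /posts/2001/06/14/, /posts/2001/06/, /posts/2001/, /posts/ and /, whereas a flat URI such as /274383 offers only the home page.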

Cool URIs don’t change suggests dates as an effective structure and argues that categories are too changeable over time to work.

An Example

The posts on this site currently have URIs in this format:

http://bluebones.net/news/default.asp?action=view_story&story_id=97

There we have underscores, a question mark and an ampersand plus the URI tells you very little about what you might expect if you clicked on it. Ripe for improvement. Here are the possible schemes I have considered:

A

bluebones.net/posts/96

Very short but not too informative.

B

bluebones.net/posts/2001/06/14/96

Not so short and still fairly oblique.

C

bluebones.net/posts/2001/06/14/internetboggle
bluebones.net/posts/2005/03/13/awstatsoniis5witholdlogfiles

I like this but the longer titles make the URIs overlong.
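Building a scheme C URI from a post’s date and a link name is straightforward. This Python sketch (dated_uri is a hypothetical helper) relies on strftime-style codes, which zero-pad the month and day. Measuring the result against the 78-character limit mentioned earlier is a quick way to see which titles actually push a URI over.

```python
from datetime import date


def dated_uri(host: str, posted: date, link: str) -> str:
    """Build a scheme C URI: <host>/posts/YYYY/MM/DD/<link>."""
    # %m and %d are zero-padded, so 14 June 2001 becomes 2001/06/14.
    return f"{host}/posts/{posted:%Y/%m/%d}/{link}"
```

For example, the longer example above, dated_uri("http://bluebones.net", date(2005, 3, 13), "awstatsoniis5witholdlogfiles"), comes to 66 characters with http:// included, so how much headroom the scheme leaves depends on the longest titles you expect.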

D

bluebones.net/posts/internetboggle
bluebones.net/posts/wherewizardsstayuplatetheoriginsoftheinternetbykatiehafnerandmatthewlyon

Even with the date structure removed, some post titles are too long to be included in the URI. And what to do about posts with the same title?

E

bluebones.net/posts/internetboggle
bluebones.net/posts/wherewizardsstayupla

Truncating to 20 characters is OK, but you can imagine some horrorshow URI caused by cutting at exactly the wrong point, and the content is less clear. The issue of posts with the same title is even more relevant here.
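A hard cut like scheme E is a one-liner. The Python sketch below shows it next to a variant that is my own addition, not one of the schemes above: keep whole words only, stopping before the word that would pass the limit. Both assume the title is squashed to lowercase letters and digits; the function names and the exact title used in the example are assumptions.

```python
import re


def squash(title: str) -> str:
    """Lowercase the title and drop everything except a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def hard_truncate(title: str, limit: int = 20) -> str:
    """Scheme E: cut the squashed title at exactly `limit` characters."""
    return squash(title)[:limit]


def word_truncate(title: str, limit: int = 20) -> str:
    """Variant: keep whole words only, stopping before the first one that would pass `limit`."""
    slug = ""
    for word in title.split():
        piece = squash(word)
        if len(slug) + len(piece) > limit:
            break
        slug += piece
    # If even the first word is too long, fall back to a hard cut.
    return slug or hard_truncate(title, limit)
```

For the title "Where Wizards Stay Up Late: The Origins of the Internet by Katie Hafner and Matthew Lyon", hard_truncate gives wherewizardsstayupla while word_truncate gives wherewizardsstayup. Cutting at word boundaries avoids splitting a word in half, though it cannot rule out an unfortunate run of whole words, and it does nothing about duplicate titles.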

F

bluebones.net/posts/internetboggle
bluebones.net/posts/wherewizardsstayuplate

This is the scheme I am planning on implementing. The need to cope with duplicates in any scheme that uses strings rather than integer IDs means that some work will have to be done (by the server) at posting time that is not being done now. Given that I am going to have to write some code anyway, why not just add an extra field when posting, called “link” or similar, that appears after /posts/ as above? It could be auto-populated with a default of the title with all spaces and punctuation removed. Of course, duplicates will have to be checked for here too, and either reported to the user or otherwise dealt with.
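Here is a minimal Python sketch of that posting-time work: default_link produces the suggested “link” value from the title, and unique_link deals with a clash by appending 2, 3 and so on. Both names are hypothetical, the set of existing links stands in for whatever the database query would return, and numbering is only one way of “otherwise dealing with” duplicates; rejecting the post and asking the user for a different link would be just as valid.

```python
import re


def default_link(title: str) -> str:
    """Suggested "link" field: the title lowercased with spaces and punctuation removed."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def unique_link(link: str, existing: set[str]) -> str:
    """Return `link` if it is unused; otherwise append 2, 3, ... until it is unique."""
    if link not in existing:
        return link
    n = 2
    while f"{link}{n}" in existing:
        n += 1
    return f"{link}{n}"
```

default_link("Where Wizards Stay Up Late") gives wherewizardsstayuplate, and if internetboggle is already taken, unique_link("internetboggle", existing) returns internetboggle2. One wrinkle: a link that already ends in a digit, such as top10, would become top102, which reads badly, so prompting the user may be the better choice in practice.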

What do you think? Help me out before I make irreversible decisions by commenting below.

2 Replies to “URI Usability”

  1. It’s a tricky one. You see, whilst your proposed structure of
    http://bluebones.net/posts/posttitle is nice and short, and (as long as
    you keep your posttitles clear) leads to memorable urls, it does depend
    how you wish to structure your site.

    For instance: all I know from the URL is that posttitle is a post. If
    you use a date/archive structure eg
    http://bluebones.net/posts/2005/01/posttitle, I know it’s quite up to
    date – and if I’m finding you in Google, would be more likely to visit
    that url than, say, posts/2003/10/posttitle because that may not be so
    up-to-date.

    Also, your URL works like a breadcrumb trail:
    http://bluebones.net/2005/01/posttitle would be the post, but go up one
    level for January’s posts, and up another for a yearly index; it makes
    the post’s position within your site’s structure more obvious.

    I think memorable URLs are a red herring. Sure, top and second level
    URLs should be memorable, eg http://bluebones.net ,
    http://bluebones.net/about , but most people looking for a single post
    – ie a permalink – would either be clicking on it, pasting it into
    another form of content, or bookmarking it.

    Finally, there’s this little thing: with
    http://bluebones.net/posts/posttitle, you can NEVER use “posttitle”
    again. Each title is unique. Now, you may say _now_ that that’s not a
    problem, but you’ve mentioned the permanence of urls over time – and so
    in three years’ time you might accidentally use the same post title
    again, and you’d have to fiddle in your CMS to fix the problem.
    2005/01/posttitle is clearly a different post to 2006/03/posttitle,
    even if they have the same post titles. Does that make sense?

    I don’t think titles make post urls over long, in example C. I think
    there’s mileage in separating the words in the post title with hyphens
    (as WordPress does by default) or with underscores, as I did in Movable
    Type; http://bluebones.net/2005/01/averylongposttitle LOOKS longer than
    http://bluebones.net/2005/01/a-very-long-post-title , simply because
    the sense of the words is confused.

    And no decision is irreversible. If you change your structure, simply
    make sure your .htaccess redirects old-structured-requests to the new
    structure, possibly with a 301 code (for “Moved Permanently”).

    My 2p, anyhow; you’re thinking along the right lines but I certainly
    think your concepts of “overlong URLs” and “memorable URLs” are
    blinding you to more useful features that a longer structure would give
    you.

  2. I used Tom’s system above in the end. All his arguments held water and my final check (what does daringfireball.net do) agreed.
