Ooooooooooooo

o.com,
oo.com,
ooo.com,
oooo.com,
ooooo.com,
oooooo.com,
ooooooo.com,
oooooooo.com,
ooooooooo.com,
oooooooooo.com,
ooooooooooo.com,
oooooooooooo.com,
ooooooooooooo.com,
and oooooooooooooo.com
are all registered domain names (although those with 12, 13 and 14 Os have expired).

I am also amused by the fact that
worldcup2006.com,
worldcup2010.com,
worldcup2014.com,
worldcup2018.com,
worldcup2022.com,
worldcup2026.com,
worldcup2030.com,
worldcup2034.com,
worldcup2038.com,
worldcup2042.com,
worldcup2046.com,
worldcup2050.com,
worldcup2054.com,
worldcup2058.com,
worldcup2062.com,
worldcup2066.com and
worldcup2070.com
are all registered too. Quick, grab worldcup2074.com before it goes!

Lomography or Digital Photography?

The new Lomo ambassadors for London had a meeting on Tuesday night upstairs in the Griffin pub on Leonard Street. It was terribly oversubscribed, with people sitting on the floor and standing on the stairs. They announced some kind of project, the details of which were left very vague. It might have been a semi-permanent Lomo wall somewhere in London? They also showed the BBC documentary about lomography.

I had a different perspective watching the documentary in 2005 instead of 2001 (when it was shown on BBC Four). Many of the virtues of the Lomo Kompakt have been superseded by cheap, high-quality digital cameras. Digital cameras are smaller (“take your camera with you wherever you go”), cheaper to take lots of photos with (“don’t think just shoot” — no film to buy, and no processing costs except for the photos you actually want printed), and better for shooting from the hip (“try the shot from the hip” — because of the viewscreen on the back). And while the lomohomes are nice I’d also say that flickr is a far bigger, better, more usable version of the same thing.

The Lomo still scores on simplicity and build quality (it is far less worrying to drop a Kompakt than a digicam). And it still has the tunnel lens with colour effects (though I seem to get less of this in my photos than others do). But the crazy, spontaneous, shooting-from-the-hip feel of lomography is surely in the province of the digital camera now?

The Diary of a Nobody

Yesterday, inspired by Pepys’ Diary, I thought I’d set up something else from Project Gutenberg in a similar format. I initially thought of the Diary of Anne Frank but unbelievably that is still in copyright. Then I checked out The Diary of a Nobody and by strange coincidence it begins on April 3rd.

One late-night coding session later and I give you thediaryofanobody.com. As it is fiction, you are unlikely to want to “dip in” so I set it up so that you can start your RSS feed on any date you like and get the diary over the next 18 months or so.
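The mechanics behind that are simple; here is a minimal Perl sketch of the idea (not the actual site code, and the dates and variable names are made up):

use strict;
use warnings;
use Time::Local;

# A feed started on a given day serves, on any later day, the diary
# entry that many days after the book's first entry (April 3rd).
my $feed_start = timelocal(0, 0, 0, 3, 3, 2005);   # 3 April 2005 (month is 0-based)
my $days_in    = int((time() - $feed_start) / 86400);
print "Serve diary entry number $days_in\n";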

Blogs in Action

Attended the Blogs in Action seminar organized by Six Apart and sponsored by Nokia Lifeblog. The highlight was most definitely John Dale talking about the very ambitious and very successful project to make blogs available to all Warwick University staff and students at Warwick Blogs.

Perhaps the very best thing about the Warwick blogs was the “slice n dice” aspect. You can elect to get feeds of people on your course, people in your hall of residence, people with the same interests as you, individuals you pick. You can publish to everyone in the world, everyone at Warwick, a group of three people. They seemed to have thought of so many useful options. Plus all the design work seemed to be very user-focused. I really wish that this had been available to me as an undergraduate or (especially) now as a part-time postgraduate (where actual physical attendance is sporadic).

Tom Coates spoke about his blogging experiences. Another in the line of those who have been doing it fairly intensely for about five years and have gone through the cycle of loving it, feeling their privacy invaded, getting bored, and so on. I’m sure there’s a cycle that is repeated across bloggers that is just as predictable as the shock, denial, anger, depression, acceptance cycle of the grieving process. I liked it when he called Dave Winer the “Arch Demon of Webloggery”. His core point was that a weblog is a representation of a person (almost like a suit you wear) and that anything that went away from that (group blogs, blogs about one topic only) would have to have something (like money) to propel them along or they would only be short-lived.

Neil McIntosh of Guardian Unlimited talked about how the Guardian’s blogging was started on expenses on a credit card because of the difficulty of getting it through IT. Their blogs are closely watched for offensive and libellous comments but the default response is hands off. He explained how blogging enabled the newspaper to get valuable feedback and sometimes correct mistakes before they went to print. He went some way towards refuting his own quote, “mainstream media trying to do blogs is like watching a vicar disco dance”. What he didn’t do was explain how the Guardian could make money off blogging. He sees the work as experimentation in a new form of journalism rather than having a responsibility to produce revenue or even promote the Guardian brand.

Dominique Busso, CEO of VNUNet Europe, talked about their “corporate blogs” — approximately one for each print magazine that VNU produces. He quoted Dan Gillmor (“my readers know more”) and said that the blogs for the print journals stopped the print journalists having web envy of their online colleagues as they had done during the bubble.

Charlie Schick from Nokia Lifeblog tried to convince us that blogging from your phone is a good complement to blogging from your computer. He’s never seen my phone, then. One interesting point he made is that with cameras in phones getting better and going up to VGA quality and beyond, posting from your mobile is prohibitively slow, and 3G won’t fix this because it is still slow upstream, just fast(er) downstream.

Far more complete notes were made on this event by Suw Charman.

Mind Hacks at Foyles

Tom Stafford and Matt Webb were at Foyles on Charing Cross Road (London) on Wednesday night to publicize their book Mind Hacks (O’Reilly). They stepped through a few practical examples of the stuff from the book — why faded jeans make your legs look good; how eyes and the brain adapt to light and noise levels; why putting a pen in your mouth and pushing it back for three minutes makes you feel good — fairly successfully and made some jokes about leopards.

The Data Area Passed to the System Call is Too Small

I got the error “The data area passed to the system call is too small” posting to this site. The problem was using HTTP GET for very long strings. Using HTTP POST with the exact same text works fine. This was posting from Mozilla 1.0 to IIS 5. There is absolutely nothing on the web explaining the error and only three references in Google, so I thought I’d post my “answer”. More information on the differences between GET and POST.
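For anyone else hitting this, the difference is easy to reproduce; a hedged sketch using Perl’s LWP (the URL and parameter name are made up):

use strict;
use warnings;
use LWP::UserAgent;

my $ua   = LWP::UserAgent->new;
my $text = "x" x 5000;   # a very long string

# Long text in the query string can trigger the error on IIS 5
my $res = $ua->get("http://example.com/news/default.asp?text=$text");

# The exact same text in a POST body works fine
$res = $ua->post("http://example.com/news/default.asp", { text => $text });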

Eclipse Site Lacks Spark

Eclipse just won the Jolt 2005 award in the “Languages and Development Environments” category.

But eclipse.org doesn’t mention it. The first link on the site in the main body is to a ‘white paper’ in PDF format last updated in 2003. The site uses frames. The FAQ was last updated in 2002. There are no screen shots linked anywhere off the home page despite the fact that there are 96 links in the main frame.

Compare and contrast Basecamp, which for all its qualities surely has less to shout about than Eclipse.

What am I missing here?

URI Usability

URI Thoughts

Ever since I read Matthew Thomas’ outline of an ultimate weblogging system I’ve been thinking about URIs.

It is useful to be able to visit bbc.co.uk/football or bbc.co.uk/cricket and be redirected to the appropriate section in the BBC site (currently http://news.bbc.co.uk/sport1/hi/football/default.stm in the case of football).

Even more useful is the new URI scheme in place at eBay. my.ebay.co.uk, search.ebay.co.uk/ipod and search-completed.ebay.co.uk/microwave all do what you would expect them to (take you to “my eBay”, search current auctions for iPods and search completed auctions for microwaves), both saving time and increasing clarity and usability.

The question then is, what is the best URI scheme?

Of course, this depends on the site. One thing that I think is undisputed across sites is that URLs should not include extensions that give away technology choices. There is no advantage to the URL:

http://example.com/products/default.asp

over:

http://example.com/products/

and the former makes migrating away from ASP (even to ASP.NET) more problematic than it needs to be.

Another no-brainer is that URIs should be short. Short URIs are easier to remember and URIs over 78 characters long will wrap in some emails.

Jakob Nielsen (in URL as UI) found that mixed case and non-alphanumerics confused users. So let’s ditch those too. That means that a query string (?x=1&y=8394) makes a URI harder to read and less usable. Links with query strings (or sometimes only complex query strings) are also ignored by some search engines. There is a good discussion of this in Towards Next Generation URLs.

Without producing a static version of a site every time you make a change (which may actually be a workable solution in some cases) you can use URI rewriting to let your users have usable URIs but give the webserver the query string it needs to serve up the right content. This can be done with ISAPI Rewrite on IIS (Lite version is free, if not Free) and mod_rewrite on Apache.
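A sketch of the idea in mod_rewrite syntax (the paths match this site’s current story URLs, described below; the rule is illustrative, not tested config):

RewriteEngine On
# Present /posts/97 to the user but serve it from the existing ASP script
RewriteRule ^posts/([0-9]+)$ /news/default.asp?action=view_story&story_id=$1 [L]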

With URI rewriting it does not matter what URI the underlying technology needs; you just need to decide on the appropriate scheme and write clever enough regular expressions to implement it.

It would be possible to run a site where every URI is of the form:

http://example.com/7
http://example.com/23
http://example.com/274383

but that fails the user in that it provides no information about the page and prevents users from using URLs to navigate the site. It seems that some “directory structure” is required (although it may not reflect the actual directory structure of the site) and then a name for the page of content should be appended to that.

Cool URIs don’t change suggests dates as an effective structure and argues that categories are too changeable over time to work.

An Example

The posts on this site currently have URIs in this format:

http://bluebones.net/news/default.asp?action=view_story&story_id=97

There we have underscores, a question mark and an ampersand, plus the URI tells you very little about what you might expect if you clicked on it. Ripe for improvement. Here are the possible schemes I have considered:

A

bluebones.net/posts/96

Very short but not too informative.

B

bluebones.net/posts/2001/06/14/96

Not so short and still fairly oblique.

C

bluebones.net/posts/2001/06/14/internetboggle
bluebones.net/posts/2005/03/13/awstatsoniis5witholdlogfiles

I like this but the longer titles make the URIs overlong.

D

bluebones.net/posts/internetboggle
bluebones.net/posts/wherewizardsstayuplatetheoriginsoftheinternetbykatiehafnerandmatthewlyon

Even with the date structure removed some post titles are too long to be included in the URI. And what to do about posts with the same title?

E

bluebones.net/posts/internetboggle
bluebones.net/posts/wherewizardsstayupla

Truncated to 20 characters is OK but you can imagine some horrorshow URI caused by truncating at exactly the wrong point and the content is not so clear. The issue of posts with the same title is even more relevant here.

F

bluebones.net/posts/internetboggle
bluebones.net/posts/wherewizardsstayuplate

This is the scheme I am planning on implementing. The need to cope with duplicates in any scheme that uses strings rather than integer ids means that some work will have to be done (by the server) upon posting that is not being done now. Given that I am going to have to write some code anyway, why not just add an extra field when posting, autopopulated with a default of the title with all spaces and punctuation removed, called “link” or similar, that appears after /posts/ as above. Of course duplicates will have to be checked for and either reported to the user or otherwise dealt with here, too.
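A minimal sketch of that default in Perl (the function names and the append-a-number duplicate policy are just illustrations; nothing is decided):

use strict;
use warnings;

# Default "link": the title lowercased with all spaces and punctuation removed.
sub make_link {
    my $title = lc shift;
    $title =~ s/[^a-z0-9]//g;
    return $title;
}

# One way to deal with duplicates: append 2, 3, ... until the link is unused.
sub unique_link {
    my ($link, $taken) = @_;    # $taken: hash ref of links already in use
    my ($candidate, $n) = ($link, 2);
    $candidate = $link . $n++ while $taken->{$candidate};
    return $candidate;
}

print make_link("Internet Boggle!"), "\n";    # internetboggle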

What do you think? Help me out before I make irreversible decisions by commenting below.

AWStats on IIS 5 with Old Log Files

I wanted to use AWStats to analyse my IIS 5 W3C-format log files. I got the program up and running with the aid of these instructions, as well as conning it that the port (always 80) was actually the bytes-sent parameter (for some reason it won’t run without that information being in the log file). And I could only get it to parse the last log file.

I looked at the instructions about how to parse old log files, but this involved issuing a command at the command line for each file. I have logs that go back to 2001. So in the end I wrote this Perl script to issue all the necessary commands:

#! c:/perl/perl.exe

# Run awstats.pl -update over every old IIS log file in turn.
sub main {

    my $dir = "C:/WINNT/system32/LogFiles/W3SVC1";

    # The old logs run from 2001 to 2002 (ex010101.log onwards).
    for (my $year = 1; $year <= 2; $year++) {
        for (my $month = 1; $month <= 12; $month++) {
            for (my $date = 1; $date <= 31; $date++) {
                my $file = $dir . "/" . get_filename($year, $month, $date);
                if (-e $file) {
                    my $cmd = "f:/inetpub/wwwroot/awstats/cgi-bin"
                        . "/awstats.pl -config=bluebones.net -LogFile=\""
                        . $file . "\" -update";
                    print "$cmd\n";
                    system($cmd);
                } else {
                    print $file . " does not exist\n";
                }
            }
        }
    }
}

# Build an IIS log filename like ex010614.log from year, month and day.
sub get_filename {

    my ($year, $month, $date) = @_;

    my $filename = "ex" . pad($year) . pad($month) . pad($date) . ".log";

    return $filename;
}

# Zero-pad a number to two digits (6 becomes "06").
sub pad {

    my $num = pop;

    if ($num < 10) {
        $num = "0" . $num;
    }

    return $num;
}

main();

I had to stop it running in the middle when I hit the date that I added referer to the log file format, alter the conf file manually and then start it running again. But I got there in the end.

What the world needs is a nice, clean API for log files that comes with parsers that intrinsically understand all the various standard formats. That is, I want to be able to just point the program at any Apache, IIS or other standard log files and have it chomp them all up and let me programmatically get at the data in any way I like (perhaps stick it all in a SQL database?). Crucially, the program should be able to “discover” the format of the log files by looking at the headers, and there should be no configuration (unless you have really weird log files).
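The discovery step is at least plausible for W3C-format logs, because the format is declared in the file itself; a rough Perl sketch (whitespace-splitting only, no quoting or encoding handled):

use strict;
use warnings;

# Parse a W3C extended log file, keying each entry by the field names
# the file itself declares in its #Fields header.
sub parse_w3c_log {
    my $path = shift;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my (@fields, @entries);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^#Fields:\s*(.*)/) {
            @fields = split ' ', $1;    # the format, "discovered"
        } elsif (@fields && $line !~ /^#/) {
            my %entry;
            @entry{@fields} = split ' ', $line;
            push @entries, \%entry;
        }
    }
    close $fh;
    return @entries;
}

for my $entry (parse_w3c_log("ex010614.log")) {
    print $entry->{'cs-uri-stem'}, "\n";
}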

Then people can write beautiful graphical reports for this API and everyone can use them regardless of the format that the original logfiles were in. Surely someone has thought of this before? I've put it on my todo list.