Humane Text Formats

I write in plain text a lot. I want to put stuff on the web a lot. Oftentimes it’s the stuff I already wrote in plain text. I wondered if I could learn some conventions that would convert to XHTML for no extra work after I’d written the plain text. In fact, I am writing this article now in Ultraedit and later it will go on bluebones.net in HTML. As I wrote “Ultraedit” I wanted to put a link in, but wasn’t sure whether to, because that would make this file HTML and I’d need to go back and put in <p> tags and so on. Let’s just say that I think learning one of these formats would be A Good Idea™.

For those wondering why I don’t write in HTML all the time check out these good reasons.

This seems to have been the rationale behind Markdown. There are also numerous other text formats like Textile and Almost Free Text with similar or identical motivations. I don’t want to learn them all, so which one to pick? I couldn’t find a good comparison or even much of a list of alternatives via Google. Answer: have a face-off.

The Test

I decided to use the text of this very article.
(Originally I decided I was going to use a BBC news story too but I’d learnt enough about the formats by the time I’d been through them all once!)

Some things I definitely want the winner to be able to do are:

  • Unordered lists
  • Like this one

and

# Code examples like this.
print "This is essential!"

The quality of tools available is also a big plus. For this to reap rewards I
must be able to go effortlessly from text to XHTML and (strongly preferred) back
again.

The Results

Almost Free Text

http://www.maplefish.com/todd/aft.html

An enviable set of outputs: HTML, LaTeX, lout, DocBook and RTF. You have to tell it explicitly to use other than 8 spaces for tabstops, or use tabs. My text editor is set to use spaces for tabs because they travel better (email, etc.) and it is set to 4 spaces. No titles on links. Does table of contents. No line breaks allowed in link elements, which is a problem. Adds a whole load of extra formatting by default – makes whole documents instead of snippets. HTML 4.0 Transitional. Doesn’t seem to be any way to make snippets or XHTML.

See Almost Free Text Test Results

Markdown

http://daringfireball.net/projects/markdown/

A format that comes from Daring Fireball. Default formatting of the Instiki (http://instiki.org/) wiki. Choked on converting the trademark symbol from the HTML character entity reference to Markdown and cannot do tables, but otherwise superb, and the plain text looks right too – not marked up, just “natural”.

Tools include html2text, a Python script that converts a page of HTML into valid Markdown. There is also a PHP implementation. RedCloth (Ruby) has limited support for Markdown.

See Markdown Test Results

reStructuredText

http://docutils.sourceforge.net/rst.html

The format of Python doc strings. Only a “very rough prototype” for converting HTML to reStructuredText (written in OCaml). Links are clunkier than in Markdown or Textile. Cannot set title attributes on links. Nice autonumbering footnotes. No simple way to avoid turning processed text into a full document. I would ideally like to process snippets for cutting and pasting into existing documents or standard headers and footers.

See reStructuredText Test Results

Textile

http://www.textism.com/tools/textile/

Originally created for Textpattern. There is a Movable Type plugin. An alternative to the default Markdown in Instiki. Does class, id, style, language attributes and lots of character entity replacement (em dash, curly quotes, that kind of thing). Only a rudimentary HTML=>Textile converter available. Very obviously meant to be turned into HTML (look at the headings h1, h2, etc.) and not so good as just a way of formatting plain text.

I had trouble making html => text tool work – no trademark and strict xml parsing just exited with error “Junk after document element at line 12” despite passing W3C validator test for XHTML 1.0 Strict.

Code doesn’t work. Breaks where it finds CR (can’t have 80 col source and XHTML must use word wrap).

RedCloth (Ruby) supports Textile.

See Textile Test Results

Others

RDoc – Originally created to produce documentation from Ruby source files. Offered as an alternative markup option by Instiki. Outputs XML, HTML, CHM (Compiled HTML) and RI (whatever that is). Commandline tool. Now part of core Ruby which ensures continued support but perhaps only as a documentation tool not in the more general sense that I want to use it.

StructuredText – Allows embedded HTML and DHTML. Used by Zope and the related ZWiki. Rather horribly uses indentation rather than explicit heading markers. Supports tables. No way to go from HTML to StructuredText. Somewhat similar to reStructuredText.

Other formats I didn’t have time to consider in depth or which I discounted for certain reasons: WikiWikiWeb formatting (no tools, only available as part of WikiWikiWeb), DocBook (for whole books not snippets), atx (can’t find enough info – seems to have been superseded by Markdown), RDTool (didn’t like =begin/=end and lesser than RDoc from the same community), YAML (aimed mainly at configuration files), MoinMoin formatting (no tools to use it separate from the Wiki it comes from), SeText (superseded by StructuredText and reStructuredText), POD (Plain Old Documentation – the Perl documentation format).

Summary

Textile and Markdown were the only formats I investigated that were truly practical for snippets not full documents. Textile had better support for more HTML features at the expense of looking more like HTML and less like plain text in the first place. Since I can write HTML any time I want anyway and because it has the better tools, Markdown is my provisional “winner”. If anyone wants
to correct any errors above (in the comments section) I’m willing to revise my opinion. (Quick! Before I get wed to this syntax and can’t change!)

Feature Comparison Table

Plain Text Formats Feature Comparison Table

Format             To HTML Tool  From HTML Tool  Tables?  Link Titles?  class Attribute?  id Attribute?  Output formats                      License
Almost Free Text   Yes           No              No       No            No                No             HTML, LaTeX, lout, DocBook and RTF  Clarified Artistic License
Markdown           Yes           Yes             No       Yes           No                No             XHTML                               BSD-style
reStructuredText   Yes           Sort of         Yes      No            Yes               Auto           LaTeX, XML, PseudoXML, HTML         Python
Textile            Yes           No              Yes      Yes           Yes               Yes            XHTML                               Textile License

Medical Imaging Lecture

The first Hounsfield Memorial Lecture was given Thursday 10
February 2005 at 17.30 by Professor Robert S. Balaban, Scientific Director,
National Heart, Lung & Blood Institute National Institutes of Health, USA.

I attended following the course on Computer Vision I did at Imperial last
term.

“Imaging: An Interface between Physiology and Medicine”
covered imaging techniques allowing noninvasive viewing of internal organs and
processes. Particular attention was paid to X-Rays from infrared, CT, MRI,
CT-PET tumour detection, and CT-MRI. One practical illustration
was a trial run in a local hospital in the US where they used these imaging
techniques to detect whether those with chest pains in the ER have a real heart
problem or not. This is a big improvement on the current system of sitting
patients in a room and monitoring them until something goes really wrong.

Dr. Balaban also talked about image-guided robotic surgery, with MRI acting
as the eyes of the robot and the surgeon not in the room. With these
techniques the surgeon can not only see what is happening on the surface but
also in the internal organs underneath where he is operating.

Motion is the big problem for these realtime views of the insides of living
creatures – Dr. Balaban illustrated showing us a great band of movement that
ruined his pictures of cells in a muscle, then revealing that the muscle was in
the leg and the movement came just from respiration.

CSS Zen Garden Design

I’ve completed a first draft of a potential CSS Zen Garden design.

I’m not a designer by nature and I’m quite happy with it. What I need is some constructive criticism though. Does it not work in your browser? Do you think it is just crap pseudopr0n? Do I need a different image on my listitems? Let me know in the comments section …

Quick Javadoc Reference

I write most of my code in Ultraedit and when I’m writing Java I really want to be able to access the javadoc documentation for the Sun API as quickly as possible. This is a description of the evolution of a little utility to look up javadoc documentation very quickly.

I know that IDEs like IntelliJ and Eclipse offer in-window documentation tooltips (this is actually executed best in Microsoft’s Visual Studio.NET of all the IDEs I have used). But these horking great programs put me off because they feel clunky and tend to specialise in one language. I prefer to use the tricks I learn with my text editor with everything I am working on rather than learning a set of shortcuts, etc. for every language. I know Emacs has tagfiles for every language and is on zillions of platforms but it doesn’t feel like a native Windows app and as that’s where I spend 99% of my time it just doesn’t cut it. Ultraedit makes me feel nimble and encourages cross-application serendipity.

I used to bring up a Run box with Win-R and write the full path to the file I needed. So to get the docs on java.util.HashSet I’d do:

Win-R, "c:\docs\api\java\util\HashSet.html", Enter

(obviously autocomplete would make it a little bit quicker than that). This was far too slow – interrupting your train of thought to look up documentation.

To speed things up I dropped a copy of every HTML file into the docs folder. So to get the HashSet documentation I need only type:

Win-R, "c:\docs\HashSet.html", Enter

(again with autocomplete speeding me up). This wasn’t too bad speedwise but all the links in the file launched would not work. So if you wanted to look at the docs of a superclass or of a method’s return value you had to go back to the Run box. Not ideal.

It seemed the only answer was to write a little utility. I chose Perl as its file handling is simple and the language itself is pretty fast. The code I ended up with after a first pass was:

#!c:/perl/bin/perl.exe

use strict;
use warnings;

$ARGV[0] || die "No arg supplied.\n";

# Path fragment to look for, e.g. "/HashSet.html".
my $look_for = "/" . $ARGV[0] . ".html";
my @found;
my @files = search_dir("c:/docs/api");
if (@files == 1) {
    print "c:/progra~1/intern~1/IEXPLORE.EXE " . $files[0];
    exec("c:/progra~1/intern~1/iexplore " . $files[0]);
} else {
    # FIXME
}

# Recursively collect every path under $dir that contains $look_for.
sub search_dir {

    my $dir = pop;
    my @list = glob($dir . "/*");
    foreach (@list) {
        if (-d && $_ ne $dir && ! /class-use/) {
            search_dir($_);
        } elsif (/\Q$look_for\E/) {
            # \Q...\E stops the "." in ".html" acting as a regex wildcard.
            push @found, $_;
        }
    }
    return @found;
}

I had to put in special cases so that one directory was not traversed indefinitely and to avoid the class-use directories that are not of interest (they contain links to classes that used the class).

This worked just fine (excluding the failure to handle multiple matches) but had a noticeable delay. The time I saved typing:

Win-R, docs.pl HashSet, Enter 

compared to the longer version was lost in the searching of the filesystem. Worse, the time was now idle time instead of active time, and that seems longer (think of waiting for a bus).

What was taking the time was traversing the filesystem looking at filenames.
So I decided that work could be done just once and cached. I ran “find” at a Cygwin bash prompt from the api directory and put the output in a quickref.txt file. I used “grep -v class-use” to remove the class-use directories and replaced /c/ (my symlink to the root of the C: drive under Cygwin) with c:/ to produce a list of paths to all the relevant files.
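For anyone without Cygwin to hand, the same one-off indexing step can be sketched in a few lines of Python. This is only an illustration of the idea, not the commands I actually ran: the function name build_index is made up, and unlike a bare “find” it keeps only .html files.

```python
import os


def build_index(api_root, index_path="quickref.txt"):
    """Write one path per line for every .html file under api_root.

    Directories named "class-use" are skipped, mirroring "grep -v class-use".
    Returns the number of paths written.
    """
    count = 0
    with open(index_path, "w", encoding="utf-8") as index:
        for dirpath, dirnames, filenames in os.walk(api_root):
            # Prune in place so os.walk never descends into class-use.
            dirnames[:] = [d for d in dirnames if d != "class-use"]
            for name in sorted(filenames):
                if name.endswith(".html"):
                    # Store forward slashes, as in c:/docs/api/...
                    path = os.path.join(dirpath, name).replace(os.sep, "/")
                    index.write(path + "\n")
                    count += 1
    return count
```

Calling build_index("c:/docs/api") once would produce a quickref.txt in the current directory, and it only needs re-running when new javadocs are added.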

Now all I needed was code to read this file and find the correct entry.

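#!c:/perl/bin/perl.exe

use strict;
use warnings;

$ARGV[0] || die "No arg supplied.\n";

my $look_for = "/" . $ARGV[0] . ".html";
open(FILE, "<quickref.txt") or die "Cannot open quickref.txt: $!\n";

while (<FILE>) {
    chomp;  # drop the newline so it is not passed to the browser
    # \Q...\E stops the "." in ".html" acting as a regex wildcard.
    if (/\Q$look_for\E/) {
        # exec does not return, so the first match wins.
        exec("c:/progra~1/intern~1/iexplore " . $_);
    }
}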

This code was a good deal shorter (the work’s largely been done in creating quickref.txt) and faster. I speeded it up even more by forcing Internet Explorer (quicker startup time than the otherwise superior Firefox) and short-circuiting at the first match (only conflict that occurs often is java.sql.Date and java.util.Date anyway).
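If the java.sql.Date versus java.util.Date clash ever became annoying, the lookup could collect every match instead of stopping at the first. A minimal sketch in Python, assuming the same one-path-per-line index; find_docs is a hypothetical name and the three paths are made-up sample entries.

```python
def find_docs(class_name, index_lines):
    """Return every indexed path that ends in /<class_name>.html."""
    suffix = "/" + class_name + ".html"
    return [line.strip() for line in index_lines
            if line.strip().endswith(suffix)]


# Sample index entries, as they might appear in quickref.txt.
index = [
    "c:/docs/api/java/sql/Date.html\n",
    "c:/docs/api/java/util/Date.html\n",
    "c:/docs/api/java/util/HashSet.html\n",
]

matches = find_docs("Date", index)
for number, path in enumerate(matches, start=1):
    print(number, path)
```

With more than one match the utility could print the numbered list and ask which to open. That costs a keystroke, which is why short-circuiting is probably still the right default.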

All that remained was to create a shortcut on my path to the file called ‘d’ allowing me to launch (for example) fully linked HashSet documentation with:

Win-R, "d HashSet", Enter

A future version could generate the quickref.txt file if it is not present. But as this is such a simple utility I doubt it will have a future version. Other javadocs can be added simply by appending their paths to quickref.txt. Obviously the browser command line should not be hardcoded and I could consider respecting ESR’s $BROWSER environment variable. If you think this is overkill or have suggestions for improvements or just simply think I’m a nutter please comment below.
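As a rough idea of what respecting $BROWSER might look like, here is a Python sketch. It assumes my reading of the convention: a list of commands separated by the platform’s path separator, where a “%s” marks where the URL goes and a command without one gets the URL appended. The function name browser_command is hypothetical, the Internet Explorer path is the one hardcoded above, and only the first listed command is tried.

```python
import os


def browser_command(url, default="c:/progra~1/intern~1/iexplore", environ=None):
    """Build the command line used to open url.

    Uses the first entry of $BROWSER if it is set, otherwise default.
    """
    environ = os.environ if environ is None else environ
    candidates = [c for c in environ.get("BROWSER", "").split(os.pathsep) if c]
    template = candidates[0] if candidates else default
    if "%s" in template:
        # The convention lets the command say where the URL belongs.
        return template.replace("%s", url)
    return template + " " + url
```

A fuller version would try each listed browser in turn until one launched.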

Lawrence Lessig Launches Creative Commons UK

Lawrence Lessig spoke at UCL on Monday 4th October to launch
the UK version of Creative Commons. This is an
initiative designed to work alongside copyright to help
authors of content to mark their content available for reuse.

The talk was very modern, in the sense that it was closer to entertainment than
your average lecture. There were hundreds of slides that Lawrence flicked
through with a thumb switch while he was talking (many containing just a single
word that he said simultaneously with its appearance). There was also plenty of
sound and video. Some of my favourites were the Peanuts video with Hey Ya!
running over the top of it and the contest-winning video explaining Creative
Commons (7MB MPEG).

Lawrence discussed the many advantages of Creative Commons licensing with
particular reference to his own book, ‘Free Culture’. He cited the Free Culture
Wiki, where people are free to make changes and annotations to a copy of the
book, and also the fact that people had recorded an audiobook version within a
few days of its release. He is certain that these activities have ultimately
raised his profile and increased sales of the book.

Another interesting part of the talk was the discussion of the component
parts of Creative Commons licenses. The licenses come in three formats: a
human-readable description of their intentions; a lawyer-readable legal
document; and a computer-readable RDF version that expresses those intentions
electronically. This last one is the most interesting, making a search engine
of reusable content possible (his example was photographs of the Empire State
Building with no royalties payable for use; the possibilities are enormous).
Also interesting was the Creative Commons wrapping of the GNU General Public
License to create human- and computer-readable versions and international
lawyer-readable versions.

Lawrence was a very entertaining speaker and also (perhaps surprisingly)
showed little of the zealot. When questions of dismantling corporations or
copyright law altogether came up he was quick to point out the moderation of his
position and that he was only seeking a way for those who wished to make
content available to do so. That could well be the key to his success.

Update 2004-10-10 Audio of the talk (80MB).

Decemberists at Brighton Freebutt

I can’t remember how I heard about The Decemberists. But they are one of the
truly original bands recording in the world right now. I’ve made it something
of a mission to get other people listening to them. Especially as they seem to
be so unfamous – the closest thing they have to a world tour is a handful of
dates in the UK.

The Freebutt in Brighton where I saw them is not a large or famous venue. The
level of advertising given over to them was verging on the nonexistent –
standing outside the venue I couldn’t tell that they were on that night. I
missed them at the Water Rats in King’s Cross on Friday (that will teach me to
have so many different email addresses) so I couldn’t miss out on their next
nearest gig. I just couldn’t miss renditions of lyrics like:

“We will remember this when we are old and ancient.
Though the specifics might be vague
and I’ll say your camisole was a sprightly light magenta
when in fact it was a nappy blueish grey.”

The lead singer (Colin Meloy) wouldn’t look out of place in Weezer but the
rest of the band would: the tiny drummer (Ezra Holbrook) who (I think)
supplies the voice of the widow in ‘The Chimbley Sweep’, the burly beer-sipping
guitarist (Chris Funk) and a double bassist (Nate Query) so gentle he had
trouble getting back to the stage through the crowd for the encore. And seated
somewhere I couldn’t see but only hear due to the crowd was Jenny Conlee on
accordion.

They played note-perfect renditions of a selection of their songs, changing
things only to add a wry twist. Even the atmospheric screams and falsetto
voices were inch-perfect. The one significant change from the recordings was
during the climactic rendition of The Chimbley Sweep when a series of traded
riffs between Meloy and Funk turned into a tongue-in-cheek Hendrix-like playing
of the guitar behind the head and a string-snapping crescendo.

Meloy supplied a little banter between songs. After playing the new song he
said with a tiny hint of self-congratulation, “That was about a Spanish princess
on her way to her coronation, this is about architects” and introduced The
Chimbley Sweep as being, “about my childhood.”

After much clapping from the crowd Meloy returned for an encore, being joined by
the rest of the band in fits and starts through the three songs. He sang
Morrissey’s Everyday is Like Sunday (sounding strange with his so very American
accent wrapped around it) and two Decemberists tracks with perhaps even more
vigour than in the main show.

If you’re reading this for a recommendation then go see them and get their
albums too. Great band.

Set list (from memory, missing some and bound to be wrong):

  • The Soldiering Life
  • The Infanta
  • Here I Dreamt I was an Architect
  • Grace Cathedral Hill
  • July, July
  • Billy Liar
  • Los Angeles, I’m Yours
  • Legionnaire’s Lament

Encore:

  • Everyday is Like Sunday
  • Red Right Ankle
  • I Was Meant For the Stage

Network/Internet Boggle

I’ve expanded Boggle to take text entry, be playable over a LAN/the internet and do all the scoring, etc.

List of the most important added features:

  • Network or internet play for an unlimited number of players.
  • Full Scrabble-style word dictionary.
  • Results so far totalled and displayed.
  • Alter the size, font and colour of the dice and save as themes.
  • Alter length of a game.
  • Computer opponents (actually they are far too good to really play but they tell you what words you could have got).

Fit But You Know It (Remix)

Been listening to the B-side version of “Fit But You Know It” by the Streets a lot (featuring Kano, Donae’o, Lady Sovereign, Tinchy Stryder). Check it out, I think it actually might be better than the A-side. Can’t find the lyrics anywhere on the web so here’s my best stab; mail any corrections to streets@bluebones.net:

Updated 19 July with the correct lyrics from Funmilola Thomas.

Fit But You Know It

You’re just too damn vain for me girl
Playing in a different league from me girl
I ain’t even trying to speak girl
You ain’t playing right

You’re just too damn vain for me girl
Playing in a different league from me girl
I ain’t even trying to speak girl
You ain’t playing right

Kano:
Once upon a time I was on a chick
She was on ‘im but I knew that she was full of shit
She knew she was fine, she knew she was fit
I linked her, she said she was on, but she was on my dick

It was all a game, it was all a trick
I’ll never fall for it again, that’s out of order b’
I should’a cooled it, but now I’m cool
I boom, bounce back but a fool, old school, watch her come running back

I ain’t a player, I crush a lot
I’m like pull a lot
Plus I got like a hundred gash
And you ain’t one of em

I like to have fun when I’m done with em
Plus I knows you like my songs cos you was humming em
But you’re a teaser playing games cuz im a cheater
You really think you’re the 2004 Mona Lisa

You ain’t Beyonce
You ain’t either
Come down from cloud nine
You really think you’re a diva

You’re just too damn vain for me girl
Playing in a different league from me girl
I ain’t even trying to speak girl
You ain’t playing right

You’re just too damn vain for me girl
Playing in a different league from me girl
I ain’t even trying to speak girl
You ain’t playing right

Donae’o:
Yeah uh
You might be buff and all that
Doing your stuff and all that
But I don’t care enough and all that
Cos I got a girl

And the reason for me being so stush is easily reasoned
Cos for me being heathen, simply it ain’t needed

And just because you got double Ds
Don’t mean you can trouble me
Those sausage lips they’ve got to get, get got some girls not with it
But the guy there he’s hot for it
Go check for him he’s next to kim
He’s into airhead, skin and bone
But that’s not for me I like big and bold

Mike Skinner:
Since I’ve been thinking deeply
I might try to see
About maybe going down to get my ears pierced
It’s a slightly different result for me
Than the one I was unroathed and like usually
I’m told it will prepare my mind surely
For having a wifey four timely a week
You know know how to deal with pain like I fought see
And of course you’re required to buy jewelry

You’re just too damn vain for me girl
Playing in a different league from me girl
I ain’t even trying to speak girl
You ain’t playing right

You’re just too damn vain for me girl
Playing in a different league from me girl
I ain’t even trying to speak girl
You ain’t playing right

Lady Sovereign:
<spoken>
I saw u lookin at me Oi what are you turning away for man diz is shite ur
lyk a sparrow dat dnt work u dnt chirps ur a boy</spoken>

Anyway, yo,

And I fink I’m nice
I know I’m nice
Cos your eyes look twice
Up down left right
Left right left right

Was wearing baggy jeans and a tight top
Cleavage tryin to show im den i saw sumfin in his trousers growin
And I know you was looking at me
Cos the girls behind me were looking off key
Don’ I know this
You’re just a sparrow
That really don’t work cos u not chirps, boy you’re hopeless

Tinchy Stryder:
She’s one of them girls who knows she’s looking more than nice
Even on a bad day I’m looking more than twice
Her eyes – it’s like they’re hypnotising all these guys
Meanwhile although she’s calling shots – no surprise
In her head she knows it – that cos of her they fantasize
So she controls it those mind games with other guys
But I don’t care if she’s buff and that
She ain’t gonna get no love from me
She’s just another girl to me
I’m Strider man thats that

You’re just too damn vain for me girl
Playing in a different league from me girl
I ain’t even trying to speak girl
You ain’t playing right

You’re just too damn vain for me girl
Playing in a different league from me girl
I ain’t even trying to speak girl
You ain’t playing right

Internet Explorer 5 HTTP 500 Error on Ampersand entity in Query String

I’ve got zillions of 500 errors in my log files (well, a handful each day). All are from Internet Explorer 5 on Windows NT (specifically the User-Agent is “Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT)”). I’m pretty certain that the problem is ampersand character entities in anchor tag href attributes. It seems that IE5 is barfing on &amp; being used in query strings rather than just an ampersand. But of course the W3C’s HTML validator won’t have anything but the character entity without telling me off. Is there an answer?
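For reference, this is the kind of link the validator insists on, sketched in Python with the standard library. It is not a fix for the IE5 problem, just a way to be sure the markup itself is right before blaming the browser; the function name anchor and the example URL and parameters are made up.

```python
from html import escape
from urllib.parse import urlencode


def anchor(base_url, params, text):
    """Build an <a> tag whose query string is escaped for use in HTML."""
    query = urlencode(params)  # joins the pairs with a bare "&"
    # Escaping for the attribute turns each "&" into "&amp;".
    href = escape(base_url + "?" + query, quote=True)
    return '<a href="{}">{}</a>'.format(href, escape(text))


print(anchor("/search", [("q", "boggle"), ("page", "2")], "next"))
```

A browser is supposed to turn the entity back into a plain ampersand before it makes the request, so the server should never see “&amp;” in a query string.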