Humane Text Formats

I write in plain text a lot, and I want to put stuff on the web a lot. Oftentimes it’s the stuff I already wrote in plain text. I wondered if I could learn some conventions that would convert to XHTML for no extra work after I’d written the plain text. In fact, I am writing this article now in UltraEdit and later it will go on bluebones.net in HTML. As I wrote “UltraEdit” I wanted to put in a link, but wasn’t sure whether to, because a link turns this file into HTML and I’d then need to go back and put in <p> tags and so on. Let’s just say that I think learning one of these formats would be A Good Idea™.

For those wondering why I don’t write in HTML all the time, check out these good reasons.

This seems to have been the rationale behind Markdown. There are also numerous other text formats, like Textile and Almost Free Text, with similar or identical motivations. I don’t want to learn them all, so which one to pick? I couldn’t find a good comparison or even much of a list of alternatives via Google. Answer: have a face-off.

The Test

I decided to use the text of this very article. (Originally I was going to use a BBC news story too, but I’d learnt enough about the formats by the time I’d been through them all once!)

Some things I definitely want the winner to be able to do are:

  • Unordered lists
  • Like this one

and

    # Code examples like this.
    print("This is essential!")

The quality of tools available is also a big plus. For this to reap rewards I
must be able to go effortlessly from text to XHTML and (strongly preferred) back
again.
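To make the goal concrete, here is a minimal sketch of the kind of text-to-XHTML conversion I am after. It is not any of the formats reviewed below: the conventions (bullets for lists, four-space indents for code, blank lines between blocks) and the function name `text_to_xhtml` are assumptions made up for illustration.

```python
import html


def text_to_xhtml(text):
    """Convert a tiny, made-up subset of plain-text conventions to an XHTML snippet."""
    out = []
    # Blocks are separated by blank lines.
    for block in text.strip("\n").split("\n\n"):
        lines = [line for line in block.split("\n") if line.strip()]
        if not lines:
            continue
        if all(line.lstrip().startswith(("* ", "- ", "• ")) for line in lines):
            # Every line starts with a bullet marker: emit an unordered list.
            items = "".join(
                "<li>%s</li>" % html.escape(line.lstrip()[2:].strip()) for line in lines
            )
            out.append("<ul>%s</ul>" % items)
        elif all(line.startswith("    ") for line in lines):
            # Every line is indented four spaces: emit a code example.
            code = "\n".join(line[4:] for line in lines)
            out.append("<pre><code>%s</code></pre>" % html.escape(code))
        else:
            # Anything else is a paragraph; hard line wraps become spaces.
            para = " ".join(line.strip() for line in lines)
            out.append("<p>%s</p>" % html.escape(para))
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "Some things I want:\n\n"
        "* Unordered lists\n"
        "* Like this one\n\n"
        '    print("This is essential!")\n'
    )
    print(text_to_xhtml(sample))
```

Note that it produces a snippet rather than a whole document, which turns out to be the feature that separates the formats below. Links, headings and the trip back from XHTML to text are all left out.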

The Results

Almost Free Text

http://www.maplefish.com/todd/aft.html

An enviable set of outputs: HTML, LaTeX, lout, DocBook and RTF. It assumes tab stops of 8 spaces, so you have to tell it explicitly if you indent with any other number of spaces, or else use real tabs. My text editor is set to use spaces for tabs because they travel better (email, etc.) and it is set to 4 spaces. No titles on links. Generates a table of contents. Line breaks are not allowed inside link elements, which is a problem. Adds a whole load of extra formatting by default and makes whole documents instead of snippets, in HTML 4.0 Transitional. There doesn’t seem to be any way to make snippets or XHTML.

See Almost Free Text Test Results

Markdown

http://daringfireball.net/projects/markdown/

A format that comes from Daring Fireball. It is the default formatting of the Instiki (http://instiki.org/) wiki. It choked on converting the trademark symbol from the HTML character entity reference to Markdown and cannot do tables, but is otherwise superb, and the plain text looks right too: not marked up, just “natural”.

Tools include html2text, a Python script that converts a page of HTML into valid Markdown. There is also a PHP implementation. RedCloth (Ruby) has limited support for Markdown.
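One possible workaround for the trademark problem is to decode typographic entities into literal characters before handing the HTML to a converter. The sketch below uses only Python's standard `html` module. The function name `decode_typographic_entities` is made up for illustration, and I have not checked that this cures the failure I saw.

```python
import html
import re

# Characters that must stay escaped so the markup itself is not altered.
MARKUP_CHARS = {"<", ">", "&", '"', "'"}
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_typographic_entities(markup):
    """Replace entities such as &trade; with literal characters, leaving &lt; &amp; etc. alone."""
    def replace(match):
        entity = match.group(0)
        decoded = html.unescape(entity)
        # Keep the entity if decoding it would produce a markup-significant character.
        return entity if decoded in MARKUP_CHARS else decoded

    return ENTITY.sub(replace, markup)


if __name__ == "__main__":
    print(decode_typographic_entities("<p>A Good Idea&trade; &amp; more</p>"))
```

The output then needs to be saved in an encoding that can represent the decoded characters, such as UTF-8.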

See Markdown Test Results

reStructuredText

http://docutils.sourceforge.net/rst.html

The format of Python docstrings. There is only a “very rough prototype” for converting HTML to reStructuredText (written in OCaml). Links are clunkier than in Markdown or Textile. Cannot set title attributes on links. Nice autonumbering footnotes. No simple way to avoid turning processed text into a full document. I would ideally like to process snippets for cutting and pasting into existing documents or standard headers and footers.
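When a tool insists on producing a full document, one crude workaround is to cut the snippet out afterwards. This is a minimal sketch that assumes the output has a single well-formed body element; `extract_snippet` is a hypothetical helper, not part of any of the tools reviewed here.

```python
import re

# Matches the contents of the first <body>...</body> pair, whatever attributes it carries.
BODY = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_snippet(document):
    """Return the markup inside <body>, or the input unchanged if there is no body element."""
    match = BODY.search(document)
    return match.group(1).strip() if match else document.strip()


if __name__ == "__main__":
    full = (
        "<html><head><title>Test</title></head>\n"
        "<body>\n<p>Just this paragraph, please.</p>\n</body></html>"
    )
    print(extract_snippet(full))
```

A regular expression is good enough for machine-generated output like this. For arbitrary HTML a real parser would be the safer choice.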

See reStructuredText Test Results

Textile

http://www.textism.com/tools/textile/

Originally created for Textpattern. There is a Movable Type plugin. An alternative to the default Markdown in Instiki. Does class, id, style and language attributes and lots of character entity replacement (em dash, curly quotes, that kind of thing). Only a rudimentary HTML-to-Textile converter is available. Very obviously meant to be turned into HTML (look at the headings: h1, h2, etc.) and not so good as just a way of formatting plain text.

I had trouble making the HTML-to-text tool work: it has no support for the trademark symbol, and strict XML parsing just exited with the error “Junk after document element at line 12” despite the page passing the W3C validator test for XHTML 1.0 Strict.

Code examples don’t work. Textile also breaks the line wherever it finds a carriage return, so you can’t hard-wrap the source at 80 columns and have to rely on the editor’s word wrap instead.
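If you want to keep 80-column source anyway, one option is to unwrap each paragraph just before conversion. This is a minimal sketch, assuming paragraphs are separated by blank lines and that there are no indented code blocks to preserve; `unwrap_paragraphs` is a name made up for illustration.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines so that each paragraph becomes a single long line."""
    # Normalise Windows (CRLF) and old Mac (CR) line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(
        " ".join(line.strip() for line in paragraph.split("\n"))
        for paragraph in paragraphs
    )


if __name__ == "__main__":
    wrapped = (
        "The quality of tools available is also a big plus. For this to reap rewards I\r\n"
        "must be able to go effortlessly from text to XHTML.\r\n"
        "\r\n"
        "Second paragraph.\r\n"
    )
    print(unwrap_paragraphs(wrapped))
```

Because it strips leading whitespace from every line, it would flatten lists and code examples too, so it only suits plain prose.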

RedCloth (Ruby) supports Textile.

See Textile Test Results

Others

RDoc – Originally created to produce documentation from Ruby source files. Offered as an alternative markup option by Instiki. Outputs XML, HTML, CHM (Compiled HTML) and RI (whatever that is). Command-line tool. Now part of core Ruby, which ensures continued support, but perhaps only as a documentation tool and not in the more general sense that I want to use it.

StructuredText – Allows embedded HTML and DHTML. Used by Zope and the related ZWiki. Rather horribly uses indentation rather than explicit heading markers. Supports tables. No way to go from HTML to StructuredText. Somewhat similar to reStructuredText.

Other formats I didn’t have time to consider in depth or which I discounted for certain reasons: WikiWikiWeb formatting (no standalone tools; it exists only as part of WikiWikiWeb), DocBook (for whole books, not snippets), atx (can’t find enough info; seems to have been superseded by Markdown), RDTool (didn’t like =begin/=end, and it is the lesser tool next to RDoc from the same community), YAML (aimed mainly at configuration files), MoinMoin formatting (no tools to use it separately from the wiki it comes from), SeText (superseded by StructuredText and reStructuredText), POD (Plain Old Documentation, the Perl documentation format).

Summary

Textile and Markdown were the only formats I investigated that were truly practical for snippets rather than full documents. Textile had better support for more HTML features, at the expense of looking more like HTML and less like plain text in the first place. Since I can write HTML any time I want anyway, and because it has the better tools, Markdown is my provisional “winner”. If anyone wants to correct any errors above (in the comments section) I’m willing to revise my opinion. (Quick! Before I get wed to this syntax and can’t change!)

Feature Comparison Table

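Plain Text Formats Feature Comparison Table

Format           | To HTML Tool | From HTML Tool | Tables? | Link Titles? | class Attribute? | id Attribute? | Output formats                     | License
-----------------+--------------+----------------+---------+--------------+------------------+---------------+------------------------------------+---------------------------
Almost Free Text | Yes          | No             | No      | No           | No               | No            | HTML, LaTeX, lout, DocBook and RTF | Clarified Artistic License
Markdown         | Yes          | Yes            | No      | Yes          | No               | No            | XHTML                              | BSD-style
reStructuredText | Yes          | Sort of        | Yes     | No           | Yes              | Auto          | LaTeX, XML, PseudoXML, HTML        | Python
Textile          | Yes          | No             | Yes     | Yes          | Yes              | Yes           | XHTML                              | Textile License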

12 thoughts on “Humane Text Formats”

  1. Andrew

    I did a similar comparison and decided to stick to HTML. My reasoning: http://www.fileformat.info/news/2005/03/04/humane_text_formats.htm

  2. Thomas David Baker

    Well, I’ve written a couple of posts here and a good few other text files recently, and apart from underlining the section headings with equals signs and the subsection headings with dashes, my foray into humane text formats has left me in pretty much the same position as I was before I started: plain text becomes pseudo-XHTML at the first sign of a link, and then at the end I go through and clean it all up to valid XHTML. So I seem to have followed your path too.

  3. Marcelo Huerta

    You are wrong on two counts about reStructuredText. 1. The class attribute is supported, with the class:: directive (for paragraphs) and the definition of roles (for spans). 2. There are ‘id’ definitions… but they are autogenerated.

  4. Jakub Narebski

    What about AsciiDoc (http://www.methods.co.nz/asciidoc/)? How does it compare to the formats mentioned? I know it can output or convert to PDF, XHTML, HTML Help, manpage or plain text.

  5. xah lee

    Or you could, and probably should, use an HTML-helping editor such as Emacs. When such an editor is well done, it takes away the pain of writing the markup. That is, after all, the sole intended purpose of such editors. I introduce: Emacs and HTML http://xahlee.org/emacs/emacs_html.html Xah xah@xahlee.org ∑ http://xahlee.org/

  6. Michael

    I think readability of code is an issue, and that’s why any HTML helper tool won’t be sufficient for a lot of us. I get sick of everything being XML this and XML that; thankfully YAML/JSON seem to be getting used in some projects instead, and hopefully the same goes for markups like those mentioned here. Embedded HTML was critical, which some text2html languages didn’t have (but should have). Personally, my use of tables leans me to Textile, but I will try Markdown and see how it goes also. Great article!

  7. Tim

    I went through a similar exercise, although I focused on Textile and Markdown because they are directly supported by Django which I use for web development. I am using Textile and it works well in general. Markdown has an advantage that is specifically relevant to my application: if you want to allow other users to enter marked up text (e.g. article submissions) to your site, you want to be able to escape html so they can’t do screwy things with your page layout. Textile uses quotes in URL syntax, which is one of the characters you need to escape to completely protect against embedded HTML errors. I believe that Markdown text can be completely escaped prior to processing into HTML, which effectively enforces Markdown-only markups. For this reason I am considering changing from Textile to Markdown. tim
