Ning Appathon

Following last month’s Ning Apps launch, we’re excited to
announce that we’ll be holding a special developer event called the
Ning Appathon at our offices in Palo Alto, CA on Thursday,
November 5th from 6pm-10pm.

The event will include:

  • An overview of Ning Apps and our OpenSocial implementation
  • Presentations from existing Ning Apps developers
  • A chance to meet members of the Ning Engineering and Developer
    Advocacy teams
  • Free pizza and beer

Most importantly, we’ll be announcing the start of a
week-long app development competition which will include awards for
new applications in addition to ported applications. Prizes and
details will be revealed at the event.

Location

Ning
167 Hamilton Ave
2nd Floor
Palo Alto, CA 94301

Date

Thursday, November 5th

Time

6pm-10pm

Prize Info

To be announced at the event!

All attendees will receive a complimentary Ning hoodie, so be
sure to tell us your shirt size when RSVPing. You can attend solo
or bring one colleague; we only ask that you RSVP by 9pm PST on
Thursday, October 29th. All attendees must be at least 21 years
of age.

More Details and Registration

League Table Generator

I’ve written a simple league table generator in the style of my fixtures generator.

Example table

Source code, for those that are interested in such things …

Source code on github

<?php

require_once('masort.php');

function main() {
    $s = "";
    $cmd = (isset($_REQUEST['cmd']) ? $_REQUEST['cmd'] : null);
    $id = (isset($_GET['id']) ? $_GET['id'] : null);
    $title = "";
    if ($cmd === 'add') {
        list($id, $display) = add($id, $_POST['results']);
        $s .= $display;
        $title = ($id ? "Table $id" : 'Create Table');
    }
    if ($id) {
        if (isset($_GET['txt'])) {
            $s .= "<pre>" . table_text($id) . "</pre>";
        } else {
            $s .= table($id);
        }
        $title = "Table $id";
    }
    $s = head($title, $id) . $s;
    $s .= input_form($id);
    $s .= ($id ? results($id) : "");
    $s .= table_links();
    $s .= foot();
    echo $s;
}

// String of HTML input form for results.
function input_form($id) {
    $instructions = '<p class="instructions"><b>Enter results in format</b> <code>name X - Y name</code> <b>to ';
    $instructions .= ($id ? "add to the" : "start a new");
    $instructions .= " table.</b></p>";
    ob_start();
    ?>
    <form method="POST">
        <input type="hidden" name="cmd" value="add" />
        <input type="hidden" name="id" value="<?php echo h($id); ?>" />
        <?php echo $instructions; ?>
        <textarea name="results"></textarea>
        <p><input type="submit" value="Add" /></p>
    </form>
    <?php
    echo ($id ? "" : "<p>Example:</p><pre>Liverpool 1 - 0 Man Utd\nEverton 2 - 0 Aston Villa\nLiverpool 3 - 1 Everton\nAston Villa 0 - 0 Man Utd</pre>");
    return ob_get_clean();
}

// Add results to a table, creating the table if necessary.
//TODO if we just created a table we won't display it here but we should.
function add($provided_id, $s) {
    $id = ($provided_id ? $provided_id : generate_id());
    $results = parse_results($s);
    $added = 0;
    foreach ($results as $r) {
        extract($r);
        $sql = "INSERT INTO result (table_id, home, away, for, against) VALUES ";
        $sql .= "(" . q($id) . ", " . q($home) . ", " . q($away) . ", " . q($for) . ", " . q($against) . ")";
        $added += db($sql);
    }
    if (! $provided_id) {
        // Redirect to the new table's page and stop here so nothing else is output.
        header("Location: " . self_ref_url() . "?id=" . $id);
        exit;
    }
    ob_start();
    ?>
    <p class="success">Added <?php echo $added; ?> results to the table.</p>
    <?php
    return array($id, ob_get_clean());
}

// String of HTML display of table $id.
function table($id) {
    $table = generate_table($id);
    $s = "<table><thead><tr><th>Team</th><th>P</th><th>W</th><th>D</th><th>L</th><th>F</th><th>A</th><th>Pts</th></tr></thead><tbody>";
    foreach ($table as $team) {
        extract(hmap($team));
        $s .= "<tr><td>$name</td><td class=\"n\">$played</td><td class=\"n\">$won</td><td class=\"n\">$drawn</td><td class=\"n\">$lost</td><td class=\"n\">$for</td><td class=\"n\">$against</td><td class=\"n\">$points</td></tr>";
    }
    $s .= "</tbody></table>";
    $s .= '<p><a href="' . self_ref_url() . '?id=' . h($id) . '&txt=1">Text version</a></p>';
    return $s;
}

// String of display of table $id suitable for display in monospace font.
function table_text($id) {
    $EXTRA_PADDING = 2;
    $table = generate_table($id);
    list($longest, $numeric) = array(array(), array());
    foreach ($table as $team) {
        foreach (hmap($team) as $k => $v) {
            $longest[$k] = (isset($longest[$k]) && $longest[$k] >= mb_strlen($v) ? $longest[$k] : mb_strlen($v));
            $numeric[$k] = (isset($numeric[$k]) ? $numeric[$k] && is_numeric($v) : is_numeric($v));
        }
    }
    $s = "";
    foreach ($longest as $k => $max) {
        $display = ucwords(strlen($k) > $longest[$k] ? substr($k, 0, 1) : $k);
        if ($numeric[$k]) {
            $s .= str_pad($display, $max + $EXTRA_PADDING, " ", STR_PAD_LEFT);
        } else {
            $s .= str_pad($display, $max + $EXTRA_PADDING);
        }
    }
    foreach ($table as $team) {
        $s .= "\n";
        foreach (hmap($team) as $k => $v) {
            if ($numeric[$k]) {
                $s .= str_pad($v, $longest[$k] + $EXTRA_PADDING, " ", STR_PAD_LEFT);
            } else {
                $s .= str_pad($v, $longest[$k] + $EXTRA_PADDING);
            }
        }
    }
    $s .= '<p><a href="' . self_ref_url() . '?id=' . h($id) . '">HTML version</a></p>';
    return $s . "\n";
}

// String of HTML results.
function results($id) {
    $rs = get_results($id);
    $s = '<table><tbody>';
    foreach ($rs as $r) {
        extract(hmap($r));
        $s .= "<tr><td>$home</td><td>$for</td><td>-</td><td>$against</td><td>$away</td></tr>";
    }
    return $s . "</tbody></table>";
}

// String of HTML links to all known tables.
function table_links() {
    $sql = "SELECT DISTINCT(table_id) AS id FROM result ORDER BY table_id";
    $rs = db($sql);
    if (! is_array($rs)) { return ""; }
    $s = "";
    foreach ($rs as $r) {
        extract(hmap($r));
        $s .= '<p><a href="?id=' . $id . '">Table ' . $id . '</a></p>';
    }
    return $s;
}

// ********** Helpers **********

function get_results($id) {
    $sql = "SELECT home, away, for, against FROM result WHERE table_id = " . q($id);
    return db($sql);
}

function generate_table($id) {
    $rs = get_results($id);
    $table = array();
    foreach ($rs as $r) {
        extract($r);
        $table = add_result($table, $home, $for, $against);
        $table = add_result($table, $away, $against, $for);
    }
    masort($table, 'points_d,for_d,against_a'); //TODO sort should be more complicated for GD etc.
    return $table;
}

function parse_results($s) {
    $s = preg_replace('/[ \t]+/', ' ', $s);
    $matches = explode("\n", $s);
    $results = array();
    foreach ($matches as $match) {
        if (preg_match('/^(.*?) (\d+) - (\d+) (.*?)$/', $match, $details)) {
            $results[] = array('home' => trim($details[1]), 'for' => trim($details[2]), 'against' => trim($details[3]), 'away' => trim($details[4]));
        }
    }
    return $results;
}

function add_result($table, $team, $for, $against) {
    if (! isset($table[$team])) {
        $table[$team] = array('name' => $team, 'played' => 0, 'won' => 0, 'drawn' => 0, 'lost' => 0, 'for' => 0, 'against' => 0, 'points' => 0);
    }
    if ($for > $against) {
        $table[$team]['won'] += 1;
        $table[$team]['points'] += 3;
    } else if ($for < $against) {
        $table[$team]['lost'] += 1;
    } else {
        $table[$team]['drawn'] += 1;
        $table[$team]['points'] += 1;
    }
    $table[$team]['played'] += 1;
    $table[$team]['for'] += $for;
    $table[$team]['against'] += $against;
    return $table;
}

// Get next table id in the database.  Unsafe: two simultaneous requests can get the same id.
function generate_id() {
    $sql = "SELECT IFNULL(MAX(table_id), 0) + 1 AS result FROM result";
    $rs = db($sql);
    return $rs[0]['result'];
}

// ********* Header/Footer **********

// String of HTML header.
function head($title, $id) {
    ob_start();
    ?>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <html>
        <head>
            <title>League Table Generator<?php if ($title) { echo " - " . h($title); } ?></title>
            <link rel="stylesheet" href="blueprint/screen.css" type="text/css" media="screen, projection">
            <link rel="stylesheet" href="blueprint/print.css" type="text/css" media="print">
            <!--[if lt IE 8]>
            <link rel="stylesheet" href="css/blueprint/ie.css" type="text/css" media="screen, projection">
            <![endif]-->
            <link rel="stylesheet" type="text/css" href="table.css" />
        </head>
        <body>
            <div class="container">
                <div class="span-10 last">
                    <h1>Table Generator</h1>
                    <p>This program is part of <a href="/2009/09/league-table-generator">bluebones.net</a></p>
                    <?php if ($title) { echo "<h2>" . h($title) . "</h2>"; } ?>
                    <?php if ($id) { ?>
                        <p><a href="<?php echo h($_SERVER['SCRIPT_NAME']); ?>">New Table</a></p>
                    <?php } ?>

    <?php
    return ob_get_clean();
}

// String of HTML footer.
function foot() {
   ob_start();
   ?>
                </div>
            </div>
        </body>
    </html>
    <?php
   return ob_get_clean();
}

// ********** Utilities **********

function self_ref_url() {
    $host  = $_SERVER['HTTP_HOST'];
    $uri   = $_SERVER['PHP_SELF'];
    return "http://$host$uri";
}

// SQL-quote a string.
function q($s) {
    return "'" . str_replace("'", "''", $s) . "'";
}

// HTML escaping to prevent XSS
function h($s) {
    return htmlentities($s);
}

// HTML escape the values of an assoc array
function hmap($a) {
    $new = array();
    foreach ($a as $k => $v) {
        $new[$k] = h($v);
    }
    return $new;
}

// Exec query on the results db, creating the result table if necessary, and return an array of rows if it is a SELECT.
function db($sql) {
    $db = sqlite_open('results');
    // Create table if it doesn't exist.  Ignore error if it does.
    @sqlite_exec($db, 'CREATE TABLE result (home VARCHAR(255), away VARCHAR(255), for INT, against INT, table_id INT)');
    if (strpos($sql, "SELECT") === 0) {
        $q = sqlite_query($db, $sql);
        return sqlite_fetch_all($q, SQLITE_ASSOC);
    } else {
        return sqlite_exec($db, $sql);
    }
}

main();

/*
Copyright (c) 2009 Thomas David Baker

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
*/
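The TODO in generate_table() says the sort should take goal difference into account. The sketch below shows one way to do that with a comparator. It assumes rows have the shape add_result() builds. The names compare_table_rows() and sort_table() are hypothetical and not part of the script above. The tie-break order (points, then goal difference, then goals scored, then name) follows one common convention, not a rule stated in the original.

```php
<?php
// Hypothetical helpers (not part of the original script): order league-table
// rows by points, then goal difference, then goals scored, then team name.
// Assumes each row has the 'name', 'points', 'for' and 'against' keys that
// add_result() creates.
function compare_table_rows($a, $b) {
    // More points ranks higher.
    if ($a['points'] != $b['points']) {
        return $b['points'] - $a['points'];
    }
    // Then better goal difference.
    $gd_a = $a['for'] - $a['against'];
    $gd_b = $b['for'] - $b['against'];
    if ($gd_a != $gd_b) {
        return $gd_b - $gd_a;
    }
    // Then more goals scored.
    if ($a['for'] != $b['for']) {
        return $b['for'] - $a['for'];
    }
    // Finally alphabetical, so ties always come out in the same order.
    return strcmp($a['name'], $b['name']);
}

// Sort a table keyed by team name; uasort() keeps the team-name keys.
function sort_table($table) {
    uasort($table, 'compare_table_rows');
    return $table;
}
```

With this in place, the masort() line in generate_table() could become $table = sort_table($table);. Goal difference would then separate teams on equal points even when they scored the same number of goals.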

UK Postcode Regex

Most UK postcode regexes seem to be strict validators. For London Cinema I am more interested in what the user meant than I am in making sure they have entered an exactly legal postcode.

The Google geocoding service that I use in my geocoder is surprisingly strict about what it accepts (for example, it won’t accept a postcode with a missing space), probably because it uses a strict validator like the ones you can find all over the web. I wrote this disambiguation function to turn possibly-funky user input into a canonical postcode. The tests use genuine inputs to London Cinema from the last few days.

function disambiguate_uk_postcode($s) {
    $target = str_replace(' ', '', mb_strtoupper($s));
    $postcode_finder = '/^([A-Z][A-Z]?)([O0-9][O0-9]?)([A-Z]?)([O0-9])([A-Z][A-Z])$/';
    if (preg_match($postcode_finder, $target, $matches)) {
        return $matches[1] . str_replace('O', '0', $matches[2]) . $matches[3]
            . " " . str_replace('O', '0', $matches[4]) . $matches[5];
    } else {
        return $s;
    }
}

function test_disambiguate_uk_postcode() {
    assert(disambiguate_uk_postcode('se 15 5 ed') === 'SE15 5ED');
    assert(disambiguate_uk_postcode('se229ef') === 'SE22 9EF');
    assert(disambiguate_uk_postcode('wc1n1as') === 'WC1N 1AS');
    assert(disambiguate_uk_postcode('w111pg') === 'W11 1PG');
    assert(disambiguate_uk_postcode('e113bz') === 'E11 3BZ');
    assert(disambiguate_uk_postcode('cro 5al') === 'CR0 5AL');
    assert(disambiguate_uk_postcode('E15JA') === 'E1 5JA');
    assert(disambiguate_uk_postcode('ha80hb') === 'HA8 0HB');
    assert(disambiguate_uk_postcode('E4 7 DT') === 'E4 7DT');
    assert(disambiguate_uk_postcode('SW179HN') === 'SW17 9HN');
    assert(disambiguate_uk_postcode('south woodford') === 'south woodford');
}

/*
Copyright (c) 2009 Thomas David Baker

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
*/

PHP Wrapper for Google Maps API Geocoding

My PHP wrapper for the Google Maps API geocoding service. See below the code for the back story. Please do report bugs or ask questions in the comments.

Code

Example:


<?php

/*
A PHP wrapper for the Google Maps API geocoding services.
Requires json_decode be available.
Licensed under the MIT/X11 license (see below).
Thomas David Baker, <bakert@gmail.com>

Example:

// Can take streetnames ("Broadway"), longer addresses ("High Street, Kensington"), 
// postcodes/zipcodes ("SW1A 1AA", "90210") or points of interest ("Buckingham Palace", "Mount Everest").

$results = Geocoder::simpleGeocode("Broadway");
foreach ($results as $result) {
    echo $result['address'] . "\n";
    echo $result['longitude'] . "\n";
    echo $result['latitude'] . "\n";
}
*/

class Geocoder {
    
    // Use your google maps API key here, or provide it as a parameter on each call.
    const API_KEY = null;
    // You may want to change this here, or you can provide it as a parameter on each call.
    const HOST = "maps.google.co.uk";

    // Get an array of possible geocoding matches for an address or an empty array if none found.
    // Matches are of the type array('address' => <best effort at a street address>,
    // 'longitude' => <longitude>, 'latitude' => <latitude>).
    // Return value of null signals an error somewhere along the way.
    public static function simpleGeocode($addr, $host=self::HOST, $key=self::API_KEY) {
        $data = self::geocode($addr, $host, $key);
        if (! ($data && $data['Status']['code'])) {
            return null;
        }
        $statusCode = $data['Status']['code'];
        if ($statusCode == "602" || empty($data['Placemark'])) {
            return array();
        } else if ($statusCode != "200") {
            return null;
        }
        $result = array();
        foreach ($data['Placemark'] as $placemark) {
           $result[] = self::parsePlacemark($placemark);
        }
        return $result;
    }

    // Get the Google Maps API JSON output as an assoc. array for the specified address.
    // Return value of null means the data could not be retrieved, false means could not be decoded.
    public static function geocode($addr, $host=self::HOST, $key=self::API_KEY) {
        if (! $key) { throw new Exception("Add your Google Maps API key to the source to use this function without passing it as a parameter."); }
        $url = "http://" . $host . "/maps/geo?output=json&oe=utf-8&q=" . urlencode($addr) . "&key=" . $key;
        $json = file_get_contents($url);
        if (! $json) { return null; }
        $data = json_decode($json, true);
        if ($data === null) { return false; }
        return $data;
    }

    // Takes a member of the Google Maps API 'Placemark' array and converts it to something flatter and more manageable.
    // Return value is assoc array with keys 'address', 'longitude' and 'latitude'
    public static function parsePlacemark($placemark) {
        $result = array();
        $result['address'] = $placemark['address'];
        $coordinates = (isset($placemark['Point']['coordinates']) ? $placemark['Point']['coordinates'] : array(null, null));
        $result['longitude'] = $coordinates[0];
        $result['latitude'] = $coordinates[1];
        return $result;
    }

}

/*
Copyright (c) 2008 Thomas David Baker

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
*/

Back Story …

Ever since Google made their Google Maps API geocoding services available for the UK last year, I’ve been meaning to switch off the handmade scraper I use on London Cinema and use Google’s service instead.

Today London Cinema sent me an email because the scraper had identified the canonical form of a user-entered address as the following, and couldn’t find a latitude and longitude for it:

hotels%5Cx26ie%3DUTF8%5Cx26hl%3Den%5Cx26f%3Dq%5Cx26sampleq%3D1%22+onclick%3D%22return+loadUrl%28this.href%29%22%5Cx3ehotels%5Cx3c%2Fa%5Cx3e%5Cx3cbr%2F%5Cx3e%5Cx3ca+href%3D%22%2Fmaps%3Fq%3Dhotels+in+manchester%2C+lancashire%5Cx26ie%3DUTF8%5Cx26hl%3Den%5Cx26f%3Dq%5Cx26sampleq%3D1%22+onclick%3D%22return+loadUrl%28this.href%29%22%5Cx3ehotels+in+manchester%2C+lancashire%5Cx3c%2Fa%5Cx3e%5Cx3c%2Fdiv%5Cx3e%5Cx3cfont+size%3D%22+1%22%5Cx3eLocation%3A%5Cx3c%2Ffont%5Cx3e%5Cx3cbr%2F%5Cx3e%5Cx3cform+id%3Drnl_form+action%3D%22%2Fmaps%22+onSubmit%3D%22document.getElementById%28%27user_q%27%29.value+%3Ddocument.getElementById%28%27q_d%27%29.value%22%5Cx3e%5Cx3cinput+type%3Dtext+size%3D22+id%3Drnl_near+name%3Dnear+%2F%5Cx3e+%5Cx3cinput+type%3Dhidden+id%3Duser_q+name%3Dq+value%3D%22%22%2F%5Cx3e%5Cx3cinput+type%3Dhidden+name%3Df+value%3Dp+%2F%5Cx3e%5Cx3cinput+type%3Dsubmit+name%3DbtnG+value%3D%22Search+Maps%22%2F%5Cx3e%5Cx3cbr%2F%5Cx3e%5Cx3cfont+size%3D-1%5Cx3e%5Cx3cinput+type%3Dcheckbox+name%3Drl+checked+value%3D1+%2F%5Cx3e+Make+this+my+default+location%5Cx3cbr%2F%5Cx3e%5Cx3c%2Ffont%5Cx3e%5Cx3c%2Fform%5Cx3e%5Cx3cbr%2F%5Cx3e%5Cx3cb%5Cx3eExamples%3A%5Cx3c%2Fb%5Cx3e%5Cx3cbr%2F%5Cx3e%5Cx26nbsp%3B%5Cx26nbsp%3B%5Cx3cb%5Cx3e%5Cx26%23183%3B%5Cx3c%2Fb%5Cx3e+%5Cx3cspan+dir%3Dltr%5Cx3eglasgow%5Cx3c%2Fspan%5Cx3e%5Cx3cbr%2F%5Cx3e%5Cx26nbsp%3B%5Cx26nbsp%3B%5Cx3cb%5Cx3e%5Cx26%23183%3B%5Cx3c%2Fb%5Cx3e+%5Cx3cspan+dir%3Dltr%5Cx3ebuckingham+palace+road+SW1%5Cx3c%2Fspan%5Cx3e%5Cx3cbr%2F%5Cx3e%5Cx3cbr%5Cx3e%5Cx3cspan+id%3Dfeatco+class%3D%22noprint+hdr%22+style%3Dfont-weight%3Abold%5Cx3eBrowse+popular+maps%5Cx3c%2Fspan%5Cx3e%5Cx3cdiv+id%3Dfc_0+class%3Dnoprint+style%3D%22padding-top%3A+2pt%22%5Cx3e%5Cx3ca+href%3D%22%2Fmaps%2Fms%3Fmsa%3D0

That seemed like a good reason to get around to it!

String Representation of XML Objects in PHP

There’s got to be an easier way to do this.

Perhaps I am unsophisticated. But sometimes when I am debugging I just want to print strings to see what is going on. When working with PHP’s DOM XML stuff, this is difficult. var_dump and print_r don’t do what I’d like. What I really want is just to see the XML of the DOMDocument or the DOMElement or the node list or whatever it is I happen to have in my variable (I may not even know).

There may be a much easier way to get a string representation of an arbitrary XML object in PHP. If so, please link it up in the comments. Failing that, here’s a rough pass at the kind of function I need:

    // Print the XML of a DOMDocument, DOMNode or DOMNodeList, escaped for HTML.
    function printXml($xml) {
        $s = xmlToString($xml);
        print "<pre>" . htmlentities($s) . "</pre>";
    }

    // Serialize a DOMDocument, a DOMNodeList or any other DOMNode to a string.
    function xmlToString($xml) {
        if ($xml instanceof DOMDocument) {
            $s = $xml->saveXml();
        } else if ($xml instanceof DOMNodeList) {
            $s = '';
            foreach ($xml as $element) {
                $s .= xmlToString($element);
            }
        } else {
            $s = xmlToStringProper($xml);
        }
        return $s;
    }

    // Copy a single node into a fresh document and serialize that.
    function xmlToStringProper($node) {
        $dom = new DOMDocument();
        $xmlContent = $dom->importNode($node, true);
        $dom->appendChild($xmlContent);
        return $dom->saveXml();
    }
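One candidate for the easier way: DOMDocument::saveXML() takes an optional node argument and serializes just that node. That removes the need to import the node into a fresh document. The sketch below is a shorter xmlToString() written as a free function. It assumes the input is a DOMDocument, any other DOMNode, or a DOMNodeList.

```php
<?php
// Sketch: serialize a DOMDocument, a DOMNodeList or any other DOMNode.
function xmlToString($xml) {
    // A whole document: include the XML declaration.
    if ($xml instanceof DOMDocument) {
        return $xml->saveXML();
    }
    // A node list (e.g. from getElementsByTagName() or DOMXPath::query()):
    // concatenate each node's XML.
    if ($xml instanceof DOMNodeList) {
        $s = '';
        foreach ($xml as $node) {
            $s .= xmlToString($node);
        }
        return $s;
    }
    // Any other node: saveXML() on its owner document serializes just this node.
    return $xml->ownerDocument->saveXML($xml);
}

function printXml($xml) {
    print "<pre>" . htmlentities(xmlToString($xml)) . "</pre>";
}
```

The DOMDocument check has to come first because a document is itself a DOMNode but has no owner document. This version also handles text and attribute nodes, which cannot be appended directly to a new document.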

History Meme

11:41:13 bakert@bluebones:~$  history | awk '{a[$2]++}END{for(i in a){print a[i] " " i}}' | sort -rn | head
4611 vi
3728 u
3367 cd
2817 ruby
1832 ls
1589 svn
1230 td
852 mysql
790 ssh
635 gd

My First Program

I wrote my first programs at the age of 8. They are preserved on Tom 1, a 15-minute cassette tape. It starts with listings copied from the ZX Spectrum Introduction book, but before long there is a program called ‘askage’ which I’m pretty sure is an original composition. Here’s the code in its entirety:

10 PRINT "How old are you?"
20 INPUT A$
30 PRINT "Really? You look much older than that!" 

Deal or No Deal Player Selection Is Not Random

Fairly obvious one, this. UK Deal or No Deal contestants’ names flash up at the beginning of the show as if one is being selected at random. However, it is easy to see that this is not really happening.

If selection were random (one of 22 potential contestants randomly selected at the start of each show), 9.3% of contestants would have to wait more than 50 shows for their turn, and 1 in 100 would have to wait more than 97 shows. In practice, this doesn’t happen. There have currently been more than 500 contestants and only one (Lucy Harrington) has had to wait as long as 50 shows (she waited exactly 50). With 3 shows filmed per day (15 per week), it would not be practical for contestants ever to wait much longer than 30 shows.

Spoiling my not-very-amazing detective work, Producer Glenn Hugill is on record as saying:

“No it’s not random. For some reason the Radio Times said it was, but that didn’t come from us. It’s always been a selection otherwise we can’t guarantee people that they will play within a reasonable time period/have people in the audience etc. The players for each week are selected on Monday and confirmed in threes each weekday morning.”

But I didn’t read that until after I’d written a little Ruby program to tell me how long I might have to wait under random conditions!

Here’s a table of wait times in a truly random scenario:

Will sit out 0 shows ... 4.5 percent chance (cumulative: 4.5)
Will sit out 1 show ... 4.3 percent chance (cumulative: 8.9)
Will sit out 2 shows ... 4.1 percent chance (cumulative: 13.0)
Will sit out 3 shows ... 4.0 percent chance (cumulative: 17.0)
Will sit out 4 shows ... 3.8 percent chance (cumulative: 20.8)
Will sit out 5 shows ... 3.6 percent chance (cumulative: 24.4)
Will sit out 6 shows ... 3.4 percent chance (cumulative: 27.8)
Will sit out 7 shows ... 3.3 percent chance (cumulative: 31.1)
Will sit out 8 shows ... 3.1 percent chance (cumulative: 34.2)
Will sit out 9 shows ... 3.0 percent chance (cumulative: 37.2)
Will sit out 10 shows ... 2.9 percent chance (cumulative: 40.1)
Will sit out 11 shows ... 2.7 percent chance (cumulative: 42.8)
Will sit out 12 shows ... 2.6 percent chance (cumulative: 45.4)
Will sit out 13 shows ... 2.5 percent chance (cumulative: 47.9)
Will sit out 14 shows ... 2.4 percent chance (cumulative: 50.2)
Will sit out 15 shows ... 2.3 percent chance (cumulative: 52.5)
Will sit out 16 shows ... 2.2 percent chance (cumulative: 54.7)
Will sit out 17 shows ... 2.1 percent chance (cumulative: 56.7)
Will sit out 18 shows ... 2.0 percent chance (cumulative: 58.7)
Will sit out 19 shows ... 1.9 percent chance (cumulative: 60.6)
Will sit out 20 shows ... 1.8 percent chance (cumulative: 62.4)
Will sit out 21 shows ... 1.7 percent chance (cumulative: 64.1)
Will sit out 22 shows ... 1.6 percent chance (cumulative: 65.7)
Will sit out 23 shows ... 1.6 percent chance (cumulative: 67.3)
Will sit out 24 shows ... 1.5 percent chance (cumulative: 68.7)
Will sit out 25 shows ... 1.4 percent chance (cumulative: 70.2)
Will sit out 26 shows ... 1.4 percent chance (cumulative: 71.5)
Will sit out 27 shows ... 1.3 percent chance (cumulative: 72.8)
Will sit out 28 shows ... 1.2 percent chance (cumulative: 74.1)
Will sit out 29 shows ... 1.2 percent chance (cumulative: 75.2)
Will sit out 30 shows ... 1.1 percent chance (cumulative: 76.4)
Will sit out 31 shows ... 1.1 percent chance (cumulative: 77.4)
Will sit out 32 shows ... 1.0 percent chance (cumulative: 78.5)
Will sit out 33 shows ... 1.0 percent chance (cumulative: 79.4)
Will sit out 34 shows ... 0.9 percent chance (cumulative: 80.4)
Will sit out 35 shows ... 0.9 percent chance (cumulative: 81.3)
Will sit out 36 shows ... 0.9 percent chance (cumulative: 82.1)
Will sit out 37 shows ... 0.8 percent chance (cumulative: 82.9)
Will sit out 38 shows ... 0.8 percent chance (cumulative: 83.7)
Will sit out 39 shows ... 0.7 percent chance (cumulative: 84.4)
Will sit out 40 shows ... 0.7 percent chance (cumulative: 85.2)
Will sit out 41 shows ... 0.7 percent chance (cumulative: 85.8)
Will sit out 42 shows ... 0.6 percent chance (cumulative: 86.5)
Will sit out 43 shows ... 0.6 percent chance (cumulative: 87.1)
Will sit out 44 shows ... 0.6 percent chance (cumulative: 87.7)
Will sit out 45 shows ... 0.6 percent chance (cumulative: 88.2)
Will sit out 46 shows ... 0.5 percent chance (cumulative: 88.8)
Will sit out 47 shows ... 0.5 percent chance (cumulative: 89.3)
Will sit out 48 shows ... 0.5 percent chance (cumulative: 89.8)
Will sit out 49 shows ... 0.5 percent chance (cumulative: 90.2)
Will sit out 50 shows ... 0.4 percent chance (cumulative: 90.7)
Will sit out 51 shows ... 0.4 percent chance (cumulative: 91.1)
Will sit out 52 shows ... 0.4 percent chance (cumulative: 91.5)
Will sit out 53 shows ... 0.4 percent chance (cumulative: 91.9)
Will sit out 54 shows ... 0.4 percent chance (cumulative: 92.3)
Will sit out 55 shows ... 0.4 percent chance (cumulative: 92.6)
Will sit out 56 shows ... 0.3 percent chance (cumulative: 92.9)
Will sit out 57 shows ... 0.3 percent chance (cumulative: 93.3)
Will sit out 58 shows ... 0.3 percent chance (cumulative: 93.6)
Will sit out 59 shows ... 0.3 percent chance (cumulative: 93.9)
Will sit out 60 shows ... 0.3 percent chance (cumulative: 94.1)
Will sit out 61 shows ... 0.3 percent chance (cumulative: 94.4)
Will sit out 62 shows ... 0.3 percent chance (cumulative: 94.7)
Will sit out 63 shows ... 0.2 percent chance (cumulative: 94.9)
Will sit out 64 shows ... 0.2 percent chance (cumulative: 95.1)
Will sit out 65 shows ... 0.2 percent chance (cumulative: 95.4)
Will sit out 66 shows ... 0.2 percent chance (cumulative: 95.6)
Will sit out 67 shows ... 0.2 percent chance (cumulative: 95.8)
Will sit out 68 shows ... 0.2 percent chance (cumulative: 96.0)
Will sit out 69 shows ... 0.2 percent chance (cumulative: 96.1)
Will sit out 70 shows ... 0.2 percent chance (cumulative: 96.3)
Will sit out 71 shows ... 0.2 percent chance (cumulative: 96.5)
Will sit out 72 shows ... 0.2 percent chance (cumulative: 96.6)
Will sit out 73 shows ... 0.2 percent chance (cumulative: 96.8)
Will sit out 74 shows ... 0.1 percent chance (cumulative: 96.9)
Will sit out 75 shows ... 0.1 percent chance (cumulative: 97.1)
Will sit out 76 shows ... 0.1 percent chance (cumulative: 97.2)
Will sit out 77 shows ... 0.1 percent chance (cumulative: 97.3)
Will sit out 78 shows ... 0.1 percent chance (cumulative: 97.5)
Will sit out 79 shows ... 0.1 percent chance (cumulative: 97.6)
Will sit out 80 shows ... 0.1 percent chance (cumulative: 97.7)
Will sit out 81 shows ... 0.1 percent chance (cumulative: 97.8)
Will sit out 82 shows ... 0.1 percent chance (cumulative: 97.9)
Will sit out 83 shows ... 0.1 percent chance (cumulative: 98.0)
Will sit out 84 shows ... 0.1 percent chance (cumulative: 98.1)
Will sit out 85 shows ... 0.1 percent chance (cumulative: 98.2)
Will sit out 86 shows ... 0.1 percent chance (cumulative: 98.3)
Will sit out 87 shows ... 0.1 percent chance (cumulative: 98.3)
Will sit out 88 shows ... 0.1 percent chance (cumulative: 98.4)
Will sit out 89 shows ... 0.1 percent chance (cumulative: 98.5)
Will sit out 90 shows ... 0.1 percent chance (cumulative: 98.5)
Will sit out 91 shows ... 0.1 percent chance (cumulative: 98.6)
Will sit out 92 shows ... 0.1 percent chance (cumulative: 98.7)
Will sit out 93 shows ... 0.1 percent chance (cumulative: 98.7)
Will sit out 94 shows ... 0.1 percent chance (cumulative: 98.8)
Will sit out 95 shows ... 0.1 percent chance (cumulative: 98.9)
Will sit out 96 shows ... 0.1 percent chance (cumulative: 98.9)
Will sit out 97 shows ... 0.0 percent chance (cumulative: 99.0)
Will sit out 98 shows ... 0.0 percent chance (cumulative: 99.0)
Will sit out 99 shows ... 0.0 percent chance (cumulative: 99.0)
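The random model behind the table is simple. Suppose each show picks one of 22 waiting contestants uniformly at random, independently of previous shows. Then the chance of sitting out exactly k shows is (21/22)^k × 1/22, and the chance of sitting out more than n shows is (21/22)^(n+1). Below is a PHP sketch of that calculation. It is not the original Ruby program, which isn't shown, and wait_table() and chance_wait_more_than() are hypothetical names.

```php
<?php
// Sketch of the "truly random" model: every show picks one of $contestants
// waiting players uniformly at random, independently of earlier shows, so
// the number of shows a given player sits out is geometrically distributed.
function wait_table($contestants, $max_wait) {
    $p = 1 / $contestants;      // chance of being picked on any one show
    $rows = array();
    $cumulative = 0.0;
    for ($k = 0; $k <= $max_wait; $k++) {
        // Not picked for k shows, then picked on the next one.
        $chance = pow(1 - $p, $k) * $p;
        $cumulative += $chance;
        $rows[] = sprintf("Will sit out %d %s ... %.1f percent chance (cumulative: %.1f)",
            $k, ($k == 1 ? "show" : "shows"), 100 * $chance, 100 * $cumulative);
    }
    return $rows;
}

// Chance of sitting out more than $n shows: not picked on any of the first $n + 1.
function chance_wait_more_than($contestants, $n) {
    return pow(1 - 1 / $contestants, $n + 1);
}

echo implode("\n", wait_table(22, 99)), "\n";
```

Running it should produce rows matching the table above, give or take rounding. chance_wait_more_than(22, 50) comes out at roughly 0.093, the 9.3% figure, and chance_wait_more_than(22, 97) at roughly 0.0105, about 1 in 100.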

Scraping the Web

Some of us have spent years scraping news sites. Others have spent them downloading government data. Others have spent them grabbing catalog records for books. And each time, in each community, we reinvent the same things over and over again: scripts for doing crawls and notifying us when things are wrong, parsers for converting the data to RDF and XML, visualizers for plotting it on graphs and charts.

It’s time to start sharing our knowledge and our tools. But more than that, it’s time for us to start building a bigger picture together. To write robust crawl harnesses that deal gracefully with errors and notify us when a regexp breaks. To start converting things into common formats and making links between data sets. To build visualizers that will plot numbers on graphs or points on maps, no matter what the source of the input.

We’ve all been helping to build a Web of data for years now. It’s time we acknowledge that and start doing it together.

Oh yes. theinfo.org.