Practical URL Validation

There’s a lot of code out there that deals with URL validation. Basically all of it is concerned with “does this URL meet the RFC spec?” That’s actually not that interesting a question if you are validating user input. You don’t want ‘gopher://whatever/’ or ‘’ as valid URLs in your system if you have asked the user for the address of a web page.

What’s more, because of Punycode-powered internationalized URLs, most URL-validating code will tell you that real URLs people can use today are invalid (PHP’s parse_url is not even UTF-8 safe).

Here’s some PHP code that validates URLs in a more practical way. It uses the list of TLDs in static::$validTlds, taken from the IANA list of valid TLDs, and assumes the presence of a UTF-8-safe $this->parseUrl such as Joomla’s version.

    // (c) 2013 Thomas David Baker, MIT License

    /**
     * Return true if url is valid, false otherwise.
     * Note that this is not the RFC definition of a valid URL.  For example we
     * differ from the RFC in only accepting http and https URLs, not accepting
     * single word hosts, and accepting any characters in hostnames (as modern
     * browsers will punycode translate them to ASCII automatically).
     * @param string $url Url to validate.  Must include 'scheme://' to have any
     *                    chance of validating.
     * @return boolean
     */
    public function validUrl($url) {
        $parts = $this->parseUrl($url);

        // We must be able to recognize this as some form of URL.
        if (!$parts) {
            return false;
        }

        // SCHEME.
        // Must be qualified with a scheme.
        if (!isset($parts['scheme']) || !$parts['scheme']) {
            return false;
        }
        // Only http and https are acceptable.  No ftp or similar.
        if (!in_array($parts['scheme'], ['http', 'https'])) {
            return false;
        }

        // If a URL has unrecognized bits then it is not valid - for example the
        // 'z' in ''.
        // This check invalidates URLs that use a user - we don't allow those.
        // We rebuild the URL from its parsed parts and compare it to the input.
        $partsCheck = $parts;
        $partsCheck['scheme'] .= '://';
        if (isset($partsCheck['port'])) {
            $partsCheck['port'] = ':' . $partsCheck['port'];
        }
        if (isset($partsCheck['query'])) {
            $partsCheck['query'] = '?' . $partsCheck['query'];
        }
        if (isset($partsCheck['fragment'])) {
            $partsCheck['fragment'] = '#' . $partsCheck['fragment'];
        }
        if (implode('', $partsCheck) !== $url) {
            return false;
        }

        // HOST.
        if (!isset($parts['host']) || !$parts['host']) {
            return false;
        }
        // Single word hosts are not acceptable.
        if (strpos($parts['host'], '.') === false) {
            return false;
        }
        // No spaces, no double hyphens and no leading hyphen.
        if (strpos($parts['host'], ' ') !== false) {
            return false;
        }
        if (strpos($parts['host'], '--') !== false) {
            return false;
        }
        if (strpos($parts['host'], '-') === 0) {
            return false;
        }
        // Cope with internationalized domain names.
        $host = idn_to_ascii($parts['host']);

        $hostSegments = explode('.', $host);
        // The IANA lists TLDs in uppercase, so we do too.
        $tld = mb_strtoupper(array_pop($hostSegments));
        if (!$tld) {
            return false;
        }
        if (!in_array($tld, static::$validTlds)) {
            return false;
        }
        // There must be something before the TLD.
        $domain = array_pop($hostSegments);
        if (!$domain) {
            return false;
        }

        // PATH.
        if (isset($parts['path']) && substr($parts['path'], 0, 1) !== '/') {
            return false;
        }

        // If you made it this far you're golden.
        return true;
    }

Looking at the list of interesting URLs from and elsewhere it allows all of the following:

And disallows all of these:

# Invalid URLs
http://##/ should be encoded
:// should fail quux
# The following URLs are valid by the letter of the law but we don't want to allow them.
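
For readers who don't use PHP, the same checks can be roughly sketched in Python. This is only an illustration of the approach, not a port of the code above: the name valid_url and the tiny VALID_TLDS set are hypothetical stand-ins (a real version would load the full IANA list), and it relies on Python's built-in idna codec, which implements the older IDNA 2003 rules.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the full IANA list (uppercase, as IANA publishes it).
VALID_TLDS = {"COM", "ORG", "NET", "UK"}


def valid_url(url: str) -> bool:
    """Return True for URLs a person could plausibly mean as a web page."""
    try:
        parts = urlsplit(url)
        _ = parts.port  # accessing .port raises ValueError on a malformed port
    except ValueError:
        return False

    # Only http and https; no ftp, gopher or similar.
    if parts.scheme not in ("http", "https"):
        return False
    # No user:password@ section.
    if parts.username is not None or parts.password is not None:
        return False

    host = parts.hostname
    # Single word hosts and hosts containing spaces are not acceptable.
    if not host or "." not in host or " " in host:
        return False
    if "--" in host or host.startswith("-"):
        return False

    # Cope with internationalized domain names via Punycode.
    try:
        ascii_host = host.encode("idna").decode("ascii")
    except UnicodeError:
        return False

    segments = ascii_host.split(".")
    tld = segments.pop().upper()
    if tld not in VALID_TLDS:
        return False
    # There must be a non-empty label before the TLD.
    if not segments or not segments[-1]:
        return False

    return parts.path == "" or parts.path.startswith("/")
```

Unlike the PHP version, this sketch lets urlsplit lowercase the scheme, so HTTP://example.com/ would pass here.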

parse_url Is Not UTF-8 Safe

Handily the good folks at Joomla have written a UTF-8 safe version:

	/**
	 * Does a UTF-8 safe version of PHP parse_url function
	 * @param   string  $url  URL to parse
	 * @return  mixed  Associative array or false if badly formed URL.
	 * @see
	 * @since   11.1
	 */
	public static function parse_url($url)
	{
		$result = false;

		// Build arrays of values we need to decode before parsing
		$entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', '%26', '%3D', '%24', '%2C', '%2F', '%3F', '%23', '%5B', '%5D');
		$replacements = array('!', '*', "'", "(", ")", ";", ":", "@", "&", "=", "$", ",", "/", "?", "#", "[", "]");

		// Create encoded URL with special URL characters decoded so it can be parsed
		// All other characters will be encoded
		$encodedURL = str_replace($entities, $replacements, urlencode($url));

		// Parse the encoded URL
		$encodedParts = parse_url($encodedURL);

		// Now, decode each value of the resulting array
		if ($encodedParts)
		{
			foreach ($encodedParts as $key => $value)
			{
				$result[$key] = urldecode(str_replace($replacements, $entities, $value));
			}
		}

		return $result;
	}

Although non-ASCII characters are not legal in URLs, this is very handy if you want to parse possibly wonky data, internationalized URLs (例子.测试) or other non-ASCII URLs (such as ones containing ✪) that translate to ASCII via Punycode.
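
To make "translate to ASCII via Punycode" concrete, here is a small Python sketch of the round trip. The helper names to_ascii_host and to_unicode_host are made up for this example. It uses Python's built-in idna codec (IDNA 2003), which may differ in edge cases from what browsers or PHP's idn_to_ascii do.

```python
def to_ascii_host(host: str) -> str:
    """Convert a possibly internationalized hostname to its ASCII (Punycode) form."""
    return host.encode("idna").decode("ascii")


def to_unicode_host(ascii_host: str) -> str:
    """Reverse the conversion, turning xn-- labels back into Unicode."""
    return ascii_host.encode("ascii").decode("idna")


ascii_host = to_ascii_host("例子.测试")
print(ascii_host)                   # every non-ASCII label becomes an "xn--" label
print(to_unicode_host(ascii_host))  # back to the original Unicode hostname
```

Plain ASCII hostnames pass through unchanged, so the conversion is safe to apply to every host before checking its TLD.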

Keeping Control of Your Z-Indexes With LESS

A colleague of mine recently took advantage of LESS in a simple but very useful way.

/* Relatively positioned dialog overlay. */
@z-index-dlg-rel-overlay: -2;

/* Position of box shadow to dialog. */
@z-index-dlg-box-shadow: -1;

/* Close button on dialog boxes. */
@z-index-dlg-close: 1;

/* Overlay on the dialog itself making it "disabled." */
@z-index-dlg-disabled: 2;

/* Modal dialog position. */
@z-index-dialog: 100;

/* Main window overlay for dialog. */
@z-index-dlg-overlay: 1000;

/* Position of main (topmost) dialog. */
@z-index-dlg-main: 10000;

/* Main content loading spinner. */
@z-index-cnt-loading: 1;

/* Position of orange flag triangle on auth pages. */
@z-index-ath-triangle: 1;

/* Absolutely positioned delete (X) button. */
@z-index-btn-delete: 10;

/* Position of downward triangle on content title. */
@z-index-tri-title: 5;

/* Position drawers below dialogs. */
@z-index-drawer: @z-index-dialog - 1;

/* Position of floating navbar. */
@z-index-navbar: 1000;

/* Position of popup about menu. */
@z-index-nav-about: 100;

/* Position of search box on nav menu. */
@z-index-nav-search: 1;

He gathered every z-index on the site into a single LESS file, each as a variable. Now whenever you add something with a z-index you can think “yes, I want it on top, but not over the floating nav or a modal dialog” rather than just jamming in 999999 and hoping for the best. Neat!
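
The convention only helps if nobody sneaks a raw number back in. As a rough illustration (not something my colleague wrote), a few lines of Python can flag hard-coded z-index values in a stylesheet. The function name find_raw_z_indexes and the sample stylesheet are invented for this sketch, and the regular expression is deliberately simple rather than a real LESS parser.

```python
import re

# Matches declarations such as "z-index: 999999;" or "z-index:-1" with a raw number.
# The lookbehind skips variable names like "@z-index-dialog".
RAW_Z_INDEX = re.compile(r"(?<![@\w-])z-index\s*:\s*(-?\d+)")


def find_raw_z_indexes(less_source: str) -> list[tuple[int, str]]:
    """Return (line number, value) for every hard-coded z-index declaration."""
    hits = []
    for number, line in enumerate(less_source.splitlines(), start=1):
        match = RAW_Z_INDEX.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


sample = """\
.navbar { z-index: @z-index-navbar; }
.tooltip { z-index: 999999; }
@z-index-dialog: 100;
"""
print(find_raw_z_indexes(sample))  # [(2, '999999')]
```

Variable definitions and declarations that use a variable are left alone, so the variables file itself passes the check.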

Vertically Centering Text in a Fixed-Size Element With No Overflow

Most of the blog posts and tutorials on the web show you how to center text vertically using either line-height (which only works for a single line of text) or display: table-cell, which allows the text to exceed a fixed size (table cells are allowed to grow taller than their height if the text is long). Here’s a version that clips the overflow by wrapping the display: table-cell element in a containing div.

Vertically centered text in fixed sized boxes

    /* Wrapper exists to prevent long text exceeding the bounding box. */
    .wrapper {
        /* height and width here act as a maximum for when the text is very long. */
        height: 100px;
        width: 100px;
        overflow: hidden;
        background: #eee; /* just so we can see what's going on in the demo */
    }

    /* Cell does the work of allowing us to vertical-align: middle. */
    .cell {
        /* height and width here act as a minimum for when the text is not long. */
        height: 100px; /* subtract any padding on wrapper from height and width here */
        width: 100px;
        display: table-cell;
        text-align: center;
        vertical-align: middle;
    }

    <div class="wrapper">
        <div class="cell">
            This is a test.
        </div>
    </div>
    <div class="wrapper">
        <div class="cell">
            This is a test with much longer text.  This is a test with much longer text.  This is a test with much longer text.  This is a test with much longer text.  This is a test with much longer text.
        </div>
    </div>

How to Get Reasonably Priced Pay As You Go Data (and Calls and Texts) on an Unlocked iPhone in the US

Here is a system that worked for me in the USA on an iPhone 4 in July 2012 and an iPhone 5 in February 2013. Neither phone was locked to a particular carrier, and neither was jailbroken.

  1. Visit an AT&T store or otherwise get a GoPhone SIM that fits your phone (micro-SIM or nano-SIM). Choose whatever rate plan suits your needs. I’m partial to the $25 plan with unlimited texts and 250 minutes for 30 days. Note that the salesperson will almost certainly tell you that you won’t be able to get data on this plan. They are wrong; just say OK!
  2. Connect your phone to wifi somewhere (Starbucks if all else fails!). Visit through your phone’s browser. This site is going to make twiddling your internet settings a lot easier. It doesn’t jailbreak or carrier-unlock your phone; it just changes internet access settings in a simple way. Choose Create APN, “USA” and “GoPhone”. Press “Install” to install the settings.
  3. Set up a PIN for your account by getting it texted to the phone. You can choose “Forgot Password” from the GoPhone login page to set it up.
  4. Sign in to your GoPhone account and add a data package (200MB for 30 days for $15 for light usage or 1GB for 30 days for $25 for heavier usage). You’ll need a credit/debit card to do this entirely online or you can use a refill card you bought in a store.

And that’s it. You may need to remove the APN settings that you installed when/if you leave the US.

How to Create Movie Barcodes

How to make Movie Barcodes (OS X)

Inspired by the movie barcode tumblr I wanted to make some barcodes of my wife’s films.

Here’s one I made for her latest short:

Twinkle, Twinkle Barcode

I found some instructions on Mr. Reid’s site and, with the help of the comments and some tinkering, came up with this final process on OS X.

You will need mplayer and ImageMagick installed, both of which can be found in MacPorts.

$ mkdir /tmp/barcode
$ cd /tmp/barcode
$ mplayer -framedrop -speed 100 -vf framestep=90 -nosound -vo jpeg [path/to/movie-file]
$ mogrify -resize 1x288\! *.jpg
$ montage -geometry +0+0 -tile x1 *.jpg barcode.png
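
Each extracted frame is squashed to one pixel wide and the frames are tiled in a single row, so the width of the finished barcode is roughly the number of frames mplayer writes out. The Python sketch below does that arithmetic; the function names are invented for illustration, the frame rate is something you have to supply, and the real frame count may differ a little (for instance if -framedrop skips frames).

```python
def barcode_width(duration_seconds: float, fps: float, framestep: int) -> int:
    """Estimate how many 1-pixel columns the finished barcode will have."""
    total_frames = int(duration_seconds * fps)
    return total_frames // framestep


def framestep_for_width(duration_seconds: float, fps: float, target_width: int) -> int:
    """Pick a framestep that yields roughly target_width columns (at least 1)."""
    total_frames = int(duration_seconds * fps)
    return max(1, total_frames // target_width)


# A hypothetical 10-minute short at 24 frames per second.
print(barcode_width(600, 24, 90))          # 160
print(framestep_for_width(600, 24, 1280))  # 11
```

For a short film, framestep=90 gives quite a narrow image, so it can be worth lowering the framestep until the estimated width suits where you want to display the barcode.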

Deployment – If It Hurts, Do It Again

If deployment hurts, do it again.

Frequent painful deployments force you to automate and improve the process.

Your development environment may be wonderful. Your test coverage may be high. But your code has to go live. If that’s an infrequent process of giant merges and close-your-eyes-and-hope code pushes, you risk bugs. “Little and often” reduces risk and decreases pain.

The logical conclusion: continuous deployment.

git add -p

I tell people to use git add -p not git add FILE or git add .

The benefits are huge:

  1. You decide exactly what’s in your commit. Any later inspection of your commit is improved — code review, git blame, git log, git bisect — everything. No more “Changed WidgetManager to use Widgets not Sprockets, corrected docs for SprocketManager, deleted four unnecessary files and added a new log image.”
  2. You review your changes before committing. Another chance to spot bugs or realize that your documentation doesn’t make sense or that you left in a TODO or a debug statement.
  3. You won’t accidentally commit work in progress.

You aren’t welcome on my project if you don’t use git add -p! And git add ., frankly, should be an error!

Recruitment Emails

Recruiter emails are getting better.

This was where we used to be:

I wanted to shoot you a quick note on hot new Silicon Valley startup
MonkeyLab. MonkeyLab is x months old, has y million dollars in funding
and is going to transform the home monkey-rearing market.

When can we talk on the phone?

Now the low bar is a mention of my LinkedIn profile and some skill I have that they need.

The high bar is an email that comes not from the recruiter but from the hiring manager, mentioning specific things that are only true about me. These at least get a response.


My name is Bob and I’m not a recruiter. I run the Advanced Something group
at MonkeyLabs.

I see from your LinkedIn profile that you are a ruby programmer and that you
have some experience of Erlang. We’re building an Erlang program with
ruby web app interface and we’d love to add you to the team.

I’m building for the future so even if you aren’t looking for an opportunity
now perhaps we can have a chat?

Does this represent a newfound earnestness on the part of those in charge of recruitment in Silicon Valley? Or is it just an arms race between my personal filter and the content of their emails? An arms race they can win because of the increasing amount of personal information available about me on the internet. In some sense it doesn’t matter. Forcing this kind of more sophisticated personal email that explains why I might be interested makes the initial interaction much more valuable either way.

Now I’m just waiting for this email:

Hi Tom,

My name is Paul Graham. I’m not a recruiter.

We’re looking for a tall developer with blue eyes who wants to
simultaneously build the next generation Magic: the Gathering app and do
something worthy-yet-technical like Kiva while being fabulously

The app is going to be written entirely in Haskell. You will work alongside
Simon Peyton-Jones, Steve Yegge and Aaron Swartz.

In your spacious private offices in London, San Francisco and Tokyo there
will be live music from the Decemberists and Amanda Palmer.

Here’s my personal cell number, call any time of day or night.



PS Spectrum BASIC > Lisp