bluebones.net

May 7, 2013

There’s a lot of code out there that deals with URL validation. Basically all of it is concerned with “does this URL meet the RFC spec?” That’s actually not that interesting a question if you are validating user input. You don’t want ‘gopher://whatever/’ or ‘http://10.1.2.3’ as valid URLs in your system if you have asked the user for the address of a web page.

What’s more, because of punycode-powered internationalized URLs most URL validating code will tell you that real URLs that people can use today are invalid (PHP’s parse_url is not even utf8 safe).

Here’s some PHP code that validates URLs in a more practical way. It uses the list of TLDs in static::$validTlds from the IANA list of valid TLDs and assumes the presence of a utf8-safe $this->parseUrl such as Joomla’s version.

    (c) 2013 Thomas David Baker, MIT License

    /**
     * Return true if url is valid, false otherwise.
     *
     * Note that this is not the RFC definiton of a valid URL.  For example we
     * differ from the RFC in only accepting http and https URLs, not accepting
     * single word hosts, and accepting any characters in hostnames (as modern
     * browsers will punycode translate them to ASCII automatically).
     *
     * @param string $url Url to validate.  Must include 'scheme://' to have any
     *                    chance of validating.
     *
     * @return boolean
     */
    public function validUrl($url) {
        $parts = $this->parseUrl($url);

        // We must be able to recognize this as some form of URL.
        if (!$parts) {
            return false;
        }

        // SCHEME.
        // Must be qualified with a scheme.
        if (!isset($parts['scheme']) || !$parts['scheme']) {
            return false;
        }
        // Only http and https are acceptable.  No ftp or similar.
        if (!in_array($parts['scheme'], ['http', 'https'])) {
            return false;
        }

        // CHECK FOR 'EXTRA PARTS'.
        // If a URL has unrecognized bits then it is not valid - for example the
        // 'z' in 'www.google.com:80z'.
        // This check invalidates URLs that use a user - we don't allow those.
        $partsCheck = $parts;
        $partsCheck['scheme'] .= '://';
        if (isset($partsCheck['port'])) {
            $partsCheck['port'] = ':' . $partsCheck['port'];
        }
        if (isset($partsCheck['query'])) {
            $partsCheck['query'] = '?' . $partsCheck['query'];
        }
        if (isset($partsCheck['fragment'])) {
            $partsCheck['fragment'] = '#' . $partsCheck['fragment'];
        }
        if (implode('', $partsCheck) !== $url) {
            return false;
        }

        // HOST.
        if (!isset($parts['host']) || !$parts['host']) {
            return false;
        }
        // Single word hosts are not acceptable.
        if (strpos($parts['host'], '.') === false) {
            return false;
        }
        if (strpos($parts['host'], ' ') !== false) {
            return false;
        }
        if (strpos($parts['host'], '--') !== false) {
            return false;
        }
        if (strpos($parts['host'], '-') === 0) {
            return false;
        }
        // Cope with internationalized domain names.
        $host = idn_to_ascii($parts['host']);

        $hostSegments = explode('.', $host);
        // The IANA lists TLDs in uppercase, so we do too.
        $tld = mb_strtoupper(array_pop($hostSegments));
        if (!$tld) {
            return false;
        }
        if (!in_array(mb_strtoupper($tld), static::$validTlds)) {
            return false;
        }
        $domain = array_pop($hostSegments);
        if (!$domain) {
            return false;
        }

        // PATH.
        if (isset($parts['path']) && substr($parts['path'], 0, 1) !== '/') {
            return false;
        }

        // If you made it this far you're golden.
        return true;
    }

Looking at the list of interesting URLs from http://mathiasbynens.be/demo/url-regex and elsewhere it allows all of the following:

http://foo.com/blah_blah
http://foo.com/blah_blah/
http://foo.com/blah_blah_(wikipedia)
http://foo.com/blah_blah_(wikipedia)_(again)
http://www.example.com/wpstyle/?p=364
https://www.example.com/foo/?bar=baz&inga=42&quux
http://✪df.ws/123
http://➡.ws/䨹
http://⌘.ws
http://⌘.ws/
http://foo.com/blah_(wikipedia)#cite-1
http://foo.com/blah_(wikipedia)_blah#cite-1
http://foo.com/unicode_(✪)_in_parens
http://foo.com/(something)?after=parens
http://☺.damowmow.com/
http://code.google.com/events/#&product=browser
http://j.mp
http://foo.com/?q=Test%20URL-encoded%20stuff
http://مثال.إختبار
http://例子.测试
http://उदाहरण.परीक्षा
http://1337.net
http://a.b-c.de

And disallows all of these:

# Invalid URLs
http://
http://.
http://..
http://../
http://?
http://??
http://??/
http://#
http://##
http://##/
http://foo.bar?q=Spaces should be encoded
//
//a
///a
///
http:///a
foo.com
rdar://1234
h://test
http:// shouldfail.com
:// should fail
http://foo.bar/foo(bar)baz quux
ftps://foo.bar/
http://-error-.invalid/
http://a.b--c.de/
http://-a.b.co
http://0.0.0.0
http://10.1.1.0
http://10.1.1.255
http://224.1.1.1
http://1.1.1.1.1
http://123.123.123
http://3628126748
http://.www.foo.bar/
http://www.foo.bar./
http://.www.foo.bar./
http://10.1.1.1
http://10.1.1.254
# The following URLs are valid by the letter of the law but we don't want to allow them.
http://userid:password@example.com:8080
http://userid:password@example.com:8080/
http://userid@example.com
http://userid@example.com/
http://userid@example.com:8080
http://userid@example.com:8080/
http://userid:password@example.com
http://userid:password@example.com/
http://-.~_!$&\'()*+,;=:%40:80%2f::::::@example.com
http://142.42.1.1/
http://142.42.1.1:8080/
http://223.255.255.254
ftp://foo.bar/baz

April 6, 2013April 6, 2013

parse_url Is Not UTF-8 Safe

Handily the good folks at Joomla have written a UTF-8 safe version:

        /**
	 * Does a UTF-8 safe version of PHP parse_url function
	 *
	 * @param   string  $url  URL to parse
	 *
	 * @return  mixed  Associative array or false if badly formed URL.
	 *
	 * @see     http://us3.php.net/manual/en/function.parse-url.php
	 * @since   11.1
	 */
	public static function parse_url($url)
	{
		$result = false;

		// Build arrays of values we need to decode before parsing
		$entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', '%26', '%3D', '%24', '%2C', '%2F', '%3F', '%23', '%5B', '%5D');
		$replacements = array('!', '*', "'", "(", ")", ";", ":", "@", "&", "=", "$", ",", "/", "?", "#", "[", "]");

		// Create encoded URL with special URL characters decoded so it can be parsed
		// All other characters will be encoded
		$encodedURL = str_replace($entities, $replacements, urlencode($url));

		// Parse the encoded URL
		$encodedParts = parse_url($encodedURL);

		// Now, decode each value of the resulting array
		if ($encodedParts)
		{
			foreach ($encodedParts as $key => $value)
			{
				$result[$key] = urldecode(str_replace($replacements, $entities, $value));
			}
		}
		return $result;
	}

Although non-ASCII characters are not legal in URLs if you want to parse possibly wonky data or internationalized (例子.测试) and other non-ASCII URLs (✪df.ws) that translate to ASCII via Punycode then this is very handy.

March 23, 2013

Keeping Control of Your Z-Indexes With LESS

A colleague of mine recently took advantage of LESS in a simple but very useful way.

/* Relatively positioned dialog overlay. */
@z-index-dlg-rel-overlay: -2;

/* Position of box shadow to dialog. */
@z-index-dlg-box-shadow: -1;

/* Close button on dialog boxes. */
@z-index-dlg-close: 1;

/* Overlay on the dialog itself making it "disabled." */
@z-index-dlg-disabled: 2;

/* Modal dialog position. */
@z-index-dialog: 100;

/* Main window overlay for dialog. */
@z-index-dlg-overlay: 1000;

/* Position of main (topmost) dialog. */
@z-index-dlg-main: 10000;

/* Main content loading spinner. */
@z-index-cnt-loading: 1;

/* Position of orangle flag triangle on auth pages. */
@z-index-ath-triangle: 1;

/* Absolutely positioned delete (X) button. */
@z-index-btn-delete: 10;

/* Position of downward triangle on content title. */
@z-index-tri-title: 5;

/* Position drawers below dialogs. */
@z-index-drawer: @z-index-dialog - 1;

/* Position of floating navbar. */
@z-index-navbar: 1000;

/* Position of popup about menu. */
@z-index-nav-about: 100;

/* Position of search box on nav menu. */
@z-index-nav-search: 1;

He gathered every z-index on the site into a single LESS file as a variable. Now whenever you add something with a z-index you can think about “yes, I want it on top but not over the floating nav or a modal dialog” rather than just jamming 999999 and hoping for the best. Neat!

March 9, 2013

Vertically Centering Text in a Fixed-Size Element With No Overflow

Most of the blog posts/tutorials on the web that show you how to center text vertically using either line-height (only works for a single line of text) or using display: table-cell which allows the text to exceed a fixed size (table cells are allowed to get taller than their height if the text is long). Here’s a version that clips the overflow by surrounding the display: table-cell element in a containing div.

<style>
    /* Wrapper exists to prevent long text exceeding the bounding box. */
    .wrapper {
        /* height and width here act as a maximum for when the text is very long. */
        height: 100px;
        width: 100px;
        overflow: hidden;
        background: #eee; /* just so we can see what's going on in the demo */
    }
    /* Cell does the work of allowing us to vertical-align: middle. */
    .cell {
        /* height and width here act as a minimum for when the text is not long. */
        height: 100px; /* subtract any padding on wrapper from height and width here */
        width: 100px;
        display: table-cell;
        text-align: center;
        vertical-align: middle;
    }
</style>

<div class="wrapper">
        <div class="cell">
            This is a test.
        </div>
</div>
<hr>
<div class="wrapper">
        <div class="cell">
            This is a test with much longer text.  This is a test with much longer text.  This is a test with much longer text.  This is a test with much longer text.  This is a test with much longer text.
    </div>
</div>

February 24, 2013February 24, 2013

How to Get Reasonably Priced Pay As You Go Data (and Calls and Texts) on an Unlocked iPhone in the US

Here is a system that has worked for me in the USA on an iPhone 4 and an iPhone 5 in July 2012 and February 2013. Both phones were not locked to any particular carrier but they were not jailbroken.

Visit an AT&T store or otherwise get a GoPhone SIM that fits your phone (micro sim or nano sim). Choose whatever rate plan suits your needs. I’m partial to the $25 for 30 days unlimited texts and 250 minutes. Note that the salesperson will almost certainly tell you that you won’t be able to get data on this plan – they are wrong just say ok!
Connect your phone to wifi somewhere (Starbucks if all else fails!) Visit unlockit.co.nz through your phone’s browser. This site is going to make twiddling your internet settings a lot easier – it doesn’t jailbreak or carrier-unlock your phone it just changes internet access settings in a simple way. Choose Create APN, “USA” and “GoPhone”. Press “Install” to install the settings.
Set up a PIN for your account by getting it texted to the phone. You can choose “Forgot Password” from the GoPhone login page to set it up.
Sign in to your GoPhone account and add a data package (200MB for 30 days for $15 for light usage or 1GB for 30 days for $25 for heavier usage). You’ll need a credit/debit card to do this entirely online or you can use a refill card you bought in a store.

And that’s it. You may need to remove the APN settings that you installed when/if you leave the US.

January 29, 2013

How to Create Movie Barcodes

How to make Movie Barcodes (OS X)

Inspired by the movie barcode tumblr I wanted to make some barcodes of my wife’s films.

Here’s one I made for her latest short:

I found some instructions on Mr. Reid’s site and with the help of the comments and some tinkering I came up with this final process on OSX.

You will need mplayer and ImageMagick installed, both of which can be found in MacPorts.

$ mkdir /tmp/barcode
$ cd /tmp/barcode
$ mplayer -framedrop -speed 100 -vf framestep=90 -nosound -vo jpeg [path/to/movie-file]
$ mogrify *.jpg -resize 1x288\! *.jpg
$ montage -geometry +0+0 -tile x1 *.jpg barcode.png

January 14, 2013January 14, 2013

Deployment – If It Hurts, Do It Again

If deployment hurts, do it again.

Frequent painful deployments force you to automate and improve the process.

Your development environment may be wonderful. Your test coverage may be high. But your code has to go live. If that’s an infrequent process of giant merges and close-your-eyes-and-hope code pushes you risk bugs. “Little and often” reduces risk and decreases pain.

The logical conclusion: continuous deployment.

November 27, 2012November 27, 2012

Historical Visa Bulletin Status for EB-2 Rest of World

(Not China/India/Mexico/Philippines)

Dec 2012	C
Nov 2012	C
Oct 2012	01JAN12
Sep 2012	01JAN09
Aug 2012	01JAN09
Jul 2012	01JAN09
Oct 2007 to Jun 2012	C
Sep 2007	01JAN07
Aug 2007	U
Apr 2002 to Jul 2007	C

Source: http://www.travel.state.gov/visa/bulletin/bulletin_1770.html

November 26, 2012

git add -p

I tell people to use git add -p not git add FILE or git add .

The benefits are huge:

You decide exactly what’s in your commit. Any later inspection of your commit is improved — code review, git blame, git log, git bisect — everything. No more “Changed WidgetManager to use Widgets not Sprockets, corrected docs for SprocketManager, deleted four unnecessary files and added a new log image.”
You review your changes before committing. Another chance to spot bugs or realize that your documentation doesn’t make sense or that you left in a TODO or a debug statement.
You won’t accidentally commit work in progress.

You aren’t welcome on my project if you don’t use git add -p! And git add ., frankly, should be an error!

Posts

Learning Clojure

Practical URL Validation