Handily, the good folks at Joomla have written a UTF-8 safe version:
/**
 * Does a UTF-8 safe version of PHP parse_url function
 *
 * @param   string  $url  URL to parse
 *
 * @return  mixed  Associative array or false if badly formed URL.
 *
 * @see     http://us3.php.net/manual/en/function.parse-url.php
 * @since   11.1
 */
public static function parse_url($url)
{
    $result = false;

    // Build arrays of values we need to decode before parsing
    $entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', '%26', '%3D', '%24', '%2C', '%2F', '%3F', '%23', '%5B', '%5D');
    $replacements = array('!', '*', "'", "(", ")", ";", ":", "@", "&", "=", "$", ",", "/", "?", "#", "[", "]");

    // Create encoded URL with special URL characters decoded so it can be parsed
    // All other characters will be encoded
    $encodedURL = str_replace($entities, $replacements, urlencode($url));

    // Parse the encoded URL
    $encodedParts = parse_url($encodedURL);

    // Now, decode each value of the resulting array
    if ($encodedParts)
    {
        foreach ($encodedParts as $key => $value)
        {
            $result[$key] = urldecode(str_replace($replacements, $entities, $value));
        }
    }

    return $result;
}
Although non-ASCII characters are not legal in URLs, this is very handy if you want to parse possibly wonky data, internationalized domain names (例子.测试), or other non-ASCII URLs (✪df.ws) that translate to ASCII via Punycode.
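To try the idea outside of Joomla, here is a minimal sketch that wraps the same encode, parse, decode steps in a standalone function. The name utf8_parse_url is my own, not part of Joomla or PHP, and the result array is only created once parsing succeeds, which is an assumption made to avoid building an array on top of false under newer PHP versions.

```php
<?php
// Hypothetical standalone helper (the name is an assumption, not a Joomla or PHP API).
// It follows the same approach as the Joomla method: percent-encode everything,
// restore the URL delimiters, parse, then decode each part.
function utf8_parse_url($url)
{
    $entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', '%26', '%3D', '%24', '%2C', '%2F', '%3F', '%23', '%5B', '%5D');
    $replacements = array('!', '*', "'", '(', ')', ';', ':', '@', '&', '=', '$', ',', '/', '?', '#', '[', ']');

    // Encode every byte, then turn the reserved delimiters back into literals
    // so parse_url() still sees the structure of the URL.
    $encodedURL = str_replace($entities, $replacements, urlencode($url));

    $encodedParts = parse_url($encodedURL);

    if (!$encodedParts) {
        return false;
    }

    // Reverse the two steps for each component to get the original text back.
    $result = array();
    foreach ($encodedParts as $key => $value) {
        $result[$key] = urldecode(str_replace($replacements, $entities, $value));
    }

    return $result;
}

$parts = utf8_parse_url('http://例子.测试/路径?q=值');
echo $parts['host'], "\n";  // 例子.测试
echo $parts['path'], "\n";  // /路径
echo $parts['query'], "\n"; // q=值

$parts = utf8_parse_url('http://✪df.ws/e7l');
echo $parts['host'], "\n";  // ✪df.ws
```

Note that every value comes back as a string, including the port, because each part passes through str_replace and urldecode.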
