Handily the good folks at Joomla have written a UTF-8 safe version:
/** * Does a UTF-8 safe version of PHP parse_url function * * @param string $url URL to parse * * @return mixed Associative array or false if badly formed URL. * * @see http://us3.php.net/manual/en/function.parse-url.php * @since 11.1 */ public static function parse_url($url) { $result = false; // Build arrays of values we need to decode before parsing $entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', '%26', '%3D', '%24', '%2C', '%2F', '%3F', '%23', '%5B', '%5D'); $replacements = array('!', '*', "'", "(", ")", ";", ":", "@", "&", "=", "$", ",", "/", "?", "#", "[", "]"); // Create encoded URL with special URL characters decoded so it can be parsed // All other characters will be encoded $encodedURL = str_replace($entities, $replacements, urlencode($url)); // Parse the encoded URL $encodedParts = parse_url($encodedURL); // Now, decode each value of the resulting array if ($encodedParts) { foreach ($encodedParts as $key => $value) { $result[$key] = urldecode(str_replace($replacements, $entities, $value)); } } return $result; }
Although non-ASCII characters are not legal in URLs if you want to parse possibly wonky data or internationalized (例子.测试) and other non-ASCII URLs (✪df.ws) that translate to ASCII via Punycode then this is very handy.
One Reply to “parse_url Is Not UTF-8 Safe”