Fixing the Chinese URL problem in CodeIgniter 1.7.2

Extend CodeIgniter's URI class by creating MY_URI.php under application/libraries/:

<?php

class MY_URI extends CI_URI {

	/**
	 * Filter segments for malicious characters
	 *
	 * @access	private
	 * @param	string
	 * @return	string
	 */
	function _filter_uri($str)
	{
		if ($str != '' && $this->config->item('permitted_uri_chars') != '' && $this->config->item('enable_query_strings') == FALSE)
		{
			mb_regex_encoding($this->config->item('permitted_uri_chars_encoding'));
			if ( ! mb_eregi("^[".$this->config->item('permitted_uri_chars')."]+$", $str))
			{
				show_error('The URI you submitted has disallowed characters.', 400);
			}
		}

		// Convert programmatic characters to entities
		$bad	= array('$', 		'(', 		')',	 	'%28', 		'%29');
		$good	= array('&#36;',	'&#40;',	'&#41;',	'&#40;',	'&#41;');

		return str_replace($bad, $good, $str);
	}

}

Change the settings in config.php:

$config['permitted_uri_chars'] = 'a-z 0-9~%.:_\-\x{2E80}-\x{9fa5}';
$config['permitted_uri_chars_encoding'] = 'utf8';

permitted_uri_chars_encoding is a setting I added; it specifies the character encoding of the URL.
My whole site uses utf8, so I set permitted_uri_chars_encoding to utf8 here.

How does it work? What's the principle behind it?
Don't ask me; go read the code yourself. I've been rather busy lately.
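
For readers who want to try the check outside of CodeIgniter, here is a minimal sketch of what the overridden _filter_uri does with mb_regex_encoding and mb_eregi. The helper name segment_is_permitted is hypothetical, and the sample segments are made up; only the whitelist string comes from the config above. It assumes the mbstring extension is built with regex support and that the file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of CodeIgniter): checks one URI segment
// against a character whitelist using mbstring's multibyte regex functions.
function segment_is_permitted($segment, $permittedChars, $encoding = 'UTF-8')
{
    // Tell the mb_ereg* functions how to decode the pattern and the subject.
    mb_regex_encoding($encoding);

    // Case-insensitive match: the whole segment must consist of whitelisted characters.
    return (bool) mb_eregi('^[' . $permittedChars . ']+$', $segment);
}

// Same whitelist as in config.php above.
$permitted = 'a-z 0-9~%.:_\-\x{2E80}-\x{9fa5}';

var_dump(segment_is_permitted('测试', $permitted));      // Chinese segment
var_dump(segment_is_permitted('news_2010', $permitted)); // plain ASCII segment
var_dump(segment_is_permitted('a<b', $permitted));       // contains a character outside the whitelist
```

Because the regex encoding is set before matching, the \x{2E80}-\x{9fa5} range is compared against whole characters rather than individual bytes, which is why a UTF-8 segment can pass the whitelist.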

— EOF —

Using ORIG_PATH_INFO when running CodeIgniter under Nginx

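My CodeIgniter uri_protocol was originally set to PATH_INFO; my nginx configuration is described here:
http://hily.me/blog/2010/02/nginx-path-info/
Today I found that when the URI contains Chinese characters, PATH_INFO gets truncated, as shown in this dump of $_SERVER: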

array(42) {
["HOSTNAME"]=>
string(0) ""
["PATH"]=>
string(28) "/usr/local/bin:/usr/bin:/bin"
["TMP"]=>
string(4) "/tmp"
["TMPDIR"]=>
string(4) "/tmp"
["TEMP"]=>
string(4) "/tmp"
["OSTYPE"]=>
string(0) ""
["MACHTYPE"]=>
string(0) ""
["MALLOC_CHECK_"]=>
string(1) "2"
["USER"]=>
string(3) "www"
["HOME"]=>
string(9) "/home/www"
["FCGI_ROLE"]=>
string(9) "RESPONDER"
["QUERY_STRING"]=>
string(0) ""
["REQUEST_METHOD"]=>
string(3) "GET"
["CONTENT_TYPE"]=>
string(0) ""
["CONTENT_LENGTH"]=>
string(0) ""
["SCRIPT_FILENAME"]=>
string(33) "/work/www/aimon/webapps/index.php"
["SCRIPT_NAME"]=>
string(12) "/%e6%b5%8b%e"
["PATH_INFO"]=>
string(7) "8%af%95"
["REQUEST_URI"]=>
string(19) "/%E6%B5%8B%E8%AF%95"
["DOCUMENT_URI"]=>
string(17) "/index.php/测试"
["DOCUMENT_ROOT"]=>
string(23) "/work/www/aimon/webapps"
["SERVER_PROTOCOL"]=>
string(8) "HTTP/1.1"
["GATEWAY_INTERFACE"]=>
string(7) "CGI/1.1"
["SERVER_SOFTWARE"]=>
string(12) "nginx/0.7.64"
["REMOTE_ADDR"]=>
string(14) "192.168.16.158"
["REMOTE_PORT"]=>
string(4) "3077"
["SERVER_ADDR"]=>
string(13) "192.168.16.67"
["SERVER_PORT"]=>
string(2) "80"
["SERVER_NAME"]=>
string(11) "aimon.local"
["REDIRECT_STATUS"]=>
string(3) "200"
["HTTP_ACCEPT"]=>
string(191) "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/x-silverlight, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, */*"
["HTTP_ACCEPT_LANGUAGE"]=>
string(5) "zh-cn"
["HTTP_ACCEPT_ENCODING"]=>
string(13) "gzip, deflate"
["HTTP_USER_AGENT"]=>
string(104) "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; TheWorld)"
["HTTP_HOST"]=>
string(11) "aimon.local"
["HTTP_CONNECTION"]=>
string(10) "Keep-Alive"
["ORIG_PATH_INFO"]=>
string(19) "/%e6%b5%8b%e8%af%95"
["ORIG_SCRIPT_NAME"]=>
string(19) "/%e6%b5%8b%e8%af%95"
["ORIG_SCRIPT_FILENAME"]=>
string(40) "/work/www/aimon/webapps/index.php/测试"
["PATH_TRANSLATED"]=>
string(30) "/work/www/aimon/webapps8%af%95"
["PHP_SELF"]=>
string(19) "/%e6%b5%8b%e8%af%95"
["REQUEST_TIME"]=>
int(1265556620)
}

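I went over the nginx configuration again and again and still could not find the cause; it is probably an nginx bug.
Then I noticed that the $_SERVER dump above contains an ORIG_PATH_INFO entry, which is exactly what I need.
Opening CI's config file again, I found that CI supports ORIG_PATH_INFO as well: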

/*
|--------------------------------------------------------------------------
| URI PROTOCOL
|--------------------------------------------------------------------------
|
| This item determines which server global should be used to retrieve the
| URI string. The default setting of "AUTO" works for most servers.
| If your links do not seem to work, try one of the other delicious flavors:
|
| 'AUTO' Default - auto detects
| 'PATH_INFO' Uses the PATH_INFO
| 'QUERY_STRING' Uses the QUERY_STRING
| 'REQUEST_URI' Uses the REQUEST_URI
| 'ORIG_PATH_INFO' Uses the ORIG_PATH_INFO
|
*/
$config['uri_protocol'] = "ORIG_PATH_INFO";

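Repeated testing showed that when the URI contains no Chinese characters, CI works normally and $_SERVER contains only PATH_INFO, with no ORIG_PATH_INFO.
When the URI does contain Chinese characters, CI misbehaves: $_SERVER contains both PATH_INFO and ORIG_PATH_INFO, their values differ, and ORIG_PATH_INFO holds the correct one.
My guess, therefore, is that ORIG_PATH_INFO exists only when PHP has parsed PATH_INFO incorrectly, in which case its value can be used instead.
Based on that, modify the _fetch_uri_string function in CI's URI.php: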

	
else
{
	$uri = strtoupper($this->config->item('uri_protocol'));

	if ($uri == 'REQUEST_URI')
	{
		$this->uri_string = $this->_parse_request_uri();
		return;
	}

	$this->uri_string = (isset($_SERVER[$uri])) ? $_SERVER[$uri] : @getenv($uri);
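	// added by hily: when uri_protocol is ORIG_PATH_INFO but that variable is
	// missing (URIs without Chinese characters), fall back to PATH_INFO
	if ( ! $this->uri_string && $uri == 'ORIG_PATH_INFO' && isset($_SERVER['PATH_INFO']))
	{
		$this->uri_string = $_SERVER['PATH_INFO'];
	}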
}
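
The same fallback can be tried outside CI. The sketch below is a standalone illustration, not CI code: choose_path_info is a hypothetical helper, the array argument stands in for $_SERVER, and the ASCII-only sample request is made up. The Chinese-URI values are copied from the dump above.

```php
<?php
// Hypothetical standalone helper; $server stands in for $_SERVER.
function choose_path_info(array $server)
{
    // Prefer ORIG_PATH_INFO when it is present and non-empty...
    if (!empty($server['ORIG_PATH_INFO'])) {
        return $server['ORIG_PATH_INFO'];
    }

    // ...otherwise fall back to PATH_INFO, or an empty string if neither is set.
    return isset($server['PATH_INFO']) ? $server['PATH_INFO'] : '';
}

// Values copied from the dump above (request containing Chinese characters).
$withChinese = array(
    'PATH_INFO'      => '8%af%95',
    'ORIG_PATH_INFO' => '/%e6%b5%8b%e8%af%95',
);

// Made-up ASCII-only request: only PATH_INFO is set.
$asciiOnly = array('PATH_INFO' => '/news/list');

echo choose_path_info($withChinese), "\n";
echo choose_path_info($asciiOnly), "\n";
```

This mirrors the observation from testing: ORIG_PATH_INFO wins whenever it exists, and PATH_INFO is used only when it is the sole value available.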

Update 2010-02-23: just use REQUEST_URI directly!
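
In the dump above, REQUEST_URI still holds the complete percent-encoded path ("/%E6%B5%8B%E8%AF%95"), which is why it is a safe source. As a rough sketch of pulling a routable path out of it (this is not CI's own _parse_request_uri; path_from_request_uri is a hypothetical helper, and the "/index.php" front-controller name and the query-string sample are assumptions):

```php
<?php
// Hypothetical helper: extracts the routable path from a REQUEST_URI value.
// The '/index.php' front-controller name is an assumption for illustration.
function path_from_request_uri($requestUri, $frontController = '/index.php')
{
    // Keep only the path component, dropping any query string.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);

    // Remove a leading "/index.php" if the URL includes it.
    if (strpos($path, $frontController) === 0) {
        $path = (string) substr($path, strlen($frontController));
    }

    // Normalise to exactly one leading slash and no trailing slash.
    return '/' . trim($path, '/');
}

echo path_from_request_uri('/%E6%B5%8B%E8%AF%95'), "\n";
echo path_from_request_uri('/index.php/%E6%B5%8B%E8%AF%95?page=2'), "\n";
```

The result is still percent-encoded, so it can be checked against the permitted characters before any decoding takes place.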

To support Chinese, the segments also need to be URL-decoded:

function _explode_segments()
{
	foreach(explode("/", preg_replace("|/*(.+?)/*$|", "\\1", $this->uri_string)) as $val)
	{
		// Filter segments for security
		$val = trim($this->_filter_uri($val));

		if ($val != '')
		{
			//$this->segments[] = $val;
			// modified by hily
			$this->segments[] = urldecode($val);
		}
	}
}
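
To see the decoding step on its own, here is a small standalone sketch. decoded_segments is a hypothetical helper, the sample URI is made up, and the security filter from the method above is deliberately left out so that only the splitting and decoding are shown.

```php
<?php
// Hypothetical standalone version of the decode step (no security filter here).
function decoded_segments($uriString)
{
    $segments = array();

    // Strip leading/trailing slashes, then split on the remaining ones.
    foreach (explode('/', trim($uriString, '/')) as $part) {
        // Skip empty pieces produced by doubled slashes.
        if ($part !== '') {
            // Turn %e6%b5%8b... back into the original UTF-8 bytes.
            $segments[] = urldecode($part);
        }
    }

    return $segments;
}

print_r(decoded_segments('/%e6%b5%8b%e8%af%95/page/2'));
```

In the modified _explode_segments above, the filter runs on the still-encoded value and decoding happens afterwards. Also keep in mind that urldecode turns "+" into a space; rawurldecode leaves "+" alone if that behaviour is not wanted.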

— EOF —