PHP UTF-encoded URL string
When I type a URL like http://www.example.com/?query=Траливали into the Firefox address bar, it is automatically encoded to http://www.example.com/?query=%D2%F0%E0%EB%E8%E2%E0%EB%E8.
But a URL like http://www.example.com/#ajax_call?query=Траливали is not converted.
Other browsers, such as IE8, do not convert the query at all.
The question is: how can I detect (in PHP) whether the query is encoded, and how do I decode it?
I've tried:
$str = iconv('cp1251', 'utf-8', urldecode($str) );
$str = utf8_decode(urldecode($str));
$str = (urldecode($str));
and many of the functions posted at http://php.net/manual/en/function.urldecode.php
Nothing works.
Test:
$str = $_GET['str'];
d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == urldecode('%D2%F0%E0%EB%E8%E2%E0%EB%E8'));
d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == $str);
d('Траливали' == $str);
d(urldecode($str));
d(utf8_decode(urldecode($str)));
!!! d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == urlencode($str)); !!!
Returns:
[false]
[false]
[false]
���������
????
[true]
One possible workaround: http://www.example.com/Траливали/ - send the query as part of the URL path and parse it with mod_rewrite.
Comments (7)
It is not converted because placing the query part of a URL after the fragment is not valid.
RFC 3986 defines a URI as composed of the following parts, in this order: scheme, authority, path, query, fragment.
The order cannot be changed. Therefore, a URL with the query before the fragment (URL1) will be handled properly, while one with the query after the fragment (URL2, i.e. http://www.example.com/#ajax_call?query=Траливали) will not. If we look at URL2, IE actually handles the URL properly by detecting the fragment as #ajax_call?query=Траливали with no query. The fragment always comes last and is never sent to the server.
IE will properly encode the query component of URL1, because it detects it as a query.
As for decoding in PHP, %D2 and similar sequences are decoded automatically in the $_GET['query'] variable. The $_GET variable was not populated because, according to the standard, URL2 contains no query.
Also, one last thing: the comparison 'Траливали' == $_GET['query'] will only be true if your PHP script file is saved in the same encoding as the bytes the browser sent, ideally UTF-8. Your text editor should be able to tell you the encoding of your file.
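The component order described above can be checked with PHP's parse_url(). This is only a small sketch: both URLs are hypothetical, and the ASCII value "test" stands in for the Cyrillic text so that the result does not depend on the file encoding.

```php
<?php
// Hypothetical URLs for illustration only.
$queryFirst    = 'http://www.example.com/?query=test#ajax_call';
$fragmentFirst = 'http://www.example.com/#ajax_call?query=test';

// Query precedes the fragment: both components are recognised.
$a = parse_url($queryFirst);

// "?" appears after "#": everything after "#" belongs to the fragment,
// so the result has no 'query' key at all.
$b = parse_url($fragmentFirst);

var_dump($a, $b);
```

A browser applies the same split before it sends the request, which is why the server never sees anything after the "#".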
PHP should actually have done this decoding already ;-)
Edit: you say "nothing works". What are you trying? If the text doesn't appear on screen as you want it when you, for example,
echo $_GET['query'];
then your problem might be the encoding you are specifying for the page sent back to the browser. Include a line declaring the response charset and see if it helps.
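The exact line this answer suggested is not preserved on the page. One plausible reading, offered as an assumption rather than as the answerer's actual code, is a Content-Type header that declares UTF-8. The helper name render_query() is hypothetical.

```php
<?php
// Hypothetical helper: escape a request value for output in a UTF-8 HTML page.
function render_query(array $get): string
{
    $value = (isset($get['query']) && is_string($get['query'])) ? $get['query'] : '';
    // ENT_SUBSTITUTE replaces invalid UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Declare the response encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo render_query($_GET);
```

If the incoming bytes are Windows-1251 rather than UTF-8, the header alone will not fix the display; the value would have to be converted first.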
Unfortunately, how the fragment is encoded is browser-dependent.
As to the question of what encoding the browser uses to encode international (read: non-ASCII) characters before converting them to %nn escape sequences, "most browsers deal with this by sending UTF-8 data by default on any text entered in the URL bar by hand, and using page encoding on all followed links." (same source).
You could use UTF8::autoconvert_request() for this. Take a look at http://code.google.com/p/php5-utf8/ for more information.
URLs are limited to certain ASCII characters. Characters that are not URL-safe are supposed to be URL-encoded (the %hh encoding you see). Some browsers automatically encode URLs typed into the address bar.
The answer is easy: the string is always encoded, as stated in the HTTP standard.
What Firefox displays doesn't matter.
Also, since PHP decodes the query string automatically, no decoding is required either.
Note that '%D2%F0%E0%EB%E8%E2%E0%EB%E8' is a single-byte encoding, so your page is probably in Windows-1251. At least, that is what the HTTP header tells the browser.
AJAX, meanwhile, always uses UTF-8.
So you just have to either use a single encoding (UTF-8) for your pages or distinguish AJAX calls from regular ones.
As for the fragment: do not use the fragment value to send data to the server. Keep the value in a JS variable and use it twice: once to set the fragment and once to send it to the server using JSON.
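If a single encoding cannot be enforced, one way to tell the two kinds of request apart is to test whether the decoded bytes are valid UTF-8 and otherwise treat them as Windows-1251. The following is a heuristic sketch, not a definitive detector: the function name to_utf8() and the Windows-1251 fallback are assumptions based on the bytes shown in the question, and some short Windows-1251 strings can also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper: return $raw as UTF-8, assuming any input that is
// not valid UTF-8 is encoded in Windows-1251.
function to_utf8(string $raw, string $fallback = 'CP1251'): string
{
    // With the /u flag, preg_match() returns 1 only for a valid UTF-8 subject.
    if (preg_match('//u', $raw) === 1) {
        return $raw;
    }
    $converted = iconv($fallback, 'UTF-8', $raw);
    return $converted === false ? $raw : $converted;
}

// PHP has already percent-decoded $_GET, so the check runs on raw bytes.
// urldecode() is used here only to build those bytes without depending
// on the encoding of this source file.
$legacy = urldecode('%D2%F0%E0%EB%E8%E2%E0%EB%E8'); // bytes from the question
$utf8   = urldecode('%D0%A2%D1%80%D0%B0%D0%BB%D0%B8%D0%B2%D0%B0%D0%BB%D0%B8'); // same word in UTF-8

var_dump(to_utf8($legacy) === $utf8);
var_dump(to_utf8($utf8) === $utf8);
```

In a real script the call would be something like to_utf8($_GET['query']), applied once at the point where the value enters the application.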
RFC 1738 states that only alphanumerics, the special characters
$-_.+!*'(),
and the reserved characters
;/?:@=&
(when used for their reserved purposes) may appear unencoded within a URL. Everything else is encoded by the HTTP client, i.e. the web browser. You can use rawurldecode() whether or not PHP automatically decodes the query string. Double-decoding is harmless for a value like the one in the question; it only changes data that contains a literal % followed by two hex digits.
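A quick way to see what a second decoding pass does is to round-trip a few values through rawurlencode() and rawurldecode(). The sample strings below are made up for illustration.

```php
<?php
// Round trip: every byte outside the unreserved set is percent-encoded.
$word    = rawurldecode('%D0%A2%D1%80%D0%B0'); // UTF-8 bytes for the first three letters of the word
$encoded = rawurlencode($word);                // back to the percent-encoded form
$once    = rawurldecode($encoded);             // identical to $word

// A value containing a literal percent sign is altered by a second pass.
$literal = '%41';                  // the three characters %, 4, 1
$wire    = rawurlencode($literal); // the "%" itself becomes "%25"
$first   = rawurldecode($wire);    // the original three characters again
$second  = rawurldecode($first);   // decoded once too often

// urldecode() additionally turns "+" into a space; rawurldecode() does not.
$plusRaw = rawurldecode('a+b');
$plusUrl = urldecode('a+b');

var_dump($once === $word, $first === $literal, $second, $plusRaw, $plusUrl);
```

For plain Cyrillic text the second pass is a no-op, which is the case the answer has in mind; the difference only shows up when the data itself contains "%" or "+".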