PHP: UTF-encoded URL strings

Posted 2024-09-11 17:04:38 | 1636 characters | 13 views | 0 comments

When I type a URL such as http://www.example.com/?query=Траливали into the Firefox address bar, it is automatically encoded to http://www.example.com/?query=%D2%F0%E0%EB%E8%E2%E0%EB%E8.

But a URL such as http://www.example.com/#ajax_call?query=Траливали is not converted.

Other browsers, such as IE8, do not convert the query at all.

The question is: how can I detect (in PHP) whether the query is encoded, and how do I decode it?

I've tried:

  1. $str = iconv('cp1251', 'utf-8', urldecode($str) );

  2. $str = utf8_decode(urldecode($str));

  3. $str = (urldecode($str));

  4. many functions from http://php.net/manual/en/function.urldecode.php
    Nothing works.

Test:

$str = $_GET['str'];

d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == urldecode('%D2%F0%E0%EB%E8%E2%E0%EB%E8'));

d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == $str);

d('Траливали' == $str);

d(urldecode($str));

d(utf8_decode(urldecode($str)));

!!! d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == urlencode($str)); !!!

Returns:

[false]
[false]
[false]
���������
????
[true]

A workaround of sorts: send the query as part of the URL path (http://www.example.com/Траливали/) and parse it with mod_rewrite.
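One way to approach the detection part of the question is to check whether the already-decoded value is valid UTF-8 and, if it is not, convert it from Windows-1251. The sketch below is only an illustration: the helper name to_utf8() is hypothetical, and the fallback to CP1251 is an assumption based on the bytes shown in the question (a short CP1251 string can occasionally also be valid UTF-8, so this is a heuristic).

```php
<?php
// Hypothetical helper: normalise a decoded query value to UTF-8.
// Assumption: any input that is not valid UTF-8 is Windows-1251.
function to_utf8(string $value): string
{
    // With the /u modifier, preg_match() returns false for invalid UTF-8.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    $converted = iconv('CP1251', 'UTF-8', $value);
    // If the conversion fails, hand back the input unchanged.
    return $converted === false ? $value : $converted;
}

// $_GET values are already percent-decoded by PHP; urldecode() here
// only stands in for that step.
$fromCp1251 = urldecode('%D2%F0%E0%EB%E8%E2%E0%EB%E8');
$fromUtf8   = urldecode('%D0%A2%D1%80%D0%B0%D0%BB%D0%B8%D0%B2%D0%B0%D0%BB%D0%B8');

// Both spellings of the word end up as the same UTF-8 string.
var_dump(to_utf8($fromCp1251) === to_utf8($fromUtf8)); // bool(true)
```

In a real script the helper would be applied to $_GET['query'] directly, without the extra urldecode() call.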

Comments (7)

哑剧 2024-09-18 17:04:38

It is not converted because placing the query part of the URL after the fragment is not valid.

RFC 3986 defines a URI as composed of the following parts:

     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment

The order cannot be changed. Therefore,

URL1: http://www.example.com/?query=Траливали#ajax_call

will be handled properly while

URL2: http://www.example.com/#ajax_call?query=Траливали

will not. If we look at URL2, IE actually handles the URL properly: it treats #ajax_call?query=Траливали as the fragment, with no query at all. The fragment always comes last and is never sent to the server.

IE will properly encode the query component of URL1, because there it does detect a query.

As for decoding in PHP, %D2 and similar sequences are decoded automatically in the $_GET['query'] variable. The $_GET variable was not populated because, according to the standard, URL2 contains no query.

Also, one last thing: the comparison 'Траливали' == $_GET['query'] will only be true if your PHP script itself is saved in the same encoding as the incoming value (UTF-8 for a UTF-8 request). Your text editor should be able to tell you the encoding of your file.
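The difference between URL1 and URL2 can be seen with PHP's own parse_url(). This is just a small sketch; the ASCII value "test" is a placeholder standing in for the Cyrillic word so that the example does not depend on any encoding.

```php
<?php
// Placeholder value "test" replaces the Cyrillic word from the question.
$url1 = 'http://www.example.com/?query=test#ajax_call';
$url2 = 'http://www.example.com/#ajax_call?query=test';

$parts1 = parse_url($url1);
$parts2 = parse_url($url2);

// URL1: query and fragment are separate components.
var_dump($parts1['query']);         // string(10) "query=test"
var_dump($parts1['fragment']);      // string(9) "ajax_call"

// URL2: everything after "#" is the fragment, so there is no query.
var_dump(isset($parts2['query']));  // bool(false)
var_dump($parts2['fragment']);      // string(20) "ajax_call?query=test"
```

Since the browser never sends the fragment, a server-side script requested via URL2 sees an empty $_GET.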

香草可樂 2024-09-18 17:04:38
rawurldecode($_GET['query']);

but PHP should actually have done this already ;-)

Edit: you say "nothing works". What exactly are you trying? If the text doesn't appear on screen the way you want when you, for example, echo $_GET['query'];, the problem might be the encoding you specify for the page sent back to the browser.

Include a line

header("Content-Type: text/html; charset=utf-8");

and see if it helps.
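To illustrate that the decoding has already happened by the time the script runs, here is a minimal sketch using parse_str(), which applies the same percent-decoding PHP uses to fill $_GET. The query string is an assumed example: the UTF-8 form of the word from the question.

```php
<?php
// parse_str() decodes a query string the same way PHP populates $_GET.
// Assumed input: the UTF-8 percent-encoding of the word from the question.
parse_str('query=%D0%A2%D1%80%D0%B0%D0%BB%D0%B8%D0%B2%D0%B0%D0%BB%D0%B8', $params);

$value = $params['query'];

// No percent signs are left, so an extra rawurldecode() changes nothing here.
var_dump(strpos($value, '%') === false);    // bool(true)
var_dump(rawurldecode($value) === $value);  // bool(true)

// Escape for HTML output; pair this with the Content-Type header above
// so the browser interprets the bytes as UTF-8.
echo htmlspecialchars($value, ENT_QUOTES, 'UTF-8'), "\n";
```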

酷炫老祖宗 2024-09-18 17:04:38

How the fragment is encoded is, unfortunately, browser-dependent:

Is fragment ID (hash) encoded by applying RFC-mandated URL escaping rules?
MSIE: NO
Firefox: PARTLY
Safari: YES
Opera: NO
Chrome: NO
Android: YES

As to the question of what encoding the browser uses to encode international (read: non-ASCII) characters before converting them to %nn escape sequences, "most browsers deal with this by sending UTF-8 data by default on any text entered in the URL bar by hand, and using page encoding on all followed links." (same source).
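The effect of the source encoding on the %nn sequences can be reproduced directly. The sketch below assumes PHP 7 or later (for the "\u{...}" escapes) and an iconv build that knows CP1251; it percent-encodes the same word once as UTF-8 and once as Windows-1251.

```php
<?php
// The word from the question, written with Unicode escapes so the
// example does not depend on the encoding of this source file.
$utf8   = "\u{0422}\u{0440}\u{0430}\u{043B}\u{0438}\u{0432}\u{0430}\u{043B}\u{0438}";
$cp1251 = iconv('UTF-8', 'CP1251', $utf8);

// Same characters, different bytes, different escape sequences.
echo rawurlencode($utf8), "\n";   // %D0%A2%D1%80%D0%B0%D0%BB%D0%B8%D0%B2%D0%B0%D0%BB%D0%B8
echo rawurlencode($cp1251), "\n"; // %D2%F0%E0%EB%E8%E2%E0%EB%E8
```

The second line matches the URL shown in the question, which suggests the browser encoded the query as Windows-1251 in that case.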

酒几许 2024-09-18 17:04:38

You could use UTF8::autoconvert_request() for this.

Take a look at http://code.google.com/p/php5-utf8/ for more information.

漆黑的白昼 2024-09-18 17:04:38

URLs are limited to certain ASCII characters. Characters that are not URL-safe are supposed to be URL-encoded (the %hh encoding you see). Some browsers may automatically encode URLs typed into the address bar.

爱殇璃 2024-09-18 17:04:38

The answer is easy: the string is always encoded, as stated in the HTTP standard.
What Firefox displays doesn't matter.

Also, since PHP decodes the query string automatically, no decoding is required either.

Note that '%D2%F0%E0%EB%E8%E2%E0%EB%E8' is a single-byte encoding, so your page is probably in Windows-1251. At least, that is what the HTTP header tells the browser.
AJAX, on the other hand, always uses UTF-8.

So you just have to either use a single encoding (UTF-8) for your pages or distinguish AJAX calls from regular ones.

As for the fragment: do not rely on the fragment value to send data to the server. Keep the value in a JS variable and use it twice: once to set the fragment and once to send it to the server using JSON.

月亮邮递员 2024-09-18 17:04:38

RFC 1738 states that only alphanumerics, the special characters "$-_.+!*'()," and the reserved characters ";/?:@=&" are left unencoded within a URL. Everything else is encoded by the HTTP client, i.e. the web browser. You can call rawurldecode() whether or not PHP has already decoded the query string. Double-decoding is usually harmless, but not always: if the decoded value itself contains a literal % followed by two hex digits, a second pass will change it.
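Whether a second decode is harmless depends on the data. A small check, using a hypothetical value that contains a literal percent sequence, shows the case where it is not:

```php
<?php
// Hypothetical value: the literal text "100%25 sure".
$original = '100%25 sure';
$encoded  = rawurlencode($original);  // 100%2525%20sure

$once  = rawurldecode($encoded);      // 100%25 sure (correct)
$twice = rawurldecode($once);         // 100% sure   (altered)

var_dump($once === $original);   // bool(true)
var_dump($twice === $original);  // bool(false)
```

For values without a literal % (such as the Cyrillic word in the question) the second decode is a no-op.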
