Is encodeURIComponent really useful?

Posted on 2024-08-20 14:03:04

Something I still don't understand when performing an HTTP GET request to the server is what the advantage is of using the JS function encodeURIComponent to encode each component of the request.

Doing some tests, I saw that the server (using PHP) gets the values of the HTTP GET request properly even if I don't use encodeURIComponent!
Obviously I still need to encode the special characters & ? = / : on the client side, otherwise an HTTP GET value like "peace&love=virtue" would be treated as a new key-value pair of the request instead of a single value.
But why does encodeURIComponent also encode many other characters, like 'è' for example, which is translated into %C3%A8 and must then be decoded on the PHP server using the utf8_decode function?

By using encodeURIComponent, all values of the HTTP GET request are UTF-8 encoded, so when reading them in PHP I have to call utf8_decode on each $_GET value every time, which is quite annoying.

Why can't we just encode only the & ? = / : characters?

See also: JS encodeURIComponent result different from the one created by FORM
It shows that encodeURIComponent does not even encode properly, because a simple browser FORM GET encodes characters like '€' in a different way. So I still wonder: what is encodeURIComponent for?

Comments (2)

你的往事 2024-08-27 14:03:04

That is because

A Uniform Resource Identifier (URI) is
defined in [RFC3986] as a sequence
of characters chosen from a limited
subset of the repertoire of
US-ASCII [ASCII] characters.

So officially Unicode is not supported; see the RFC for details. All modern browsers support it, though, which is why you get your results just fine. But for the odd case where some browser or system does not support it, you encode it to make sure it works across all standards-compliant browsers.
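
To make the ASCII restriction concrete, here is a minimal JavaScript sketch. The helper name isAsciiOnly is made up for this illustration; only encodeURIComponent is a built-in. It checks that the encoded form of 'è' contains nothing outside the US-ASCII range, even though the input does.

```javascript
// Hypothetical helper: true if every code point in the string is in the US-ASCII range.
function isAsciiOnly(text) {
  return [...text].every((ch) => ch.codePointAt(0) <= 0x7f);
}

const raw = 'è';
const encoded = encodeURIComponent(raw); // UTF-8 bytes C3 A8, percent-encoded

console.log(isAsciiOnly(raw));     // false
console.log(encoded);              // %C3%A8
console.log(isAsciiOnly(encoded)); // true
```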

治碍 2024-08-27 14:03:04

This is a character encoding issue (again). As Gaby stated, URIs are a sequence of ASCII characters (thus only bytes in the range 0–127). So any character that is not in ASCII needs to be encoded with percent-encoding.

And since UTF-8 is the new “universal character encoding”, nowadays user agents interpret the URI to be UTF-8 encoded. But these UTF-8 encoded words are themselves also encoded with the Percent-Encoding since URIs cannot contain any other characters except those in ASCII.

That means, when you enter http://en.wikipedia.org/wiki/€ into your browser’s address field, your browser looks up the UTF-8 code for € (0xE282AC) and applies the Percent-Encoding on it (%E2%82%AC). So http://en.wikipedia.org/wiki/€ will actually result in http://en.wikipedia.org/wiki/%E2%82%AC.

To show you that this is true, just enter http://en.wikipedia.org/wiki/%E2%82%AC into your address field and your browser will probably turn that into http://en.wikipedia.org/wiki/€. That is because nowadays user agents interpret the URI to be UTF-8 encoded.
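
The same round trip can be reproduced in JavaScript. This is only a sketch using the built-in TextEncoder, encodeURIComponent and decodeURIComponent; the variable names are chosen for illustration.

```javascript
const euro = '€';

// UTF-8 bytes of the character, as lowercase hex strings.
const utf8Hex = Array.from(new TextEncoder().encode(euro), (b) => b.toString(16));

const encodedEuro = encodeURIComponent(euro);        // percent-encode the UTF-8 bytes
const decodedEuro = decodeURIComponent(encodedEuro); // and read them back as UTF-8

console.log(utf8Hex.join(' '));    // e2 82 ac
console.log(encodedEuro);          // %E2%82%AC
console.log(decodedEuro === euro); // true
```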

Now back to your initial question, why you should apply the Percent-Encoding explicitly: Imagine you have a web page where you want to link to the Wikipedia article on the Euro sign. If you just write the URI with a plain €:

<a href="http://en.wikipedia.org/wiki/€">Euro sign</a>

Your browser will use the character encoding of the document for the € character. That means, if your document’s encoding is Windows-1252 (as in your other question), the € will be encoded as 0x80 and the URI would be http://en.wikipedia.org/wiki/%80 (this actually works because Wikipedia is clever enough to guess, as Windows-1252 is the most popular character encoding with a printable character at 0x80).

But if your document’s encoding is ISO 8859-15, the € will be encoded as 0xA4, which represents the currency sign ¤ in ISO 8859-1 (Wikipedia will choose ISO 8859-1 because 0xA4 is an invalid byte sequence in UTF-8 and HTTP specifies ISO 8859-1 as the default character encoding).
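
The ambiguity can be sketched in JavaScript. The byte values below are hard-coded from the ones quoted above rather than computed, and the names euroBytesByEncoding, percentEncodeBytes and decodeAsUtf8 are made up for this illustration. Only the UTF-8 form can be read back by decodeURIComponent; the single-byte forms are not valid UTF-8, so a receiver has to guess what they mean.

```javascript
// Bytes for '€' in each encoding, hard-coded from the values quoted in the text.
const euroBytesByEncoding = {
  'UTF-8': [0xe2, 0x82, 0xac],
  'Windows-1252': [0x80],
  'ISO 8859-15': [0xa4],
};

// Percent-encode a list of byte values.
function percentEncodeBytes(bytes) {
  return bytes
    .map((b) => '%' + b.toString(16).toUpperCase().padStart(2, '0'))
    .join('');
}

// Try to read a percent-encoded string back as UTF-8; null if the bytes are not valid UTF-8.
function decodeAsUtf8(percentEncoded) {
  try {
    return decodeURIComponent(percentEncoded);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err;
  }
}

for (const [name, bytes] of Object.entries(euroBytesByEncoding)) {
  const percentEncoded = percentEncodeBytes(bytes);
  console.log(name, percentEncoded, decodeAsUtf8(percentEncoded));
}
```

Run as written, only the UTF-8 row decodes back to '€'; the other two rows yield null.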

So I recommend always using percent-encoding to avoid mistakes. Don’t let the user agents guess what you mean.
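
As a small illustration of that advice, using the "peace&love=virtue" value from the question: the sketch below uses URLSearchParams as a stand-in for whatever parses the query string on the server (the PHP side is not shown), and the parameter name motto is an assumption made up for the example.

```javascript
const value = 'peace&love=virtue';

// Unencoded: the '&' and '=' inside the value are read as delimiters.
const naive = new URLSearchParams('motto=' + value);

// Encoded: the value survives as a single parameter.
const safe = new URLSearchParams('motto=' + encodeURIComponent(value));

console.log([...naive.keys()]);  // [ 'motto', 'love' ]
console.log(naive.get('motto')); // peace
console.log(safe.get('motto'));  // peace&love=virtue
```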
