Character/URI encoding out of sync in JavaScript?
I have a question about encoding special/extended UTF-8 characters in URLs in JavaScript. The same question applies to many characters, such as the registered sign (®), but my example uses an umlaut:
ü = %C3%BC in UTF-8 (four rows from bottom of http://www.utf8-chartable.de/)
If the URL contains an umlaut already represented as UTF-8 percent-escapes (ü = %C3%BC) and I run it through encodeURIComponent, the % signs are encoded, the string now looks like "%25C3%25BC", and it gets processed correctly by my system. This is good.
url = "http://foo.com/bar.html?%C3%BC"
url = encodeURIComponent(url);
// url is now represented as "http%3A%2F%2Ffoo.com%2Fbar.html%3F%25C3%25BC"
However, the bad part: if the string contains an unencoded character (the actual umlaut) before encoding, the result looks like "%C3%BC" and fails because, I believe, the % signs should be encoded too:
url = "http://foo.com/bar.html?ü"
url = encodeURIComponent(url);
// url is now represented as "http%3A%2F%2Ffoo.com%2Fbar.html%3F%C3%BC"
I think it fails because it is less thoroughly encoded than the rest of the URL.
So, beyond general advice or answers to questions I don't know to ask, what I think I want to know is how to get the raw umlaut (and all other special characters) to fully encode. Is that what is incorrect?
Thanks for your help!
Nate
Comments (1)
You cannot encode a URL all at once. If you have already concatenated the host, path, parameters, etc., together then it's impossible to correctly determine which characters actually need to be encoded and which characters are separators that need to be left alone.
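To illustrate that ambiguity, here is a small sketch. The parameter name q and the value "a&b=c" are made up for the example and are not from the question. Once the value has been joined into the URL, neither built-in encoder can tell its "&" and "=" apart from real separators:

```javascript
// A value that itself contains separator characters (hypothetical data).
const value = "a&b=c";
const joined = "http://foo.com/bar.html?q=" + value;

// encodeURI leaves "&" and "=" alone, so the value now reads as two parameters.
const keepsSeparators = encodeURI(joined);
// keepsSeparators === "http://foo.com/bar.html?q=a&b=c"

// encodeURIComponent escapes everything, including the real separators.
const escapesAll = encodeURIComponent(joined);
// escapesAll === "http%3A%2F%2Ffoo.com%2Fbar.html%3Fq%3Da%26b%3Dc"
```

Neither result is the intended URL, which is why the encoding has to happen before the parts are joined.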
The only reliable way to build a URL is by concatenating already-encoded values:
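The code that originally followed this sentence is not present here, so what follows is only a minimal sketch of the idea, not the answerer's own code. The helper buildUrl and the parameter names name and ref are hypothetical. Each key and value is passed through encodeURIComponent on its own, and the separators are added afterwards:

```javascript
// Hypothetical helper: encodes each key and value separately, then joins
// them with the literal separators "?", "=" and "&".
function buildUrl(base, params) {
  const query = Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
  return query ? base + "?" + query : base;
}

const url = buildUrl("http://foo.com/bar.html", { name: "ü", ref: "a&b=c" });
// url === "http://foo.com/bar.html?name=%C3%BC&ref=a%26b%3Dc"
```

If the finished URL then has to be passed as a value inside another URL, it would be run through encodeURIComponent once more as a whole. At that point the %C3%BC becomes %25C3%25BC, which may be the form the system in the question expects.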