URLEncoder and URLDecoder encode/decode Javadoc note: what if UTF-8 is not used?
The Javadoc for URLEncoder's encode and URLDecoder's decode both include this note:
"Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities."
However, if someone sends a request that uses a different encoding, wouldn't it be a bad idea to encode or decode with UTF-8? Is there anything wrong with checking a header (if it exists) and using whatever encoding it specifies? Some more background on this note would help it make sense to me, if anyone can provide it.
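To make the question concrete, here is a minimal sketch of the "check a header" idea. The class name and the `charsetFromContentType` helper are hypothetical, written only for illustration, and the sketch assumes Java 10 or later for the `Charset` overloads of `URLEncoder.encode` and `URLDecoder.decode`. Keep in mind that a `Content-Type` header describes the request body (for example, a posted form), not the URL itself, so this approach may not cover query strings in GET requests.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HeaderCharsetDemo {

    // Hypothetical helper: read the charset parameter from a Content-Type
    // header value, falling back to UTF-8 when it is missing or unknown.
    public static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e

        // The same text produces different octets under different charsets.
        System.out.println(URLEncoder.encode(text, StandardCharsets.UTF_8));      // caf%C3%A9
        System.out.println(URLEncoder.encode(text, StandardCharsets.ISO_8859_1)); // caf%E9

        // Decode using whatever charset the (assumed) header declares.
        Charset cs = charsetFromContentType(
                "application/x-www-form-urlencoded; charset=ISO-8859-1");
        String decoded = URLDecoder.decode("caf%E9", cs);
        System.out.println(decoded.equals(text)); // true
    }
}
```

The two `encode` calls show why the note warns about incompatibilities: the receiver cannot tell from `%E9` or `%C3%A9` alone which charset produced it.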
Comments (3)
In the same documentation
You can change the encoding, but because that is not W3C compliant, it would be a bad idea.
Tomcat and some other web servers have a separate setting that controls the decoder used for the URL in a GET request. Specifically, Tomcat will use the server's default character encoding unless one is specified with the URIEncoding attribute of the "Connector". I found the discussion in this post helpful when I was dealing with similar problems.
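The effect of a mismatched server-side decoder can be simulated in plain Java. This is only a sketch: it uses `URLDecoder` to stand in for the server and is not Tomcat's actual decoding code. The class and method names are made up for the example, and Java 10 or later is assumed.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {

    // Simulates a client that percent-encodes with UTF-8 and a server
    // that decodes the same octets as ISO-8859-1.
    public static String decodeWithWrongCharset(String text) {
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_8);
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    // A common workaround when the server setting cannot be changed:
    // turn the garbled string back into its original bytes, then
    // reinterpret those bytes as UTF-8.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with an acute accent
        String garbled = decodeWithWrongCharset(original);

        // One character became two, because UTF-8 used two octets for it.
        System.out.println(garbled.length());                  // 2
        System.out.println(repair(garbled).equals(original));  // true
    }
}
```

The repair works here because ISO-8859-1 maps every byte to exactly one character, so no information is lost. Setting the decoder correctly on the server is still the cleaner fix.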
Websites in some countries do use other encodings, because UTF-8 would be inefficient for their languages.
URLs are generally opaque. A URL is a sequence of ASCII characters that was generated by a website and is consumed by the same website. As long as the website itself can parse it, that is good enough.
On the other hand, people do want to look into URLs and understand their finer details. A browser displaying a URL full of %-encoded octets may want to convert them back to characters. Unfortunately, it has to guess the character encoding. In theory the encoding can be anything, even a proprietary one.
Also, a third party may want to generate a URL to a website that it doesn't control. How many programs dynamically generate Google search URLs? Again, the encoding supported by the website has to be guessed.
So if you are a website owner and you want to be nice, it's better to support UTF-8 encoded URLs. Of course, you don't have to. They are your URLs, so it's up to you.
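As a rough illustration of the third-party case, the sketch below builds a search URL and simply assumes the target site expects UTF-8. The host `www.example.com`, the `q` parameter, and the class name are placeholders, not a real site's API. Java 10 or later is assumed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchUrlDemo {

    // Builds a query URL for a hypothetical search endpoint, guessing
    // that the site decodes query strings as UTF-8.
    public static String searchUrl(String query) {
        return "https://www.example.com/search?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // URLEncoder produces form encoding, so spaces become '+'.
        System.out.println(searchUrl("caf\u00e9 au lait"));
        // https://www.example.com/search?q=caf%C3%A9+au+lait
    }
}
```

If the site actually expected another charset, the generated link would still be a valid URL, but the site would read the query as different characters.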