Decoding a string in Java
How do I properly decode the following string in Java?
http%3A//www.google.ru/search%3Fhl%3Dru%26q%3Dla+mer+powder%26btnG%3D%u0420%A0%u0421%u045F%u0420%A0%u0421%u2022%u0420%A0%u0421%u2018%u0420%u040E%u0420%u0453%u0420%A0%u0421%u201D+%u0420%A0%u0420%u2020+Google%26lr%3D%26rlz%3D1I7SKPT_ru
When I use URLDecoder.decode() I get the following error:
java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "u0"
Thanks,
Dave
Comments (4)
According to Wikipedia, "there exists a non-standard encoding for Unicode characters: %uxxxx, where xxxx is a Unicode value". Continuing: "This behavior is not specified by any RFC and has been rejected by the W3C".
Your URL contains such tokens, and the Java URLDecoder implementation doesn't support those.
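To check whether a given string is affected, something like the following sketch may help. The class and method names (PercentUCheck, hasPercentU, standardDecoderAccepts) are made up for illustration, and the check relies on the common JDK behaviour of throwing IllegalArgumentException for malformed escapes, which the URLDecoder documentation leaves to the implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class PercentUCheck {
    // "%u" followed by exactly four hex digits, e.g. %u0420
    private static final Pattern PERCENT_U = Pattern.compile("%u[0-9A-Fa-f]{4}");

    // True if the string contains at least one non-standard %uXXXX token.
    static boolean hasPercentU(String s) {
        return PERCENT_U.matcher(s).find();
    }

    // True if URLDecoder decodes the string without complaining.
    static boolean standardDecoderAccepts(String s) {
        try {
            URLDecoder.decode(s, "UTF-8");
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown by the usual JDK implementation on escapes such as "%u0"
            return false;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "btnG%3D%u0420%A0";
        System.out.println("has %u tokens: " + hasPercentU(sample));
        System.out.println("URLDecoder accepts it: " + standardDecoderAccepts(sample));
    }
}
```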
The %uXXXX encoding is non-standard and was actually rejected by the W3C, so it is natural that URLDecoder does not understand it. You can write a small function that fixes it by replacing each occurrence of %uXXYY with %XX%YY in your encoded string. Then you can proceed and decode the fixed string normally.
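A minimal sketch of that replacement, with hypothetical names (PercentURewriter, rewrite, decodeAsUtf16). One caveat: the two bytes produced from %uXXYY form a UTF-16 code unit, so the rewritten string only decodes sensibly with a UTF-16BE charset, and single-byte escapes such as %3A in the same string would then not decode as intended. Treat this as a starting point rather than a complete fix.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the %uXXYY -> %XX%YY rewrite.
final class PercentURewriter {
    // Captures the high byte (XX) and low byte (YY) of a %uXXYY token
    private static final Pattern PERCENT_U =
            Pattern.compile("%u([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})");

    // Turns every %uXXYY into %XX%YY and leaves everything else untouched.
    static String rewrite(String encoded) {
        return PERCENT_U.matcher(encoded).replaceAll("%$1%$2");
    }

    // Assumption: the input consists of %uXXXX tokens only, so that the
    // rewritten bytes form valid big-endian UTF-16.
    static String decodeAsUtf16(String encoded) {
        try {
            return URLDecoder.decode(rewrite(encoded), "UTF-16BE");
        } catch (UnsupportedEncodingException e) {
            // UTF-16BE is a standard charset, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("btnG%3D%u0420%u0421"));
        System.out.println(decodeAsUtf16("%u0420%u0421"));
    }
}
```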
We started with Vartec's solution but found additional issues. This solution works for UTF-16, but it can be changed to return UTF-8. The replace-all step is left in for clarity, and you can read more at http://www.cogniteam.com/wiki/index.php?title=DecodeEncodeJavaScript
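The code from this comment is not reproduced here. As a rough illustration of decoding %uXXXX tokens directly instead of rewriting them, here is one possible sketch; it is not the commenter's implementation. The class name PercentUDecoder is made up. It assumes each %uXXXX is one UTF-16 code unit, treats each %XX as a single character with that code (as JavaScript's unescape() does, so UTF-8 byte sequences are not reassembled), and maps '+' to a space as URLDecoder does.

```java
// Hypothetical direct decoder for strings mixing %XX and %uXXXX escapes.
final class PercentUDecoder {
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 6 <= s.length()
                    && s.charAt(i + 1) == 'u' && isHex(s, i + 2, 4)) {
                // %uXXXX: one UTF-16 code unit
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else if (c == '%' && i + 3 <= s.length() && isHex(s, i + 1, 2)) {
                // %XX: one character with code XX (assumption, see above)
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                // Malformed escapes are copied through unchanged
                out.append(c == '+' ? ' ' : c);
                i++;
            }
        }
        return out.toString();
    }

    // True if the len characters starting at index from are all hex digits.
    private static boolean isHex(String s, int from, int len) {
        for (int k = from; k < from + len; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(decode("http%3A//www.google.ru/search%3Fhl%3Dru%26q%3Dla+mer+powder"));
    }
}
```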
After a good look at the solution presented by @ariy, I created a Java-based solution that is also resilient against encoded characters that have been chopped into two parts (i.e. half of the encoded character is missing). This happens in my use case, where I need to decode long URLs that are sometimes cut off at 2000 characters. See "What is the maximum length of a URL in different browsers?"
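The code from this comment is not included here either. One simple way to cope with a cut-off escape, sketched under the assumption that the truncation only ever hits the very end of the string, is to drop an incomplete trailing %XX or %uXXXX before decoding. The names TruncatedEscapeTrimmer and stripTruncatedEscape are hypothetical, and a UTF-8 sequence cut between two complete %XX bytes is not handled.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing step for URLs cut off mid-escape.
final class TruncatedEscapeTrimmer {
    // Matches an incomplete escape at the very end of the input:
    // "%u" plus 0-3 hex digits, or "%" plus 0-1 hex digits.
    private static final Pattern TRUNCATED =
            Pattern.compile("%(?:u[0-9A-Fa-f]{0,3}|[0-9A-Fa-f]?)\\z");

    // Removes a trailing incomplete escape, if there is one.
    static String stripTruncatedEscape(String s) {
        return TRUNCATED.matcher(s).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripTruncatedEscape("q%3Dla+mer%u04"));
        System.out.println(stripTruncatedEscape("q%3Dla+mer%u0420"));
    }
}
```

The trimmed string can then be passed to whichever decoder is in use.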