URL encoding of Latin characters in Java
I'm trying to read in an image URL. As mentioned in the Java documentation, I tried converting the URL to a URI:
String imageURL = "http://www.shefinds.com/files/Christian-Louboutin-Décolleté-100-pumps.jpg";
URL url = new URL(imageURL);
url = new URI(url.getProtocol(), url.getHost(), url.getFile(), null).toURL();
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();
I get a java.io.FileNotFoundException for the file
http://www.shefinds.com/files/Christian-Louboutin-Décolleté-100-pumps.jpg
What am I doing wrong and what is the right way to encode this URL?
Update:
I'm using ROME to read in RSS feeds. Following BalusC's suggestions, I printed out the raw input at different stages, and it looks like the ROME RSS parser is using ISO-8859-1 instead of UTF-8.
Comments (3)
Works fine here (it returns a 403, which is at least not a 404).
When I fix it so that it doesn't return a 403, the picture is correctly retrieved.
So your problem lies somewhere else. Converting is actually not needed. The initial URL is valid.
Maybe you're obtaining the actual URL from some binary source using the wrong character encoding? The transition of é to Ã© suggests that the original source was UTF-8 encoded and that the code incorrectly read it using ISO-8859-1 instead of UTF-8.

Update: or maybe you've actually hardcoded it in the Java source code and saved the source file itself using the wrong encoding. I've configured my editor (Eclipse) to save files using UTF-8, and -Dfile.encoding also defaults to UTF-8, which would explain why it works on my machine ;)

Update 2: as per the comments, in a nutshell, everything should work fine if the encoding used to save the source file matches the default -Dfile.encoding of the runtime platform (and the character encoding in question supports the é). To avoid those unforeseen clashes whenever you distribute the code, it's indeed better to replace hardcoded non-ASCII chars with Unicode escapes.
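To make the é to Ã© diagnosis concrete, here is a minimal sketch (not code from the original answer). It encodes a string as UTF-8 and then deliberately decodes the bytes as ISO-8859-1, which reproduces the garbling seen in the exception message. The class name MojibakeDemo and the helper misreadUtf8AsLatin1 are made up for illustration. The literal uses \u00e9 escapes, so the source file stays pure ASCII and compiles the same way regardless of the encoding javac assumes.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode the text as UTF-8 bytes, then (wrongly)
    // decode those same bytes as ISO-8859-1, one character per byte.
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // \u00e9 is the character 'é', written as a Unicode escape.
        String fileName = "D\u00e9collet\u00e9";
        String garbled = misreadUtf8AsLatin1(fileName);

        // Each 'é' is two bytes in UTF-8 (C3 A9); read as ISO-8859-1 they
        // become the two characters U+00C3 and U+00A9.
        System.out.println(garbled.equals("D\u00c3\u00a9collet\u00c3\u00a9")); // true
        System.out.println(fileName.length() + " -> " + garbled.length());     // 9 -> 11
    }
}
```

If the garbled form shows up in a log or exception, the bytes were most likely UTF-8 and were decoded with a single-byte charset somewhere along the way.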
I think the technical answer is "you can't." Non-ASCII characters can't be used in a URL according to the standard, and even some ASCII characters must be escaped with "%XX" syntax, where XX is the hexadecimal value of the byte being escaped.
If anything, you can escape 'é' with '%E9' but this relies on the server interpreting this as an encoding of the character according to ISO-8859-1. While this isn't technically allowed, I believe many servers will do it.
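As a rough illustration of the "%XX" escaping discussed here, the sketch below shows that the escape for 'é' depends on the charset: one byte (%E9) under ISO-8859-1, two bytes (%C3%A9) under UTF-8. It also shows URI.toASCIIString(), which percent-encodes non-ASCII characters as UTF-8. The class and method names are hypothetical. Note that URLEncoder implements form encoding (it turns spaces into '+'), so it is used here only to show the byte values, not as a general path encoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

public class PercentEncodingDemo {
    // Hypothetical helper: percent-encode text using the named charset.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Hypothetical helper: build a URI from its parts and return the
    // ASCII-only form, with non-ASCII characters escaped as UTF-8 bytes.
    static String toAsciiUrl(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URI components", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("\u00e9", "ISO-8859-1")); // %E9
        System.out.println(encode("\u00e9", "UTF-8"));      // %C3%A9
        // Path taken from the question, with 'é' written as \u00e9.
        System.out.println(toAsciiUrl("http", "www.shefinds.com",
                "/files/Christian-Louboutin-D\u00e9collet\u00e9-100-pumps.jpg"));
    }
}
```

Which form a given server accepts depends on how it decodes request paths, so neither escape is guaranteed to resolve the file.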
The encoding of your source file is to blame. Using your IDE, set it to UTF-8, and then repaste the URL.