UTF-8 to ISO-8859-1 mapping / lossless conversion library in Java
I need to convert text from UTF-8 to ISO-8859-1 in Java without losing the characters that have no ISO-8859-1 encoding, for example all of the UTF-8 specific punctuation.
Ideally I would like these to be converted to their closest ISO-8859-1 equivalents (e.g. there are probably five different single quotes in UTF-8, and I would like them all converted to the ISO-8859-1 single quote character).
String.getBytes("ISO-8859-1") just won't do the trick in this case, because it replaces every character it cannot encode with '?'.
Do you know of any ready-made mappings or libraries in Java that would map UTF-8 specific characters to ISO-8859-1?
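To make the problem concrete, here is a minimal sketch of what the plain JDK conversion does to typographic quotes. The class and method names (`Latin1LossDemo`, `roundTrip`, `unmappable`) are made up for this illustration; only the `java.nio.charset` calls are standard.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
class Latin1LossDemo {
    // Encodes to ISO-8859-1 and decodes again, so the damage is visible as text.
    // getBytes substitutes '?' for every character the charset cannot represent.
    static String roundTrip(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    // Collects the UTF-16 chars of the input that ISO-8859-1 cannot encode.
    static String unmappable(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder lost = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c)) {
                lost.append(c);
            }
        }
        return lost.toString();
    }

    public static void main(String[] args) {
        // Left quote, "cafe" with e-acute, right quote.
        String text = "\u2018caf\u00e9\u2019";
        System.out.println(roundTrip(text));           // quotes become '?', e-acute survives
        System.out.println(unmappable(text).length()); // the two curly quotes
    }
}
```

The e-acute survives because it exists in ISO-8859-1; only the two curly quotes are replaced, and those are the characters a mapping would have to handle.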
Comments (3)
IBM's ICU project might be what you're looking for. It has support for fallback conversions.
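This is not ICU code. As a rough illustration of what a fallback conversion means, here is a JDK-only sketch that swaps in a hand-written table plus `java.text.Normalizer`. The class name `Latin1Fallback`, the `toLatin1` method and the contents of the table are my own assumptions, and the table is far from complete; ICU ships its own, much larger mapping data.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; a stand-in for the idea of fallback mappings, not ICU's API.
class Latin1Fallback {
    // Hand-picked, incomplete fallback table (an assumption for illustration).
    private static final Map<Character, String> FALLBACKS = new HashMap<>();
    static {
        FALLBACKS.put('\u2018', "'");   // left single quotation mark
        FALLBACKS.put('\u2019', "'");   // right single quotation mark
        FALLBACKS.put('\u201A', "'");   // single low-9 quotation mark
        FALLBACKS.put('\u201B', "'");   // single high-reversed-9 quotation mark
        FALLBACKS.put('\u2032', "'");   // prime
        FALLBACKS.put('\u201C', "\"");  // left double quotation mark
        FALLBACKS.put('\u201D', "\"");  // right double quotation mark
        FALLBACKS.put('\u2013', "-");   // en dash
        FALLBACKS.put('\u2014', "-");   // em dash
        FALLBACKS.put('\u2026', "..."); // horizontal ellipsis
    }

    // Returns a string that contains only characters encodable in ISO-8859-1.
    // Works per UTF-16 char, so a supplementary character ends up as "??".
    static String toLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (encoder.canEncode(c)) {
                out.append(c);                 // already representable
            } else if (FALLBACKS.containsKey(c)) {
                out.append(FALLBACKS.get(c));  // explicit fallback
            } else {
                // Compatibility decomposition (e.g. the "fi" ligature becomes
                // "f" + "i"); keep whichever pieces are encodable.
                String parts = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKD);
                boolean kept = false;
                for (char d : parts.toCharArray()) {
                    if (encoder.canEncode(d)) {
                        out.append(d);
                        kept = true;
                    }
                }
                if (!kept) {
                    out.append('?');           // nothing usable, mark the loss
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String cleaned = toLatin1("\u2018caf\u00e9\u2019 \uFB01n\u2026");
        byte[] latin1 = cleaned.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(cleaned + " (" + latin1.length + " bytes)");
    }
}
```

Because the result contains only encodable characters, the final `getBytes` call no longer has anything to replace. Whether dropping combining marks or writing `...` for an ellipsis is acceptable depends on the data, which is the argument for a maintained library over a home-grown table.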
Have you considered using an OutputStream with an explicit character set of ISO-8859-1?
Then just write your Unicode chars and see what you get.
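For anyone who wants to try this, a minimal sketch of that experiment follows. It assumes an in-memory `ByteArrayOutputStream` as the target, and the class and method names are invented for the example. As far as I can tell the result is the same as with `getBytes`: unmappable characters arrive as `?` (0x3F).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the "write it and see" experiment.
class Latin1StreamDemo {
    // Writes the text through a Writer bound to ISO-8859-1 and returns the bytes.
    static byte[] writeAsLatin1(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.ISO_8859_1)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Left quote, e-acute, right quote.
        for (byte b : writeAsLatin1("\u2018\u00e9\u2019")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

So the stream approach makes the charset explicit, but on its own it does not map the quotes to an equivalent; some kind of mapping still has to run before the text reaches the writer.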
The Java Development Kit has a tool called native2ascii that will do this. Use:
You can also go back the other way using the -reverse option.
Also see the list of supported encodings for JDK 1.6.
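The command line itself did not survive in the post above, so I won't guess at it. To give an idea of the kind of output to expect, here is a small Java sketch that escapes every character outside ASCII as `\uXXXX`, which is, to my understanding, the shape of what native2ascii produces. The class and method names are made up, and this is an approximation of the idea rather than the tool's actual implementation.

```java
// Hypothetical sketch; approximates the escaping idea, not the real tool.
class AsciiEscapeSketch {
    // Replaces each UTF-16 char above 0x7F with a backslash-u escape
    // followed by four hex digits; ASCII passes through unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Left quote, 'a', right quote.
        System.out.println(escapeNonAscii("\u2018a\u2019"));
    }
}
```

Note that this keeps the information (the escapes can be reversed) but does not turn a curly quote into a plain one, so it only helps if whatever reads the result understands the escapes.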