Encoding conversion in Java
Is there any free Java library that I can use to convert a string from one encoding to another, something like iconv? I'm using Java version 1.3.
You don't need a library beyond the standard one - just use Charset. (You can just use the String constructors and getBytes methods, but personally I don't like working with only the names of character encodings. Too much room for typos.)
EDIT: As pointed out in the comments, you can still use Charset instances and keep the ease of use of the String methods: new String(bytes, charset) and String.getBytes(charset).
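A minimal sketch of both approaches, in case it helps. The class and method names (TranscodeSketch, transcodeByName, transcode) are made up for illustration. One caveat for the asker's setup: java.nio.charset.Charset was added in Java 1.4 and the Charset overloads of the String methods in Java 6, so on Java 1.3 only the name-based variant is available.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

final class TranscodeSketch {

    // Name-based overloads: present in every Java version, including 1.3.
    // A misspelled encoding name only fails at run time.
    static byte[] transcodeByName(byte[] input, String from, String to) {
        try {
            String text = new String(input, from); // bytes -> String (UTF-16 in memory)
            return text.getBytes(to);              // String -> bytes in the target encoding
        } catch (UnsupportedEncodingException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalArgumentException("unknown encoding", e);
        }
    }

    // Charset-based overloads: require Java 6 or later.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        byte[] latin1 = { 'c', 'a', 'f', (byte) 0xE9 }; // "café" in ISO-8859-1
        byte[] utf8 = transcode(latin1,
                Charset.forName("ISO-8859-1"), Charset.forName("UTF-8"));
        System.out.println(utf8.length); // 5: the é becomes two bytes in UTF-8
    }
}
```

Either way the conversion goes through a String: decode the source bytes, then encode to the target charset.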
See "URL Encoding (or: 'What are those "%20" codes in URLs?')".
CharsetDecoder
should be what you are looking for, no?
Many network protocols and files store their characters with a byte-oriented character set such as ISO-8859-1 (ISO-Latin-1).
However, Java's native character encoding is Unicode UTF-16BE (sixteen-bit UCS Transformation Format, big-endian byte order). See Charset.
That doesn't mean UTF-16 is the default charset (i.e. the default "mapping between sequences of sixteen-bit Unicode code units and sequences of bytes").
This example demonstrates how to convert ISO-8859-1 encoded bytes in a ByteBuffer to a string in a CharBuffer and vice versa.
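A minimal sketch of such a conversion, assuming the java.nio.charset API (Java 1.4 or later, so not available on 1.3). The class name Latin1BufferSketch and its two helper methods are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

final class Latin1BufferSketch {

    private static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // ISO-8859-1 bytes -> chars. Every byte value maps to a character in
    // ISO-8859-1, so the exception is not expected here.
    static CharBuffer decode(ByteBuffer bytes) {
        try {
            return LATIN1.newDecoder().decode(bytes);
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("could not decode input", e);
        }
    }

    // chars -> ISO-8859-1 bytes. Fails for any char above U+00FF, which
    // ISO-8859-1 cannot represent.
    static ByteBuffer encode(CharBuffer chars) {
        try {
            return LATIN1.newEncoder().encode(chars);
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("text not representable in ISO-8859-1", e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.wrap(new byte[] { 'c', 'a', 'f', (byte) 0xE9 });
        CharBuffer chars = decode(in);
        String text = chars.toString();          // "café"
        ByteBuffer out = encode(CharBuffer.wrap(text));
        System.out.println(text.length() + " chars, " + out.remaining() + " bytes");
    }
}
```

Unlike the String constructors, a decoder or encoder obtained this way reports malformed or unmappable input with an exception by default instead of silently substituting a replacement character.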
I would just like to add that if the String was originally decoded using the wrong encoding, it might be impossible to convert it to another encoding without errors.
The question does not state that the conversion is from a wrong encoding to the correct one, but I personally stumbled on this question because of exactly that situation, so this is a heads-up for others as well.
This answer to another question explains why the conversion does not always yield correct results:
https://stackoverflow.com/a/2623793/4702806
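To make the failure concrete, here is a small sketch (the class and method names are made up for illustration, and it assumes Java 7+ for StandardCharsets). ISO-8859-1 bytes are wrongly decoded as UTF-8, and the original bytes can no longer be recovered from the resulting String:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class WrongDecodingSketch {

    // Decode ISO-8859-1 bytes as if they were UTF-8 (the mistake), then try
    // to get the original bytes back by encoding to ISO-8859-1.
    static byte[] roundTripThroughUtf8(byte[] latin1Bytes) {
        // Bytes that are not valid UTF-8 are replaced with U+FFFD here,
        // so the information about the original byte is gone.
        String misread = new String(latin1Bytes, StandardCharsets.UTF_8);
        // U+FFFD has no ISO-8859-1 representation and is encoded as '?'.
        return misread.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = { 'c', 'a', 'f', (byte) 0xE9 }; // "café" in ISO-8859-1
        byte[] recovered = roundTripThroughUtf8(original);
        System.out.println(Arrays.equals(original, recovered)); // false
    }
}
```

Pure ASCII input survives the round trip, which is why this kind of bug often goes unnoticed until the first accented character shows up.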
It is a whole lot easier if you think of Unicode as a character set (which it actually is - basically the numbered set of all known characters). You can encode it as UTF-8 (1 to 4 bytes per character, depending on the character) or UTF-16 (2 bytes per character, or 4 bytes using surrogate pairs).
Back in the mists of time Java used UCS-2 to encode the Unicode character set. That could only handle 2 bytes per character and is now obsolete. Adding surrogate pairs and moving up to UTF-16 was a fairly obvious hack.
A lot of people think they should have used UTF-8 in the first place. To be fair, when Java was originally written Unicode still fit in 16 bits; characters beyond 65,535 were only assigned later.
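The byte counts mentioned above are easy to check. A small sketch, assuming Java 7+ for StandardCharsets; the class and method names are illustrative only:

```java
import java.nio.charset.StandardCharsets;

final class EncodedLengthSketch {

    // Number of bytes one Unicode code point takes in UTF-8.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes one Unicode code point takes in UTF-16
    // (big-endian, no byte order mark).
    static int utf16Length(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        int[] samples = { 0x41, 0xE9, 0x20AC, 0x1F600 }; // A, é, €, an emoji
        for (int cp : samples) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                    + ": UTF-8 " + utf8Length(cp)
                    + " bytes, UTF-16 " + utf16Length(cp) + " bytes");
        }
        // A code point above U+FFFF is stored as a surrogate pair,
        // so it occupies two char values in a Java String.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length() + " chars, "
                + emoji.codePointCount(0, emoji.length()) + " code point");
    }
}
```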