
Encoding conversion in Java

Posted on 2024-07-08 19:28:10

Is there any free Java library that I can use to convert a string in one encoding to another encoding, something like iconv? I'm using Java version 1.3.


聽兲甴掵 2024-07-15 19:28:10

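You don't need a library beyond the standard one - just use Charset. (You can just use the String constructors and getBytes methods, but personally I don't like working only with the names of character encodings. There is too much room for typos.)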
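A minimal sketch of the name-based route mentioned above. The `String(byte[], String)` constructor and `String.getBytes(String)` have been part of the JDK since 1.1, so they are available on Java 1.3, whereas `java.nio.charset.Charset` only arrived in Java 1.4. The class name `NamedEncodingDemo` and the helper `transcode` are hypothetical names chosen for illustration.

```java
import java.io.UnsupportedEncodingException;

public class NamedEncodingDemo {
    // Hypothetical helper: decode bytes using one named encoding,
    // then re-encode the resulting string using another (iconv-style).
    static byte[] transcode(byte[] input, String from, String to)
            throws UnsupportedEncodingException {
        String text = new String(input, from); // bytes -> UTF-16 chars
        return text.getBytes(to);              // chars -> bytes in the target encoding
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] latin1 = { (byte) 0xE9 }; // U+00E9 (e with acute accent) in ISO-8859-1
        byte[] utf8 = transcode(latin1, "ISO-8859-1", "UTF-8");
        System.out.println("UTF-8 length: " + utf8.length);

        // A misspelled encoding name still compiles and only fails at run time,
        // which is the "room for typos" concern raised above.
        try {
            transcode(latin1, "ISO-8859-l", "UTF-8");
        } catch (UnsupportedEncodingException e) {
            System.out.println("Unsupported encoding: " + e.getMessage());
        }
    }
}
```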

EDIT: As pointed out in the comments, you can still use Charset instances and keep the ease of use of the String methods: new String(bytes, charset) and String.getBytes(charset).
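The Charset-based String methods could be used roughly as follows. This is a sketch under assumptions: the `Charset` overloads of the String constructor and `getBytes` require Java 6 or later, and the `StandardCharsets` constants require Java 7, so none of this applies to Java 1.3. `CharsetTranscode` and `transcode` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetTranscode {
    // Hypothetical helper: same idea as iconv, but the charsets are objects,
    // so a misspelled name cannot slip through as a string literal.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from); // decode bytes to chars
        return text.getBytes(to);              // encode chars to the target charset
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xE9 }; // U+00E9 in ISO-8859-1
        byte[] utf8 = transcode(latin1,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8)); // prints [-61, -87], i.e. 0xC3 0xA9
    }
}
```

Unlike the name-based overloads, these methods do not throw a checked UnsupportedEncodingException, since the Charset instance is already known to exist.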

See "URL Encoding (or: 'What are those "%20" codes in URLs?')".


吖咩 2024-07-15 19:28:10

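CharsetDecoder should be what you are looking for, no?

Many network protocols and files store their characters with a byte-oriented character set such as ISO-8859-1 (ISO-Latin-1).
However, Java's native character encoding is UTF-16BE (sixteen-bit UCS Transformation Format, big-endian byte order).

See Charset. That doesn't mean UTF-16 is the default charset (i.e. the default "mapping between sequences of sixteen-bit Unicode code units and sequences of bytes"):

Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets.
[US-ASCII, ISO-8859-1 a.k.a. ISO-LATIN-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16]
The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.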
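To make the point about the default charset concrete, here is a small sketch. It assumes Java 7 or later (`Charset.defaultCharset()` exists since Java 5, `StandardCharsets` since Java 7); `DefaultCharsetDemo` and `encodeUtf8` are hypothetical names. The output of the first two print statements depends on the platform, so no expected values are given for them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Encodes with an explicit charset, so the result does not depend
    // on whatever default the JVM picked at startup.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00E9"; // e with acute accent

        // Reports the JVM-wide default charset.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // getBytes() with no argument uses that default,
        // so the byte count may differ between machines.
        System.out.println("Default-encoded length: " + text.getBytes().length);

        // With an explicit charset the result is the same everywhere.
        System.out.println("UTF-8 length: " + encodeUtf8(text).length);
    }
}
```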

This example demonstrates how to convert ISO-8859-1 encoded bytes in a ByteBuffer to a string in a CharBuffer and vice versa.

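import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

// Create the encoder and decoder for ISO-8859-1
Charset charset = Charset.forName("ISO-8859-1");
CharsetDecoder decoder = charset.newDecoder();
CharsetEncoder encoder = charset.newEncoder();

try {
    // Convert a string to ISO-8859-1 bytes in a ByteBuffer.
    // The new ByteBuffer is ready to be read.
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap("a string"));

    // Convert the ISO-8859-1 bytes in the ByteBuffer to a CharBuffer and then to a string.
    // The new CharBuffer is ready to be read.
    CharBuffer cbuf = decoder.decode(bbuf);
    String s = cbuf.toString();
} catch (CharacterCodingException e) {
    // Malformed input, or a character that ISO-8859-1 cannot represent
    e.printStackTrace();
}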
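The catch block in the example matters because an encoder created with newEncoder() reports characters it cannot map. The following sketch contrasts that strict behaviour with a lenient encoder configured through `CodingErrorAction.REPLACE`. It assumes Java 7 or later for `StandardCharsets`; `UnmappableDemo`, `encodeStrict`, `encodeLenient`, and `toArray` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {
    // Copies the readable part of a ByteBuffer into a plain array.
    static byte[] toArray(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Strict: throws if the text contains a character ISO-8859-1 cannot represent.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return toArray(encoder.encode(CharBuffer.wrap(text)));
    }

    // Lenient: substitutes the encoder's replacement byte for such characters.
    static byte[] encodeLenient(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return toArray(encoder.encode(CharBuffer.wrap(text)));
        } catch (CharacterCodingException e) {
            // Not expected when both actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00E9 \u20AC"; // the euro sign is not in ISO-8859-1
        System.out.println("Lenient length: " + encodeLenient(text).length);
        try {
            encodeStrict(text);
        } catch (CharacterCodingException e) {
            System.out.println("Strict encoding failed: " + e);
        }
    }
}
```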
