Converting a byte stream to a character stream in Java

Posted 2024-10-14 02:54:16


Is there a class where one can create it by specifying the encoding, feed byte streams into it and get character streams from it? The main point is I want to conserve memory by not having both entire byte-stream data and entire character-stream data in the memory at the same time.

Something like:

Something s = new Something("utf-8");
s.write(buffer, 0, buffer.length); // it converts the bytes directly to characters internally, so we don't store both
// ... several more s.write() calls
s.close(); // or not needed

String text = s.getString();
// or
char[] text = s.getCharArray();

What is that Something?


风吹雨成花 2024-10-21 02:54:16

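Are you looking for ByteArrayInputStream? It lets you "stream" from a byte array. Wrap it in an InputStreamReader, which lets you specify the character encoding, and you can read characters from the original byte array.

If you want to read the bytes directly from their source instead, construct the appropriate kind of InputStream (e.g. FileInputStream) and wrap that in an InputStreamReader.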
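A minimal sketch of that wrapping, assuming UTF-8 input; `ReaderDemo` and `readAll` are hypothetical names. Decoded characters come back one 1024-char chunk at a time and are appended to a StringBuilder, so when the InputStream reads from a file or socket only a small buffer of undecoded bytes is held at any moment (InputStreamReader buffers some bytes internally). With a ByteArrayInputStream, as in this demo, the whole byte[] is of course already in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ReaderDemo {
    // Decodes everything from `in` as UTF-8, reading one char chunk at a time.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] chunk = new char[1024];
            int n;
            while ((n = reader.read(chunk, 0, chunk.length)) != -1) {
                sb.append(chunk, 0, n); // keep only the decoded characters
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "café 中文" written with escapes so the source file encoding does not matter
        byte[] bytes = "caf\u00e9 \u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(bytes)));
    }
}
```

To read from a file, pass a `FileInputStream` (or `Files.newInputStream(path)`) to `readAll` instead of the ByteArrayInputStream.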


山川志 2024-10-21 02:54:16


You can probably mock it up using CharsetDecoder. Something along the lines of

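    CharsetDecoder decoder = Charset.forName(encoding).newDecoder();
    CharBuffer cb = CharBuffer.allocate(100);
    decoder.decode(ByteBuffer.wrap(buffer1), cb, false);
    decoder.decode(ByteBuffer.wrap(buffer2), cb, false);
    // ... one decode(..., false) call per intermediate buffer; note that any bytes of a
    // character split across two buffers stay unread in the wrapped ByteBuffer and are lost
    decoder.decode(ByteBuffer.wrap(bufferN), cb, true);
    decoder.flush(cb);   // let the decoder emit any final characters
    cb.flip();           // limit = chars written, position = 0 (position(0) alone would keep limit = capacity)
    return cb.toString();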

(Yes, I know this will overflow your CharBuffer -- you may want to copy the contents into a StringBuilder as you go.)
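Following that StringBuilder suggestion, here is a minimal sketch of the question's `Something` built only on CharsetDecoder; the class name `StreamingDecoder` is hypothetical. It handles the two cases the snippet above skips: on OVERFLOW the small CharBuffer is drained into a StringBuilder and decoding continues, and the bytes of a character split across two `write()` calls are carried over to the next call instead of being dropped. It assumes malformed or unmappable input should be replaced with U+FFFD rather than reported.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical incremental decoder matching the API sketched in the question.
final class StreamingDecoder {
    private final CharsetDecoder decoder;
    private final CharBuffer chars = CharBuffer.allocate(1024); // small, reused output window
    private final StringBuilder text = new StringBuilder();     // accumulated characters
    private byte[] leftover = new byte[0]; // start of a character whose remaining bytes have not arrived
    private boolean closed;

    StreamingDecoder(String charsetName) {
        decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    void write(byte[] buf, int off, int len) {
        if (closed) throw new IllegalStateException("already closed");
        ByteBuffer in;
        if (leftover.length == 0) {
            in = ByteBuffer.wrap(buf, off, len);
        } else {
            // Prepend the incomplete character left over from the previous call.
            in = ByteBuffer.allocate(leftover.length + len);
            in.put(leftover).put(buf, off, len);
            in.flip();
        }
        decode(in, false);
        // Anything still unread is an incomplete character; keep it for the next write().
        leftover = new byte[in.remaining()];
        in.get(leftover);
    }

    void close() {
        if (closed) return;
        closed = true;
        decode(ByteBuffer.wrap(leftover), true); // a truncated final character becomes U+FFFD
        leftover = new byte[0];
        while (decoder.flush(chars).isOverflow()) {
            drain();
        }
        drain();
    }

    String getString() {
        return text.toString();
    }

    private void decode(ByteBuffer in, boolean endOfInput) {
        // With REPLACE actions, decode() only returns UNDERFLOW or OVERFLOW.
        while (decoder.decode(in, chars, endOfInput).isOverflow()) {
            drain(); // output window is full: move it into the builder and keep going
        }
        drain();
    }

    private void drain() {
        chars.flip();
        text.append(chars);
        chars.clear();
    }
}
```

Only one chunk of input bytes is held at a time. The final `getString()` copies the builder once, so the characters briefly exist twice, but the full byte data never coexists with them.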


很酷又爱笑 2024-10-21 02:54:16


Your example code didn't seem to indicate that a character stream was needed. If so, String can already handle all that you want. Assuming String s contains the data,

char[] chars = s.toCharArray();
byte[] bytes = s.getBytes("utf-8");

The question then reduces to how to get bytes from a byte stream into String, for which you can use ByteArrayOutputStream, like so:

ByteArrayOutputStream os = new ByteArrayOutputStream();
os.write(buffer, 0, buffer.length); // it just stores the bytes, doesn't convert yet.
// several more os.write() calls
s = os.toString("utf-8"); // now it converts the full buffer to a string in the specified encoding.

If you truly want something that has a byte input stream and a character output stream, there isn't a built-in one.


最美不过初阳 2024-10-21 02:54:16

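Actually, the title "Converting a byte stream to a character stream in Java" contradicts your example, which uses no streams at all, only arrays. From here on I'll assume you want arrays.

You certainly can't start with a byte[] and end up with a char[] (or String) without both of them existing somewhere at the same time, at least temporarily. However, there are a few possibilities:

If you really need a char[]: one idea is to write the byte[] to a file and read it back into an array with FileReader. That doesn't quite work, because you don't know the right array length in advance. Instead, generate all the chars and write them to a file with DataOutput, then read them all back into an array with DataInput.
If you really need a String: create a char[] as above and use reflection with setAccessible(true) to call the package-private constructor String(int offset, int count, char value[]), which shares the array instead of copying it (this constructor exists only in older JDKs such as Java 6).
If a CharSequence is enough: create a class MyCharSequence that holds the byte[]. A very slow solution is to implement its charAt(index) method by decoding the byte[] from the start until you have index+1 characters, then throwing all of them away except the last one. Such a dumb method is needed because with UTF-8 you don't know how many bytes a character takes. You could instead decode once up front and remember the position of the first byte of every character, but that is even dumber, because those positions take even more memory. Fortunately there is a simple space-time trade-off: for example, remember the position of the first byte of only every 16th character.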
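The space-time trade-off in the last option can be sketched as follows. This is a minimal illustration, not a tuned implementation: the class name `Utf8CharSequence` is hypothetical, it assumes the byte[] holds well-formed UTF-8 (nothing is validated), and because Java chars are UTF-16 units, a 4-byte sequence counts as two chars (a surrogate pair). It keeps one (byte offset, char index) checkpoint per 16 chars, so charAt decodes forward at most about 16 chars from the nearest checkpoint instead of from the start.

```java
// Hypothetical CharSequence view over UTF-8 bytes; assumes well-formed input.
final class Utf8CharSequence implements CharSequence {
    private static final int STRIDE = 16; // one checkpoint per 16 chars
    private final byte[] utf8;
    private final int length;   // length in UTF-16 chars
    private final int[] cpByte; // checkpoint k: byte offset of the code point holding char k*STRIDE
    private final int[] cpChar; // checkpoint k: char index at which that code point starts

    Utf8CharSequence(byte[] utf8) {
        this.utf8 = utf8;
        int chars = 0;
        for (int i = 0; i < utf8.length; i += seqLen(utf8[i])) {
            chars += seqLen(utf8[i]) == 4 ? 2 : 1; // 4-byte sequences become surrogate pairs
        }
        length = chars;
        int slots = (chars + STRIDE - 1) / STRIDE;
        cpByte = new int[slots];
        cpChar = new int[slots];
        int c = 0; // char index of the code point starting at byte i
        int k = 0; // next checkpoint to fill
        for (int i = 0; i < utf8.length; i += seqLen(utf8[i])) {
            int width = seqLen(utf8[i]) == 4 ? 2 : 1;
            while (k < slots && k * STRIDE < c + width) { // this code point covers char k*STRIDE
                cpByte[k] = i;
                cpChar[k] = c;
                k++;
            }
            c += width;
        }
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index " + index);
        int k = index / STRIDE;
        int i = cpByte[k];
        int c = cpChar[k];
        while (true) { // decode forward from the checkpoint, discarding chars before index
            int cp = codePointAt(i);
            int width = Character.charCount(cp);
            if (index < c + width) {
                if (width == 1) return (char) cp;
                return index == c ? Character.highSurrogate(cp) : Character.lowSurrogate(cp);
            }
            c += width;
            i += seqLen(utf8[i]);
        }
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
        StringBuilder sb = new StringBuilder(end - start);
        for (int j = start; j < end; j++) sb.append(charAt(j));
        return sb.toString();
    }

    @Override
    public String toString() {
        return subSequence(0, length).toString();
    }

    // Length of the UTF-8 sequence introduced by lead byte b.
    private static int seqLen(byte b) {
        int u = b & 0xFF;
        if (u < 0x80) return 1;
        if (u < 0xE0) return 2;
        if (u < 0xF0) return 3;
        return 4;
    }

    // Decodes the code point whose sequence starts at byte offset i.
    private int codePointAt(int i) {
        int len = seqLen(utf8[i]);
        int cp = utf8[i] & (len == 1 ? 0x7F : len == 2 ? 0x1F : len == 3 ? 0x0F : 0x07);
        for (int j = 1; j < len; j++) cp = (cp << 6) | (utf8[i + j] & 0x3F);
        return cp;
    }
}
```

With STRIDE = 16 the two checkpoint arrays cost 8 bytes per 16 chars, compared with 32 bytes per 16 chars for a full char[]; a larger stride saves more memory at the price of a slower charAt.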