Convert byte-stream to character-stream in Java

Posted 2024-10-14 02:54:16 · 423 characters · 9 views · 0 comments

Is there a class that is created with a given encoding, accepts bytes as they are fed in, and hands back characters? The main point is that I want to conserve memory by not holding both the entire byte-stream data and the entire character-stream data in memory at the same time.

Something like:

Something s = new Something("utf-8");
s.write(buffer, 0, buffer.length); // it converts the bytes directly to characters internally, so we don't store both
// ... several more s.write() calls
s.close(); // or not needed

String text = s.getString();
// or
char[] text = s.getCharArray();

What is that Something?


Comments (4)

风吹雨成花 2024-10-21 02:54:16

Are you looking for ByteArrayInputStream? You could then wrap that in an InputStreamReader and read characters out of the original byte array.

A ByteArrayInputStream lets you "stream" from a byte array. If you wrap that in an InputStreamReader you can read characters. The InputStreamReader lets you stipulate the character encoding.

If you want to go directly from an input source of bytes, then you can just construct the appropriate sort of InputStream class (FileInputStream for example) and then wrap that in an InputStreamReader.
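As a rough illustration of the wrapping described above, here is a minimal sketch. The class name `ReaderDemo`, the method `readAll`, and the 1024-char chunk size are assumptions made for this example, not part of the original answer. It decodes UTF-8 bytes chunk by chunk through an InputStreamReader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReaderDemo {
    // Reads characters from any byte source; the reader decodes as it goes,
    // so only one small char chunk exists besides what the caller accumulates.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] chunk = new char[1024]; // chunk size is an arbitrary choice
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // A ByteArrayInputStream here; a FileInputStream would work the same way.
        System.out.println(readAll(new ByteArrayInputStream(bytes)));
    }
}
```

With a ByteArrayInputStream the whole byte array is of course still in memory; the saving only appears when the source is a file or socket that is read incrementally.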

山川志 2024-10-21 02:54:16

You can probably mock it up using CharsetDecoder. Something along the lines of

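    // needs java.nio.ByteBuffer, java.nio.CharBuffer, java.nio.charset.Charset, java.nio.charset.CharsetDecoder
    CharsetDecoder decoder = Charset.forName(encoding).newDecoder();
    CharBuffer cb = CharBuffer.allocate(100);
    decoder.decode(ByteBuffer.wrap(buffer1), cb, false);
    decoder.decode(ByteBuffer.wrap(buffer2), cb, false);
    ...
    decoder.decode(ByteBuffer.wrap(bufferN), cb, true);  // true marks the end of the input
    decoder.flush(cb);
    cb.flip();  // limit = chars written, position = 0; position(0) alone would keep the unused tail
    return cb.toString();

(Yes, I know this will overflow your CharBuffer; you may want to copy the contents into a StringBuilder as you go. It also assumes that no multi-byte character is split across two buffers: undecoded trailing bytes stay in the wrapped ByteBuffer and would have to be prepended to the next one.)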
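To make the "copy into a StringBuilder as you go" idea concrete, here is one possible sketch of the `Something` class from the question, built on CharsetDecoder. The class name `IncrementalDecoder`, its method names, and the 1024-char scratch buffer are assumptions for illustration only. It carries undecoded trailing bytes over to the next `write` call, so a multi-byte character split across two buffers is still decoded correctly.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

public class IncrementalDecoder {
    private final CharsetDecoder decoder;
    private final CharBuffer out = CharBuffer.allocate(1024); // small scratch buffer
    private final StringBuilder sb = new StringBuilder();
    private ByteBuffer pending = ByteBuffer.allocate(0);      // undecoded tail of the last write

    public IncrementalDecoder(Charset charset) {
        this.decoder = charset.newDecoder();
    }

    // Decodes the given bytes immediately; only an incomplete trailing sequence is kept.
    public void write(byte[] buf, int off, int len) {
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + len);
        in.put(pending).put(buf, off, len);
        in.flip();
        drain(in, false);
        ByteBuffer rest = ByteBuffer.allocate(in.remaining());
        rest.put(in);
        rest.flip();
        pending = rest;
    }

    // Signals end of input and returns everything decoded so far.
    public String finish() {
        drain(pending, true);
        CoderResult r;
        do {
            r = decoder.flush(out);
            moveOutToBuilder();
        } while (r.isOverflow());
        return sb.toString();
    }

    private void drain(ByteBuffer in, boolean endOfInput) {
        CoderResult r;
        do {
            r = decoder.decode(in, out, endOfInput);
            if (r.isError()) {
                throw new IllegalArgumentException("cannot decode input: " + r);
            }
            moveOutToBuilder(); // empty the scratch buffer so it never overflows for good
        } while (r.isOverflow());
    }

    private void moveOutToBuilder() {
        out.flip();
        sb.append(out);
        out.clear();
    }
}
```

Usage would look like `IncrementalDecoder d = new IncrementalDecoder(StandardCharsets.UTF_8); d.write(buffer, 0, buffer.length); String text = d.finish();`. Note that `sb.toString()` still copies the characters once at the end, so the saving is on the byte side: the full byte data never has to be held alongside the characters.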

很酷又爱笑 2024-10-21 02:54:16

Your example code didn't seem to indicate that a character stream was needed. If so, String can already handle all that you want. Assuming String s contains the data,

char[] chars = s.toCharArray();
byte[] bytes = s.getBytes("utf-8");

The question then reduces to how to get bytes from a byte stream into String, for which you can use ByteArrayOutputStream, like so:

ByteArrayOutputStream os = new ByteArrayOutputStream(); // java.io.ByteArrayOutputStream
os.write(buffer, 0, buffer.length); // it just stores the bytes, doesn't convert yet.
// several more os.write() calls
s = os.toString("utf-8"); // now it converts the full buffer to a string in the specified encoding.

If you truly want something that has a byte input stream and a character output stream, there isn't a built-in one.

最美不过初阳 2024-10-21 02:54:16

Actually the title "Convert byte-stream to character-stream in Java" contradicts your example using no streams at all but arrays. I'm assuming further you want arrays.

You surely can't start with byte[] and end with char[] (or String) without having both somewhere for a while. There are however some possibilities:

  • in case you really need a char[]: Idea: Write the byte[] into a file and read it using a FileReader into the array. This doesn't really work, since you don't know the proper array length in advance. So generate and write all the characters into a file using DataOutput, read all of them back using DataInput into an array.

  • in case you really need a String: Create a char[] as above and use reflection and setAccessible(true) to invoke the package-private ctor String(int offset, int count, char value[]).

  • in case a CharSequence suffices: Create a class MyCharSequence holding the byte[]. An extremely slow solution would be to implement its method charAt(index) by converting a part of the byte[] starting from the beginning until you obtain index+1 chars. Discard all of them on the fly and keep the last one. Such a stupid method is needed since with UTF-8 you don't know how many bytes correspond to a single char. You could do it once at the beginning and remember for each char the position of its first byte. This is even more stupid, as you'd need much more memory for those positions. Fortunately, a simple space-time tradeoff exists, e.g., remember the position of the first byte for each 16th char.
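The last idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `Utf8CharSequence` is made up here, the input is assumed to be well-formed UTF-8, and 4-byte sequences (characters outside the Basic Multilingual Plane, which need two Java chars) are rejected to keep the index arithmetic simple.

```java
import java.nio.charset.StandardCharsets;

public final class Utf8CharSequence implements CharSequence {
    private static final int STEP = 16;   // remember one byte offset per 16 chars
    private final byte[] utf8;
    private final int length;             // number of chars
    private final int[] checkpoints;      // byte offset of char 0, 16, 32, ...

    public Utf8CharSequence(byte[] utf8) {
        this.utf8 = utf8;
        int chars = 0;
        for (int i = 0; i < utf8.length; i += seqLen(utf8[i])) chars++;
        this.length = chars;
        this.checkpoints = new int[(chars + STEP - 1) / STEP];
        int c = 0;
        for (int i = 0; i < utf8.length; i += seqLen(utf8[i]), c++) {
            if (c % STEP == 0) checkpoints[c / STEP] = i;
        }
    }

    // Length in bytes of the sequence that starts with this lead byte (BMP only).
    private static int seqLen(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if (b >= 0xC0 && b < 0xE0) return 2;
        if (b >= 0xE0 && b < 0xF0) return 3;
        throw new IllegalArgumentException("unsupported or malformed lead byte: " + b);
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index " + index);
        int i = checkpoints[index / STEP];                      // jump to the nearest checkpoint
        for (int skip = index % STEP; skip > 0; skip--) i += seqLen(utf8[i]); // walk at most 15 chars
        int b0 = utf8[i] & 0xFF;
        switch (seqLen(utf8[i])) {
            case 1:
                return (char) b0;
            case 2:
                return (char) (((b0 & 0x1F) << 6) | (utf8[i + 1] & 0x3F));
            default:
                return (char) (((b0 & 0x0F) << 12) | ((utf8[i + 1] & 0x3F) << 6) | (utf8[i + 2] & 0x3F));
        }
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) sb.append(charAt(i));
        return sb;
    }

    @Override
    public String toString() {
        return new String(utf8, StandardCharsets.UTF_8); // materializes the full String on demand
    }
}
```

The checkpoint array costs one int per 16 chars, and each charAt call walks at most 15 sequences, which is the space-time tradeoff described in the bullet above.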

All my proposals are a bit strange, but I believe it can't be done much better. It could make a fun homework exercise, but I wouldn't go for it.
