Convert byte-stream to character-stream in Java
Is there a class that can be created by specifying an encoding, have byte streams fed into it, and give character streams back? The main point is that I want to conserve memory by not holding both the entire byte-stream data and the entire character-stream data in memory at the same time.
Something like:
    Something s = new Something("utf-8");
    s.write(buffer, 0, buffer.length); // converts the bytes directly to characters internally, so we don't store both
    // ... several more s.write() calls
    s.close(); // or not needed
    String text = s.getString();
    // or
    char[] text = s.getCharArray();
What is that Something?
Comments (4)
Are you looking for ByteArrayInputStream? You could then wrap that in an InputStreamReader and read characters out of the original byte array.
A ByteArrayInputStream lets you "stream" from a byte array. If you wrap it in an InputStreamReader, you can read characters. The InputStreamReader lets you specify the character encoding.
If you want to go directly from an input source of bytes, you can construct the appropriate kind of InputStream (FileInputStream, for example) and then wrap that in an InputStreamReader.
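As a minimal sketch of the InputStreamReader wrapping described above: the class name ReaderDecodeSketch, the method readAll, and the 1024-char chunk size are illustrative assumptions, not part of the original answer. Only one small char chunk is held besides the accumulated result.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes an InputStream chunk by chunk through an InputStreamReader.
class ReaderDecodeSketch {
    static String readAll(InputStream in, Charset charset) {
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[1024]; // assumed chunk size
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            // read() returns -1 once the underlying byte stream is exhausted
            while ((n = reader.read(chunk, 0, chunk.length)) != -1) {
                out.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```

Swapping the ByteArrayInputStream for a FileInputStream avoids holding the whole byte array in memory in the first place.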
You can probably mock it up using CharsetDecoder: decode each incoming chunk of bytes into a CharBuffer as it arrives. (Yes, I know a fixed-size CharBuffer will overflow; you may want to copy its contents into a StringBuilder as you go.)
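A rough sketch of the CharsetDecoder idea, shaped like the Something class from the question. The answer's own code did not survive in this copy, so the class name StreamingDecoder, its write/finish methods, and the buffer size are all assumptions made for illustration. It keeps only the current byte chunk, a small CharBuffer, and the growing StringBuilder, and it carries over bytes of a multi-byte sequence that is split between two write() calls.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical push-style decoder: bytes go in via write(), text comes out of finish().
class StreamingDecoder {
    private final CharsetDecoder decoder;
    private final CharBuffer chars = CharBuffer.allocate(1024); // assumed size
    private final StringBuilder text = new StringBuilder();
    private ByteBuffer leftover = ByteBuffer.allocate(0); // tail of an incomplete sequence

    StreamingDecoder(Charset charset) {
        this.decoder = charset.newDecoder(); // default action: report malformed input
    }

    void write(byte[] buf, int off, int len) {
        // Prepend any bytes left undecoded by the previous call.
        ByteBuffer in = ByteBuffer.allocate(leftover.remaining() + len);
        in.put(leftover).put(buf, off, len);
        in.flip();
        drain(in, false);
        // Whatever is still unread is an incomplete trailing sequence; keep it.
        leftover = ByteBuffer.allocate(in.remaining());
        leftover.put(in);
        leftover.flip();
    }

    String finish() {
        drain(leftover, true); // signal end of input
        while (decoder.flush(chars).isOverflow()) {
            moveCharsToText();
        }
        moveCharsToText();
        return text.toString();
    }

    private void drain(ByteBuffer in, boolean endOfInput) {
        while (true) {
            CoderResult result = decoder.decode(in, chars, endOfInput);
            moveCharsToText(); // empties the CharBuffer, so overflow just loops again
            if (result.isUnderflow()) {
                return;
            }
            if (result.isError()) {
                throw new IllegalArgumentException("cannot decode input: " + result);
            }
        }
    }

    private void moveCharsToText() {
        chars.flip();
        text.append(chars);
        chars.clear();
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        StreamingDecoder s = new StreamingDecoder(StandardCharsets.UTF_8);
        s.write(data, 0, 2);               // ends in the middle of the two-byte character
        s.write(data, 2, data.length - 2);
        System.out.println(s.finish());
    }
}
```

Note that the final toString() still copies the accumulated characters once, so this reduces peak memory rather than eliminating all duplication.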
Your example code didn't seem to indicate that a character stream was needed. If so, String can already handle everything you want: assuming String s contains the data, both results in your example (the String and the char[]) can be obtained from it.
The question then reduces to how to get bytes from a byte stream into a String, for which you can use ByteArrayOutputStream: write the bytes into it as they arrive, then convert the accumulated bytes to a String using the desired encoding.
If you truly want something that has a byte input stream and a character output stream, there isn't a built-in one.
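The ByteArrayOutputStream route from this answer could look roughly like the following. The original snippet is missing from this copy, so the class CollectThenDecode and its decode method are hypothetical stand-ins. Unlike a streaming decoder, this holds all the bytes until the final conversion, so bytes and characters do coexist briefly.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper: collect all bytes first, then decode them in one step.
class CollectThenDecode {
    static String decode(byte[][] chunks, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            bytes.write(chunk, 0, chunk.length); // no decoding yet, just accumulation
        }
        try {
            // Decoding the whole buffer at once means split characters are not an issue.
            return bytes.toString(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        byte[][] chunks = { { (byte) 'h' }, { (byte) 0xC3 }, { (byte) 0xA9 } };
        String s = decode(chunks, "UTF-8");
        char[] text = s.toCharArray(); // the char[] form asked for in the question
        System.out.println(s + " has " + text.length + " chars");
    }
}
```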
Actually, the title "Convert byte-stream to character-stream in Java" contradicts your example, which uses no streams at all, only arrays. I'll assume from here on that you want arrays.
You surely can't start with a byte[] and end with a char[] (or String) without having both somewhere for a while. There are, however, some possibilities:
- In case you really need a char[]: the idea is to write the byte[] to a file and read it back into the array using a FileReader. This doesn't really work, since you don't know the proper array length in advance. So instead, generate all the characters and write them to a file using DataOutput, then read them all back into an array using DataInput.
- In case you really need a String: create a char[] as above and use reflection and setAccessible(true) to invoke the package-private constructor String(int offset, int count, char value[]).
- In case a CharSequence suffices: create a class MyCharSequence holding the byte[]. An extremely slow solution would be to implement its charAt(index) method by converting the byte[] from the beginning until you obtain index+1 chars, discarding them on the fly and keeping only the last one. Such a stupid method is needed because with UTF-8 you don't know how many bytes correspond to a single char. You could instead do the conversion once at the beginning and remember, for each char, the position of its first byte. This is even more stupid, as you'd need much more memory for those positions. Fortunately, a simple space-time tradeoff exists: for example, remember the position of the first byte only for every 16th char.
All my proposals are a bit strange, but I believe it can't be done much better. It could be a fun homework exercise, but I wouldn't go for it.
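The CharSequence idea with a checkpoint every 16th char might be sketched as below. The class name Utf8CharSequence and the STEP constant are illustrative, and the sketch makes two loud assumptions: the input is well-formed UTF-8, and it contains no 4-byte sequences (characters outside the Basic Multilingual Plane), which would need surrogate-pair handling.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical CharSequence view over UTF-8 bytes; no char[] copy is ever built.
// Assumes well-formed UTF-8 with sequences of at most 3 bytes (BMP only).
final class Utf8CharSequence implements CharSequence {
    private static final int STEP = 16;  // remember the byte offset of every 16th char
    private final byte[] bytes;
    private final int[] checkpoints;     // checkpoints[k] = byte offset of char k * STEP
    private final int length;            // number of chars

    Utf8CharSequence(byte[] utf8) {
        this.bytes = utf8;
        int[] marks = new int[utf8.length / STEP + 1];
        int chars = 0;
        for (int pos = 0; pos < utf8.length; pos += width(utf8[pos])) {
            if (chars % STEP == 0) {
                marks[chars / STEP] = pos;
            }
            chars++;
        }
        this.length = chars;
        this.checkpoints = Arrays.copyOf(marks, (chars + STEP - 1) / STEP);
    }

    // Number of bytes in the sequence that starts with this lead byte.
    private static int width(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if (b >= 0xC0 && b < 0xE0) return 2;
        if (b >= 0xE0 && b < 0xF0) return 3;
        throw new IllegalArgumentException("unsupported or malformed lead byte: " + b);
    }

    @Override public int length() {
        return length;
    }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int pos = checkpoints[index / STEP];  // jump to the nearest checkpoint
        for (int skip = index % STEP; skip > 0; skip--) {
            pos += width(bytes[pos]);         // then walk at most STEP - 1 chars
        }
        int b0 = bytes[pos] & 0xFF;
        switch (width(bytes[pos])) {
            case 1:
                return (char) b0;
            case 2:
                return (char) (((b0 & 0x1F) << 6) | (bytes[pos + 1] & 0x3F));
            default:
                return (char) (((b0 & 0x0F) << 12)
                        | ((bytes[pos + 1] & 0x3F) << 6)
                        | (bytes[pos + 2] & 0x3F));
        }
    }

    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb;
    }

    @Override public String toString() {
        return subSequence(0, length).toString();
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo, this text is longer than sixteen chars".getBytes(StandardCharsets.UTF_8);
        CharSequence cs = new Utf8CharSequence(data);
        System.out.println(cs.charAt(1) + " / " + cs.length());
    }
}
```

With one int per 16 chars, the index adds at most a quarter of the byte array's size, and charAt walks at most 15 characters from the nearest checkpoint.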