Reading text and binary data from an InputStream

Posted 2024-11-17 17:45:13

I am trying to read data from a binary stream, portions of which should be parsed as UTF-8.

Using the InputStream directly for the binary data and an InputStreamReader on top of it for the UTF-8 text does not work, because the reader reads ahead: it pulls more bytes than it needs into its internal buffer and thereby swallows the binary data that follows, even if it is asked to read at most n characters.
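The read-ahead is easy to reproduce. The sketch below asks an InputStreamReader for only 5 characters and then checks how many bytes the underlying stream still holds. The class name, method names and sample record are made up for illustration. The InputStreamReader Javadoc explicitly allows it to read more bytes ahead than the current read needs, and on current OpenJDK builds it fills its internal byte buffer from the stream in one go, so the int that follows the text is already consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ReadAheadDemo {
    // Builds a hypothetical record: 5 chars of UTF-8 text ("héllo", 6 bytes)
    // followed by a 4-byte big-endian int, 10 bytes in total.
    static byte[] sampleRecord() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write("h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        out.writeInt(42);
        out.flush();
        return bytes.toByteArray();
    }

    // Asks an InputStreamReader for at most `chars` characters, then reports
    // how many bytes are still unread in the underlying stream.
    static int bytesLeftAfterReading(byte[] data, int chars) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        InputStreamReader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        reader.read(new char[chars], 0, chars);
        return in.available();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = sampleRecord();
        // The 4 bytes of the int "should" still be unread, but the reader
        // has already pulled them into its own buffer.
        System.out.println(bytesLeftAfterReading(data, 5)); // 0 on OpenJDK
    }
}
```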

I recognize that this question is very similar to "Read from InputStream in multiple formats", but the solution proposed there is specific to HTTP streams, which does not help me.

I thought of just reading everything as binary data and converting the relevant pieces to text afterwards. But the lengths of the text portions are given in characters, not in bytes, so whatever reads the characters from the stream has to be aware of the encoding.

Is there a way to tell InputStreamReader not to read ahead further than is needed for reading the given number of characters? Or is there a reader that supports both binary data and text with an encoding and can be switched between these modes on the fly?

Comments (2)

2024-11-24 17:45:13

You need to read the binary portions first. Where you recognise a portion of bytes that needs UTF-8 decoding, extract those bytes and decode them.

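```java
import java.io.DataInputStream;
import java.nio.charset.StandardCharsets;

// 'in' is the InputStream you are reading from
DataInputStream dis = new DataInputStream(in);
// read a binary type.
int num = dis.readInt();
// length of the UTF-8 portion, in bytes
int len = dis.readUnsignedShort();
// read a UTF-8 portion.
byte[] bytes = new byte[len];
dis.readFully(bytes);
String text = new String(bytes, StandardCharsets.UTF_8);
// read some binary
double d = dis.readDouble();
```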
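This snippet expects the length prefix to count bytes, not characters. That is the same layout DataOutputStream.writeUTF produces and DataInputStream.readUTF consumes: a 2-byte big-endian length followed by the encoded text. The sketch below (WriteUtfDemo is a hypothetical class name) shows it for a 5-character string whose prefix says 6. One caveat: writeUTF/readUTF use modified UTF-8, which encodes U+0000 as two bytes and each half of a surrogate pair as its own 3-byte sequence, so for data written by a standard UTF-8 encoder the manual readFully plus new String(..., StandardCharsets.UTF_8) version is the safer choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class WriteUtfDemo {
    // DataOutputStream.writeUTF emits a 2-byte big-endian length (counted in
    // bytes, not chars) followed by the text in modified UTF-8.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(text);
        }
        return bytes.toByteArray();
    }

    // DataInputStream.readUTF reads exactly that layout back.
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode("h\u00e9llo");
        // 5 chars, but 'é' takes two bytes, so the prefix says 6.
        int prefix = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
        System.out.println(prefix);                            // 6
        System.out.println(data.length);                       // 8
        System.out.println(decode(data).equals("h\u00e9llo")); // true
    }
}
```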
心作怪 2024-11-24 17:45:13

I think you just should not use InputStreamReader. Readers deal with text, but you are dealing with text and binary data together.

There is no way to do that. You have to read binary buffers and interpret the format yourself, i.e. find where the text is, extract those bytes and convert them to a String.
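When the text length is given in characters, as in the question, the end of the text can still be found without a Reader by looking at UTF-8 lead bytes: the high bits of the first byte of each sequence say whether it is 1, 2, 3 or 4 bytes long. The sketch below (Utf8CharReader and readUtf8Chars are hypothetical names) assumes the count is in Java chars, i.e. UTF-16 code units, so a 4-byte sequence counts as two, and that the input is well-formed UTF-8. It consumes exactly the bytes of the text and nothing more.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class Utf8CharReader {
    // Reads exactly `charCount` Java chars (UTF-16 code units) of UTF-8 text,
    // consuming no byte past the end of that text. Assumes well-formed UTF-8:
    // continuation bytes are copied, not validated.
    static String readUtf8Chars(InputStream in, int charCount) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int chars = 0;
        while (chars < charCount) {
            int lead = in.read();
            if (lead < 0) throw new EOFException("stream ended inside text");
            // The lead byte's high bits give the length of the sequence.
            int seqLen = lead < 0x80 ? 1 : lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : 2;
            buf.write(lead);
            for (int i = 1; i < seqLen; i++) {
                int b = in.read();
                if (b < 0) throw new EOFException("truncated UTF-8 sequence");
                buf.write(b);
            }
            // A 4-byte sequence decodes to a surrogate pair, i.e. two Java chars.
            chars += seqLen == 4 ? 2 : 1;
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical record: 7 chars of text (the emoji counts as two)
        // followed by a binary int.
        String text = "h\u00e9llo\uD83D\uDE00";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write(text.getBytes(StandardCharsets.UTF_8));
        out.writeInt(42);
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        String s = readUtf8Chars(in, text.length()); // text.length() == 7
        int n = in.readInt();                        // binary data is intact
        System.out.println(s.equals(text) + " " + n); // true 42
    }
}
```

Because it reads one byte at a time, it is worth wrapping the raw stream once in a BufferedInputStream and using that same stream (or a DataInputStream on top of it) for both the binary and the text reads; the buffer is then shared, so nothing is lost when switching between the two. If the length counted code points instead of chars, the increment would simply always be 1.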

To simplify this task, I'd recommend creating your own class (let's say ProtocolRecord). It should be Serializable and contain all your fields.
Now you have 2 options:

(1) The simple one: use the Java serialization mechanism. In this case you just wrap your stream with ObjectInputStream for reading and ObjectOutputStream for writing and then read/write your objects. The disadvantage of this approach is that you cannot control your protocol: the bytes on the wire are in Java's own serialization format.

(2) Implement the methods readObject() and writeObject() yourself, reading and writing each field with DataInputStream and DataOutputStream.
In this case you do have to implement the serialization protocol yourself, but at least it is encapsulated in your class.
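Option (2) can be sketched roughly as follows. The fields of ProtocolRecord and the writeTo/readFrom method names are hypothetical. This sketch uses ordinary methods taking DataInputStream/DataOutputStream rather than the Serializable readObject()/writeObject() hooks, because those hooks still write into Java's serialization stream format (with its header and class descriptors), whereas plain methods let the class define the byte layout completely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical layout: int id, UTF-8 name with a 2-byte length prefix
// (in bytes), double value. The wire format is defined only in this class.
class ProtocolRecord {
    final int id;
    final String name;
    final double value;

    ProtocolRecord(int id, String name, double value) {
        this.id = id;
        this.name = name;
        this.value = value;
    }

    void writeTo(DataOutputStream out) throws IOException {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 0xFFFF) throw new IOException("name too long for a 2-byte prefix");
        out.writeInt(id);
        out.writeShort(utf8.length);
        out.write(utf8);
        out.writeDouble(value);
    }

    static ProtocolRecord readFrom(DataInputStream in) throws IOException {
        int id = in.readInt();
        byte[] utf8 = new byte[in.readUnsignedShort()];
        in.readFully(utf8);
        double value = in.readDouble();
        return new ProtocolRecord(id, new String(utf8, StandardCharsets.UTF_8), value);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        new ProtocolRecord(7, "h\u00e9llo", 2.5).writeTo(out);
        out.flush();
        // 4 (int) + 2 (prefix) + 6 (text) + 8 (double) bytes
        System.out.println(bytes.size()); // 20

        ProtocolRecord r = readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(r.id + " " + r.value + " " + r.name.equals("h\u00e9llo")); // 7 2.5 true
    }
}
```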

I think DataInputStream is what you need.
