How to cleanly read a file with both ASCII and other encodings in Java?
I have a custom image file where the first block of data is ASCII meta data. I need to be able to read this ASCII meta-data part of the file with Java and know when it ends, and when the 'raw image data' in another encoding starts.
I was thinking of reading the whole file into a byte[] and then somehow reading bytes out of it and converting them to ASCII until I hit the end of the ASCII meta-data section, at which point I would store that data. Then I could just rearrange the raw binary data into a different order as-is (no reading necessary). However, the only way I can think of doing this is to read the ASCII stuff byte by byte, look for newlines, concatenate everything before each newline, and check whether that line is the tag which signifies the beginning of the raw image data. Surely there is a better way: reading the ASCII part of the file with readLine() and then immediately continuing with the raw image binary, without needing to reopen the file in a new reader and seek to the line where the first reader found the 'begin image' tag.
Any ideas?
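For comparison, the in-memory byte[] approach described above takes little code. A minimal sketch, assuming newline-terminated header lines and a hypothetical marker line `BEGIN IMAGE`; the class and method names are also made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Splits an in-memory copy of the file into ASCII header lines and the offset of the raw data. */
final class InMemoryHeaderSplit {

    // Hypothetical marker line; substitute the tag your format actually uses.
    static final String MARKER = "BEGIN IMAGE";

    /**
     * Scans {@code file} line by line, adding each header line to {@code headerOut},
     * and returns the index of the first byte after the marker line (-1 if there is none).
     */
    static int dataOffset(byte[] file, List<String> headerOut) {
        int lineStart = 0;
        for (int i = 0; i < file.length; i++) {
            if (file[i] != '\n') continue;
            // Decode only this line; US-ASCII is enough because the header is plain ASCII.
            String line = new String(file, lineStart, i - lineStart, StandardCharsets.US_ASCII);
            if (line.endsWith("\r")) line = line.substring(0, line.length() - 1); // tolerate CRLF
            if (line.equals(MARKER)) return i + 1;
            headerOut.add(line);
            lineStart = i + 1;
        }
        return -1;
    }
}
```

The byte[] itself could come from `Files.readAllBytes(path)`, and `Arrays.copyOfRange(file, offset, file.length)` or `ByteBuffer.wrap(file, offset, file.length - offset)` then gives the raw image bytes without any decoding. The drawback is that the whole file must fit in memory.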
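1. Open the file as a FileInputStream (wrapped in a BufferedInputStream).
2. Read bytes one at a time and write each of them into a ByteArrayOutputStream.
3. Check each byte (cast to char; that's using ASCII implicitly) to detect when you have read the tag that marks the start of the image data.
4. When you find it, get the bytes from the ByteArrayOutputStream with toByteArray() and convert them to a String using new String(array, "US-ASCII");
5. Continue reading the raw image data from the same BufferedInputStream.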
It might be possible to do the string searching easily by using a Scanner on the input stream, but you have to be careful which pattern you use to make sure it will find the tag without starting to read the image data (since you want to read that yourself from the underlying input stream you're keeping a separate reference to).
Edit: Unfortunately, it looks like Scanner implicitly uses a buffer as well, so the only option left is to implement the string search "manually".
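A minimal sketch of that "manual" search, following this answer's approach but using StandardCharsets.US_ASCII instead of the "US-ASCII" name (same charset, without the checked UnsupportedEncodingException). StreamHeaderReader, readHeaderUntil and HeaderBuffer are hypothetical names, and the tag is assumed to be a fixed ASCII byte sequence:

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamHeaderReader {

    /** A ByteArrayOutputStream that can inspect its own tail without copying the buffer. */
    private static final class HeaderBuffer extends ByteArrayOutputStream {
        // buf and count are protected fields inherited from ByteArrayOutputStream.
        boolean endsWith(byte[] tag) {
            if (count < tag.length) return false;
            for (int i = 0; i < tag.length; i++) {
                if (buf[count - tag.length + i] != tag[i]) return false;
            }
            return true;
        }

        /** Everything written so far except the trailing tag, decoded as US-ASCII. */
        String textBefore(byte[] tag) {
            return new String(buf, 0, count - tag.length, StandardCharsets.US_ASCII);
        }
    }

    /**
     * Reads one byte at a time until the ASCII tag has been consumed and returns the
     * header text that preceded it. Nothing after the tag's last byte is read, so the
     * next in.read() returns the first byte of the image data.
     */
    static String readHeaderUntil(InputStream in, String tag) throws IOException {
        byte[] tagBytes = tag.getBytes(StandardCharsets.US_ASCII);
        HeaderBuffer header = new HeaderBuffer();
        int b;
        while ((b = in.read()) != -1) {
            header.write(b);
            if (header.endsWith(tagBytes)) {
                return header.textBefore(tagBytes);
            }
        }
        throw new EOFException("end of file reached before tag: " + tag);
    }
}
```

Typical use would be `InputStream in = new BufferedInputStream(new FileInputStream(path));` followed by `readHeaderUntil(in, "BEGIN IMAGE\n")` (the tag is hypothetical), after which you keep reading image bytes from that same `in`. The BufferedInputStream's own read-ahead is harmless here because nothing bypasses it; the buffering problem from the Edit only arises when a Scanner or Reader reads ahead and you then switch to the stream underneath it.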
Not sure if you can decide the format yourself, but anyway:
An alternative strategy is to write an integer value at the very start of the file that contains the number of bytes used by the ASCII part.
Then you can read exactly that many bytes, and it is also easy to skip the ASCII part and go directly to the binary blob.
This strategy is efficient, but you cannot change the number of ASCII text characters without also updating the count.
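A minimal sketch of the length-prefix layout, assuming a 4-byte big-endian count (what DataOutputStream.writeInt produces) followed immediately by the ASCII text and then the raw image bytes. The class and method names are hypothetical, and validation of the count is left out here:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedHeader {

    /** Writes [int byteCount][ASCII header][raw image bytes]. */
    static void write(OutputStream out, String header, byte[] image) throws IOException {
        // Non-ASCII characters would be replaced by '?' here, so keep the header pure ASCII.
        byte[] ascii = header.getBytes(StandardCharsets.US_ASCII);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(ascii.length); // 4-byte big-endian count
        data.write(ascii);
        data.write(image);
        data.flush();
    }

    /** Reads the header; afterwards {@code in} is positioned at the first image byte. */
    static String readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int n = data.readInt();
        byte[] ascii = new byte[n];
        data.readFully(ascii); // throws EOFException if the file is shorter than promised
        return new String(ascii, StandardCharsets.US_ASCII);
    }
}
```

DataInputStream does no read-ahead of its own, so after readHeader returns, the wrapped stream sits exactly at the first image byte. A reader that only wants the image can read the count and skip that many bytes instead of decoding them.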
By the way, make sure to sanitize your input: don't try to read more data than the file contains, or allocate more memory than the machine can provide.
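One way to apply that to the length-prefix idea, as a sketch: check the stored count before allocating a buffer for it. The names, and the 1 MiB cap in MAX_HEADER_BYTES, are arbitrary assumptions:

```java
import java.io.IOException;

final class HeaderLengthCheck {

    // Hypothetical upper bound for the metadata block; choose one that suits your format.
    static final int MAX_HEADER_BYTES = 1 << 20; // 1 MiB

    /**
     * Validates a header byte count read from the file before it is used to allocate memory.
     *
     * @param declared  the count stored at the start of the file
     * @param fileSize  total file size in bytes (for example from Files.size)
     * @param prefixLen bytes already taken up by the count field itself
     * @return {@code declared}, if it is plausible
     */
    static int checkHeaderLength(int declared, long fileSize, int prefixLen) throws IOException {
        if (declared < 0) {
            throw new IOException("negative header length: " + declared);
        }
        if (declared > MAX_HEADER_BYTES) {
            throw new IOException("header length " + declared + " exceeds limit " + MAX_HEADER_BYTES);
        }
        if (declared > fileSize - prefixLen) {
            throw new IOException("header length " + declared + " runs past the end of the file");
        }
        return declared;
    }
}
```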
Personally I would also use the first couple of characters of the file to contain some magic code, so that you can have a minimal check that the file is using your data format, and what version of the data format.
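A minimal sketch of such a check, assuming a hypothetical 4-byte magic "CIMG" followed by a one-byte format version; the values and names are placeholders for whatever your format defines:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MagicHeader {

    // Hypothetical signature and current format version; choose your own values.
    static final byte[] MAGIC = "CIMG".getBytes(StandardCharsets.US_ASCII);
    static final int VERSION = 1;

    /** Writes the signature followed by a one-byte format version. */
    static void writeMagic(OutputStream out) throws IOException {
        out.write(MAGIC);
        out.write(VERSION); // write(int) emits the low 8 bits
    }

    /** Checks the signature and returns the version; throws if this is not our format. */
    static int readMagic(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] sig = new byte[MAGIC.length];
        data.readFully(sig);
        if (!Arrays.equals(sig, MAGIC)) {
            throw new IOException("not a CIMG file");
        }
        int version = data.readUnsignedByte();
        if (version > VERSION) {
            throw new IOException("unsupported format version " + version);
        }
        return version;
    }
}
```

Accepting older versions while rejecting newer ones is just one possible policy; the point is that the version byte gives the reader something to decide on.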