Java: Is it possible to read a file line by line, stop, and then immediately start reading bytes from where I stopped?
I'm having an issue trying to parse the ASCII part of a file: once I hit the end tag, I want to IMMEDIATELY start reading the bytes from that point on. Everything I know of in Java for reading a line or a whole word creates a buffer, which ruins any chance of getting the bytes immediately following my stop point. Is the only way to do this to read byte-by-byte, find newlines, reconstruct everything prior to each newline, check whether it's my end tag, and go from there?
It is possible, but as far as I know not with the classes from the API.
You can do it manually: open the file as a BufferedInputStream, which supports mark/reset. You read block by block (into a byte[]) and parse it as ASCII, accumulating it in a buffer until you hit the marker. But before each read you call mark, with a read limit at least as large as the block. Once you believe you have read everything you need in ASCII, you call reset and then call read again to consume only the rest of the ASCII part, up to and including the marker. Now you have a BufferedInputStream (which is an InputStream) positioned at, and ready for reading, the binary part of the file.
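A minimal sketch of the mark/reset approach, assuming the end marker is a fixed byte sequence and that the caller wants the header without the marker. The names `MarkResetHeaderReader`, `readHeader` and the `blockSize` parameter are hypothetical; only `BufferedInputStream.mark`, `reset` and `read` come from the standard API. Because `mark(blockSize)` is called before every block read, at most one block ever has to be re-read after `reset`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

final class MarkResetHeaderReader {

    /**
     * Reads the ASCII header block by block until endTag is found.
     * Returns the header bytes before the tag and leaves `in` positioned
     * at the first byte after the tag (the start of the binary part).
     */
    static byte[] readHeader(BufferedInputStream in, byte[] endTag, int blockSize)
            throws IOException {
        if (endTag.length == 0 || blockSize <= 0) {
            throw new IllegalArgumentException("endTag must be non-empty and blockSize positive");
        }
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        byte[] block = new byte[blockSize];
        while (true) {
            in.mark(blockSize);                  // remember where this block starts
            int n = in.read(block, 0, blockSize);
            if (n == -1) {
                throw new EOFException("end tag not found");
            }
            int before = header.size();          // bytes accumulated from earlier blocks
            header.write(block, 0, n);
            byte[] acc = header.toByteArray();
            // Back up a little so a tag split across two blocks is still found.
            int from = Math.max(0, before - endTag.length + 1);
            int idx = indexOf(acc, endTag, from);
            if (idx >= 0) {
                int end = idx + endTag.length;   // offset of the first binary byte
                in.reset();                      // rewind to the start of this block
                int remaining = end - before;    // bytes of this block up to the tag's end
                while (remaining > 0) {
                    int r = in.read(block, 0, remaining);
                    if (r == -1) {
                        throw new EOFException("stream ended while re-reading the block");
                    }
                    remaining -= r;
                }
                return Arrays.copyOf(acc, idx);
            }
        }
    }

    /** Naive byte search: first index >= from where needle occurs in hay, or -1. */
    private static int indexOf(byte[] hay, byte[] needle, int from) {
        outer:
        for (int i = from; i <= hay.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

Usage would be along the lines of wrapping the file in `new BufferedInputStream(Files.newInputStream(path))`, calling `readHeader`, and then reading the binary part from that same `BufferedInputStream`. Copying the accumulator with `toByteArray()` on every block is quadratic in the header size, which is harmless for short headers but worth replacing for large ones.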
I think the best idea would be to abandon the concept of "lines". To find the end tag, create a ring buffer that's just big enough to contain the end tag, read into it byte-by-byte, and after each byte check if it contains the tag.
There are more sophisticated and efficient search algorithms, but the difference is only relevant with longer search terms (presumably your end tag is short).
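A sketch of the ring-buffer idea, assuming a non-empty end tag given as a fixed byte sequence; `RingBufferTagScanner` and `readUntilTag` are hypothetical names. It reads one byte at a time with `InputStream.read()`, so it never consumes anything past the tag, and after each byte it compares the ring (oldest byte first) with the tag.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class RingBufferTagScanner {

    /**
     * Reads `in` one byte at a time until endTag has just been consumed.
     * Returns the bytes that came before the tag; `in` is left positioned
     * on the first byte after the tag.
     */
    static byte[] readUntilTag(InputStream in, byte[] endTag) throws IOException {
        if (endTag.length == 0) {
            throw new IllegalArgumentException("endTag must not be empty");
        }
        byte[] ring = new byte[endTag.length]; // holds the last endTag.length bytes
        int next = 0;                          // slot to overwrite next (the oldest byte once full)
        int filled = 0;                        // how many slots hold real data so far
        ByteArrayOutputStream seen = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            ring[next] = (byte) b;
            next = (next + 1) % ring.length;
            if (filled < ring.length) {
                filled++;
            }
            seen.write(b);
            if (filled == ring.length && ringEquals(ring, next, endTag)) {
                byte[] all = seen.toByteArray();
                return Arrays.copyOf(all, all.length - endTag.length);
            }
        }
        throw new EOFException("end tag not found");
    }

    /** Compares the ring contents, starting at the oldest slot, with the tag. */
    private static boolean ringEquals(byte[] ring, int oldest, byte[] tag) {
        for (int i = 0; i < tag.length; i++) {
            if (ring[(oldest + i) % ring.length] != tag[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Each check costs time proportional to the tag length, which, as noted, only matters for long tags.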
How big is this file? My first thought is to read the whole thing into a ByteBuffer or a ByteArrayOutputStream without trying to process it, then locate the tag by comparing byte values. Once you know where the text part ends and the binary part begins, you process each part as appropriate.
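A minimal sketch of the read-everything-then-split approach, assuming the file fits comfortably in memory and the text part is US-ASCII. It swaps in `Files.readAllBytes` for the `ByteBuffer`/`ByteArrayOutputStream` mentioned above, and the names `WholeFileSplitter`, `Parts`, `load` and `split` are hypothetical. The tag is found by plain byte comparison, so nothing is decoded until the split point is known.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class WholeFileSplitter {

    /** The two halves of the file: decoded ASCII text and raw binary remainder. */
    static final class Parts {
        final String text;
        final byte[] binary;

        Parts(String text, byte[] binary) {
            this.text = text;
            this.binary = binary;
        }
    }

    /** Loads the whole file into memory, then splits it at the first endTag. */
    static Parts load(Path file, byte[] endTag) throws IOException {
        return split(Files.readAllBytes(file), endTag);
    }

    /** Locates endTag by comparing byte values and splits the data around it. */
    static Parts split(byte[] data, byte[] endTag) {
        int idx = indexOf(data, endTag);
        if (idx < 0) {
            throw new IllegalArgumentException("end tag not found");
        }
        String text = new String(data, 0, idx, StandardCharsets.US_ASCII);
        byte[] binary = Arrays.copyOfRange(data, idx + endTag.length, data.length);
        return new Parts(text, binary);
    }

    /** Naive byte search: first index where needle occurs in hay, or -1. */
    private static int indexOf(byte[] hay, byte[] needle) {
        outer:
        for (int i = 0; i <= hay.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```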
Is the file growing, or is it static?
If it's static, see http://java.sun.com/javase/6/docs/api/java/nio/MappedByteBuffer.html
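For a static file, the `MappedByteBuffer` suggestion could be sketched as follows. It assumes the file is under 2 GB (a single `FileChannel.map` call rejects sizes above `Integer.MAX_VALUE`) and that the header is US-ASCII; `MappedSplitter`, `Result` and `map` are hypothetical names. The binary part is returned as a `slice()` of the mapping rather than copied.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedSplitter {

    /** Decoded ASCII header plus a view of the binary bytes that follow the tag. */
    static final class Result {
        final String header;
        final ByteBuffer binary;

        Result(String header, ByteBuffer binary) {
            this.header = header;
            this.binary = binary;
        }
    }

    /** Maps the whole file read-only and splits it at the first endTag. */
    static Result map(Path file, byte[] endTag) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int idx = indexOf(buf, endTag);
            if (idx < 0) {
                throw new IOException("end tag not found");
            }
            byte[] head = new byte[idx];
            buf.get(head);                     // relative bulk get: bytes [0, idx)
            buf.position(idx + endTag.length); // skip over the tag itself
            ByteBuffer binary = buf.slice();   // shares the mapping, no copy
            // The mapping stays valid after the channel is closed.
            return new Result(new String(head, StandardCharsets.US_ASCII), binary);
        }
    }

    /** Naive search using absolute get(int), which does not move the position. */
    private static int indexOf(ByteBuffer buf, byte[] tag) {
        outer:
        for (int i = 0; i <= buf.limit() - tag.length; i++) {
            for (int j = 0; j < tag.length; j++) {
                if (buf.get(i + j) != tag[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```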
Yup, you're right about the byte-by-byte. Abstraction has its disadvantages.
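The byte-by-byte approach from the question can be sketched as follows, assuming the end tag sits on its own line and lines end in `\n` (a trailing `\r` is stripped). `ByteLineReader`, `readLine` and `readHeaderLines` are hypothetical names. Since the reader never reads past the newline that ends each line, the stream passed in is left exactly at the first byte after the end-tag line. A `BufferedInputStream` can still be used for speed: its buffer only gets in the way if the binary part is read from a different object, not from that same stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ByteLineReader {

    /** Reads one '\n'-terminated line without reading past it; returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        if (b == -1 && line.size() == 0) {
            return null;                       // nothing left to read
        }
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;                             // tolerate CRLF line endings
        }
        return new String(bytes, 0, len, StandardCharsets.US_ASCII);
    }

    /** Collects lines until one equals endTag; `in` is then positioned right after that line. */
    static List<String> readHeaderLines(InputStream in, String endTag) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = readLine(in)) != null) {
            if (line.equals(endTag)) {
                return lines;
            }
            lines.add(line);
        }
        throw new EOFException("end tag not found");
    }
}
```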