How to write a Java text file viewer for large log files
I am working on a software product with an integrated log file viewer. The problem is that it's slow and unstable for really large files, because it reads the whole file into memory when you view a log file. I want to write a new log file viewer that addresses this problem.
What are the best practices for writing viewers for large text files? How do editors like Notepad++ and Vim accomplish this? I was thinking of using a buffered bidirectional text stream reader together with Java's TableModel. Am I thinking along the right lines, and are such stream implementations available for Java?
Edit: Would it be worthwhile to run through the file once to index the start position of each line of text, so that one knows where to seek to? I will probably need the number of lines, so I will probably have to scan through the file at least once anyway?
Edit2: I've added my implementation to an answer below. Please comment on it or edit it to help me/us arrive at a better implementation, or otherwise provide your own.
I'm not sure that Notepad++ actually implements random access, but I think that's the way to go, especially with a log file viewer, which implies that it will be read-only.
Since your log viewer will be read-only, you can use a read-only, random-access, memory-mapped file "stream". In Java, you get this through FileChannel: it supports reads at arbitrary positions, and its map method can map a region of the file into memory.
Then just jump around in the file as needed and render only a scrolling window of the data to the screen.
One of the advantages of FileChannel is that concurrent threads can have the file open, and reads that pass an explicit position don't affect the channel's current position. So, if you're appending to the log file in another thread, it won't be affected.
Another advantage is that you can call FileChannel's size method to get the file size at any moment.
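As a rough illustration of the two points above, here is a minimal sketch of reading one window of bytes with a positional read and querying the size. The class name `WindowReader`, the method `readWindow`, and the sample file contents are made up for this example; only `FileChannel.open`, `read(ByteBuffer, long)` and `size()` are JDK calls. It assumes the file is UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of the JDK.
class WindowReader {

    /** Reads up to `length` bytes starting at byte offset `position`. */
    static byte[] readWindow(FileChannel channel, long position, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        long pos = position;
        // A positional read may return fewer bytes than requested,
        // so keep reading until the buffer is full or we hit end of file.
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, pos); // does not move channel.position()
            if (n < 0) {
                break; // end of file
            }
            pos += n;
        }
        buffer.flip();
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Sample data standing in for a real log file.
        Path log = Files.createTempFile("viewer-demo", ".log");
        Files.write(log, "first line\nsecond line\nthird line\n".getBytes(StandardCharsets.UTF_8));
        try (FileChannel channel = FileChannel.open(log, StandardOpenOption.READ)) {
            long size = channel.size(); // can be re-queried as the log grows
            byte[] window = readWindow(channel, 11, 11);
            System.out.println(size + " bytes; window = "
                    + new String(window, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

A viewer would call something like `readWindow` each time the user scrolls, with the position taken from wherever the visible region starts.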
The problem with mapping memory directly to a random access file for editing, which some text editors allow (such as HxD and UltraEdit), is that any changes directly affect the file. Changes are immediate (except for write caching), whereas users typically don't want their changes made until they click Save. However, since this is just a viewer, you don't have the same concerns.
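For a viewer, the mapping can be requested read-only, which rules out accidental writes altogether. The following is a small sketch under assumptions: `MappedWindow` and its `read` method are hypothetical names, the sample text is invented, and the file is assumed to be UTF-8. `FileChannel.map` with `MapMode.READ_ONLY` is the actual JDK call; a single mapping is limited to `Integer.MAX_VALUE` bytes, so a very large file has to be mapped in pieces.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of the JDK.
class MappedWindow {

    /** Maps at most `maxLength` bytes at `position` read-only and decodes them as UTF-8. */
    static String read(Path file, long position, int maxLength) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long available = channel.size() - position;
            if (available <= 0 || maxLength <= 0) {
                return ""; // nothing to map past the end of the file
            }
            int length = (int) Math.min(maxLength, available);
            // Writes to a READ_ONLY mapping throw ReadOnlyBufferException.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, position, length);
            byte[] bytes = new byte[length];
            map.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample data standing in for a real log file.
        Path log = Files.createTempFile("mapped-demo", ".log");
        log.toFile().deleteOnExit();
        Files.write(log, "first line\nsecond line\nthird line\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(read(log, 11, 11));
    }
}
```

Whether mapping beats plain positional reads for a viewer depends on the access pattern and platform, so it is worth measuring both before committing to one.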
A typical approach is to use a seekable file reader, make one pass through the log recording an index of line offsets, and then present only a window onto a portion of the file as requested.
This both reduces the data you need to keep in quick recall and avoids loading up a widget where 99% of the contents aren't currently visible.
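A minimal sketch of that approach might look like the following. Everything named here (`LineIndex`, `build`, `lineCount`, `readLine`, the 64 KiB buffer, the sample text) is an assumption made for illustration. It treats `\n` as the line terminator, strips a trailing `\r`, assumes UTF-8, and assumes no single line is larger than an `int`-sized buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical class for illustration; not a JDK or library type.
class LineIndex {
    private long[] starts = new long[1024]; // byte offset of the first byte of each line
    private int count;                      // number of lines indexed so far
    private long indexedSize;               // how many bytes of the file the index covers

    /** Makes one sequential pass over the file and records where each line starts. */
    static LineIndex build(FileChannel channel) throws IOException {
        LineIndex index = new LineIndex();
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
        long pos = 0;
        boolean atLineStart = true;
        while (true) {
            buffer.clear();
            int n = channel.read(buffer, pos);
            if (n < 0) {
                break; // end of file
            }
            for (int i = 0; i < n; i++) {
                if (atLineStart) {
                    index.add(pos + i);
                    atLineStart = false;
                }
                if (buffer.get(i) == '\n') {
                    atLineStart = true;
                }
            }
            pos += n;
        }
        index.indexedSize = pos;
        return index;
    }

    private void add(long offset) {
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, count * 2);
        }
        starts[count++] = offset;
    }

    int lineCount() {
        return count;
    }

    /** Seeks to the given zero-based line and returns it without its line terminator. */
    String readLine(FileChannel channel, int line) throws IOException {
        long start = starts[line];
        long end = (line + 1 < count) ? starts[line + 1] : indexedSize;
        ByteBuffer buffer = ByteBuffer.allocate((int) (end - start));
        long pos = start;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, pos);
            if (n < 0) {
                break;
            }
            pos += n;
        }
        byte[] bytes = buffer.array();
        int len = buffer.position();
        // Drop trailing '\n' and '\r' bytes.
        while (len > 0 && (bytes[len - 1] == '\n' || bytes[len - 1] == '\r')) {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("index-demo", ".log");
        Files.write(log, "alpha\nbeta\r\n\ngamma".getBytes(StandardCharsets.UTF_8));
        try (FileChannel channel = FileChannel.open(log, StandardOpenOption.READ)) {
            LineIndex index = build(channel);
            System.out.println(index.lineCount() + " lines; line 1 = " + index.readLine(channel, 1));
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

With an index like this, a Swing TableModel could plausibly return `lineCount()` as its row count and fetch each row through `readLine`, so only the visible rows are ever read. The index costs 8 bytes per line, and it goes stale when the log grows, so a live viewer would need to extend it from `indexedSize` onward.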
I'm posting my test implementation (written after following the advice of Marcus Adams and msw) here for your convenience, and also for further comments and criticism. It's quite fast.
I've not bothered with Unicode encoding safety. I guess this will be my next question. Any hints on that are very welcome.
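On the Unicode point, one hint, offered as a sketch and not as part of the implementation referred to above: if the logs are UTF-8, a window that starts at an arbitrary byte offset may begin in the middle of a multi-byte character. UTF-8 continuation bytes always have the bit pattern `10xxxxxx`, so they can be skipped to reach the next character boundary. For the same reason, scanning for the byte `0x0A` to find line breaks is safe in UTF-8, since every byte of a multi-byte sequence has its high bit set. Neither property holds for UTF-16. The names `Utf8Align` and `alignToCharStart` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; assumes the data is UTF-8.
class Utf8Align {

    /** Returns the first index >= offset that is not a UTF-8 continuation byte (10xxxxxx). */
    static int alignToCharStart(byte[] bytes, int offset) {
        int i = offset;
        while (i < bytes.length && (bytes[i] & 0xC0) == 0x80) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // "caf\u00E9 ok" encodes to: 63 61 66 C3 A9 20 6F 6B
        byte[] data = "caf\u00E9 ok".getBytes(StandardCharsets.UTF_8);
        int start = alignToCharStart(data, 4); // offset 4 is the continuation byte A9
        System.out.println("aligned start = " + start);
        System.out.println(new String(data, start, data.length - start, StandardCharsets.UTF_8));
    }
}
```

Decoding whole lines from a line-offset index avoids the problem entirely, because each line then starts and ends on a character boundary.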