How to write a Java text file viewer for big log files

Posted on 2024-09-02 03:10:33

I am working on a software product with an integrated log file viewer. The problem is that it's slow and unstable for really large files, because it reads the whole file into memory when you view a log file. I want to write a new log file viewer that addresses this problem.

What are the best practices for writing viewers for large text files? How do editors like Notepad++ and Vim accomplish this? I was thinking of using a buffered bidirectional text stream reader together with Java's TableModel. Am I thinking along the right lines, and are such stream implementations available for Java?

Edit: Would it be worthwhile to run through the file once to index the start position of each line of text, so that one knows where to seek to? I will probably need the number of lines, so I will probably have to scan through the file at least once anyway?

Edit2: I've added my implementation to an answer below. Please comment on it or edit it to help me/us arrive at a better implementation, or otherwise provide your own.

Comments (3)

じ违心 2024-09-09 03:10:33

I'm not sure that Notepad++ actually implements random access, but I think that's the way to go, especially with a log file viewer, which implies that it will be read-only.

Since your log viewer will be read-only, you can use a read-only, random access, memory-mapped file "stream". In Java, this is the FileChannel, whose map method provides the memory mapping.

Then just jump around in the file as needed and render to the screen just a scrolling window of the data.
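
As a rough illustration of that idea, here is a minimal sketch that maps only the requested region of a file read-only and decodes it. The class name MappedWindow, the helper readWindow, and the choice of ISO-8859-1 as the charset are assumptions made for this example, not something the answer specifies.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: reads one "window" of a large file through a read-only mapping.
class MappedWindow {

    /** Returns up to maxBytes of the file starting at offset, decoded as ISO-8859-1 (an assumption). */
    static String readWindow(Path file, long offset, int maxBytes) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (offset >= size) {
                return ""; // nothing to show past the end of the file
            }
            int len = (int) Math.min(maxBytes, size - offset);
            // Map only the requested region; the rest of the file is never loaded.
            MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, offset, len);
            byte[] bytes = new byte[len];
            window.get(bytes);
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("viewer", ".log");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "first line\nsecond line\nthird line\n".getBytes(StandardCharsets.ISO_8859_1));
        // "first line\n" is 11 bytes, so offset 11 is where the second line starts.
        System.out.println(readWindow(tmp, 11, 11)); // prints "second line"
    }
}
```

A real viewer would probably keep the channel open and remap as the user scrolls, rather than reopening the file for every window as this sketch does.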

One of the advantages of the FileChannel is that concurrent threads can read from the same open file: a read at an explicit position doesn't affect the channel's current file position. So, if another thread is appending to the log file, it won't be affected.

Another advantage is that you can call the FileChannel's size method to get the file size at any moment.
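
The two points above can be sketched together, assuming the positional overload FileChannel.read(ByteBuffer, long) is what is meant by reads that leave the file pointer alone. The class PositionalRead and the helper readAt are names invented for this example; the "other writer" is simulated by an append from the same thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper showing positional reads and size() on a growing log file.
class PositionalRead {

    /** Reads up to len bytes at offset without moving the channel's own position. */
    static byte[] readAt(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, offset + buf.position());
            if (n <= 0) {
                break; // end of file reached before len bytes were available
            }
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("viewer", ".log");
        log.toFile().deleteOnExit();
        Files.write(log, "alpha\n".getBytes(StandardCharsets.US_ASCII));

        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.READ)) {
            long seen = ch.size(); // how much of the file the viewer already knows about
            // Simulate another writer appending to the log.
            Files.write(log, "beta\n".getBytes(StandardCharsets.US_ASCII), StandardOpenOption.APPEND);
            long now = ch.size();  // size() reflects the appended data
            byte[] fresh = readAt(ch, seen, (int) (now - seen));
            System.out.print(new String(fresh, StandardCharsets.US_ASCII));
            System.out.println("channel position: " + ch.position());
        }
    }
}
```

Polling size() like this is one simple way for a viewer to notice new lines; only the bytes between the old and new size need to be scanned for line breaks.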

The problem with mapping memory directly to a random access file, which some editors allow (such as HxD and UltraEdit), is that any changes directly affect the file. Changes are therefore immediate (except for write caching), whereas users typically don't want their changes made until they click Save. However, since this is just a viewer, you don't have that concern.

多情出卖 2024-09-09 03:10:33

A typical approach is to use a seekable file reader, make one pass through the log recording an index of line offsets, and then present only a window onto a portion of the file as requested.

This both reduces the amount of data you need to keep at hand and avoids loading up a widget in which 99% of the contents aren't currently visible.
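
The approach described above can be sketched roughly as follows. The class LineIndex and its window method are hypothetical names, and the sketch assumes lines end in "\n" (optionally preceded by "\r") and that the text is UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one pass to record line offsets, then windows read on demand.
class LineIndex {
    private final Path file;
    private final long fileSize;
    private long[] starts = new long[1024]; // starts[i] = byte offset where line i begins
    private int lineCount = 1;              // line 0 always begins at offset 0

    LineIndex(Path file) throws IOException {
        this.file = file;
        long offset = 0; // file offset of the first byte in the current buffer
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
            int n;
            while ((n = ch.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf.get(i) == '\n') {
                        if (lineCount == starts.length) {
                            starts = Arrays.copyOf(starts, lineCount * 2);
                        }
                        starts[lineCount++] = offset + i + 1; // next line begins after the newline
                    }
                }
                offset += n;
                buf.clear();
            }
        }
        fileSize = offset;
        // A trailing newline terminates the last line; it does not start a new one.
        if (lineCount > 1 && starts[lineCount - 1] == fileSize) {
            lineCount--;
        }
    }

    int lineCount() {
        return lineCount;
    }

    /** Returns lines [firstLine, firstLine + count), clipped to the end of the file. */
    List<String> window(int firstLine, int count) throws IOException {
        List<String> lines = new ArrayList<>();
        int last = Math.min(firstLine + count, lineCount); // exclusive
        if (firstLine < 0 || firstLine >= last) {
            return lines;
        }
        long from = starts[firstLine];
        long to = (last < lineCount) ? starts[last] : fileSize;
        ByteBuffer buf = ByteBuffer.allocate((int) (to - from));
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (buf.hasRemaining() && ch.read(buf, from + buf.position()) > 0) {
                // keep reading until the window is filled or the file ends
            }
        }
        byte[] bytes = buf.array();
        for (int line = firstLine; line < last; line++) {
            int begin = (int) (starts[line] - from);
            int end = (int) (((line + 1 < lineCount) ? starts[line + 1] : fileSize) - from);
            while (end > begin && (bytes[end - 1] == '\n' || bytes[end - 1] == '\r')) {
                end--; // drop the line terminator
            }
            lines.add(new String(bytes, begin, end - begin, StandardCharsets.UTF_8));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("viewer", ".log");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "a\nbb\nccc\n".getBytes(StandardCharsets.UTF_8));
        LineIndex index = new LineIndex(tmp);
        System.out.println(index.lineCount() + " lines, window: " + index.window(1, 2));
    }
}
```

The index costs eight bytes per line, so even a log with tens of millions of lines stays far smaller than the file itself, and a scrolling widget only ever asks for the handful of lines it is about to paint.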

嘿哥们儿 2024-09-09 03:10:33

I'm posting my test implementation here (written after following the advice of Marcus Adams and msw) for your convenience and also for further comments and criticism. It's quite fast.

I've not bothered with Unicode encoding safety. I guess this will be my next question. Any hints on that are very welcome.

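import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import javax.swing.table.TableModel;

class LogFileTableModel implements TableModel {

    private final File f;
    private final int lineCount;
    private final String errMsg;
    // index[i] is the byte offset at which line i starts
    private final long[] index;
    // Reused for every row; a line longer than this is shown truncated
    private final ByteBuffer linebuf = ByteBuffer.allocate(1024);
    // Kept open for the lifetime of the model
    private FileChannel chan;

    public LogFileTableModel(String filename) {
        f = new File(filename);
        String m;
        int l = 1; // on failure, a single row carries the error message
        long[] idx = new long[] {};
        try {
            FileInputStream in = new FileInputStream(f);
            chan = in.getChannel();
            m = null;
            idx = buildLineIndex();
            l = idx.length;
        } catch (IOException e) {
            m = e.getMessage();
        }
        errMsg = m;
        lineCount = l;
        index = idx;
    }

    // Scans the whole file once and records where each line starts
    private long[] buildLineIndex() throws IOException {
        List<Long> idx = new ArrayList<Long>();
        idx.add(0L);

        ByteBuffer buf = ByteBuffer.allocate(8 * 1024);
        byte[] bufA = buf.array();
        long offset = 0; // file offset of bufA[0]
        int len;
        while ((len = chan.read(buf)) != -1) {
            for (int pos = 0; pos < len; pos++) {
                if (bufA[pos] == '\n')
                    idx.add(offset + pos + 1); // the next line starts after the newline
            }
            offset += len;
            buf.clear();
        }
        // A trailing newline ends the last line; it does not start a new one
        if (idx.size() > 1 && idx.get(idx.size() - 1) == offset)
            idx.remove(idx.size() - 1);
        System.out.println("Done Building index");
        long[] result = new long[idx.size()];
        for (int i = 0; i < result.length; i++)
            result[i] = idx.get(i);
        return result;
    }

    @Override
    public int getColumnCount() {
        return 2;
    }

    @Override
    public int getRowCount() {
        return lineCount;
    }

    @Override
    public String getColumnName(int columnIndex) {
        switch (columnIndex) {
        case 0:
            return "#";
        case 1:
            return "Name";
        }
        return "";
    }

    @Override
    public Object getValueAt(int rowIndex, int columnIndex) {
        switch (columnIndex) {
            case 0:
                return String.format("%3d", rowIndex);
            case 1:
                if (errMsg != null)
                    return errMsg;
                try {
                    long pos = index[rowIndex];
                    // A line ends where the next one starts; the last line ends at end of file
                    long end = (rowIndex == lineCount - 1) ? chan.size() : index[rowIndex + 1];
                    // Never read more than the buffer holds, and never past the end of the line
                    int len = (int) Math.min(end - pos, linebuf.capacity());
                    linebuf.clear();
                    linebuf.limit(len);
                    // Positional reads leave the channel's own position untouched
                    while (linebuf.hasRemaining()) {
                        if (chan.read(linebuf, pos + linebuf.position()) <= 0)
                            break;
                    }
                    // Decoded with the platform default charset; only the bytes just read are used
                    return new String(linebuf.array(), 0, linebuf.position());
                } catch (Exception e) {
                    return "Error: "+ e.getMessage();
                }
        }
        return "";
    }

    @Override
    public Class<?> getColumnClass(int columnIndex) {
        return String.class;
    }

    // ... other methods to make interface complete


}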
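
On the Unicode question, one possible direction, offered as a sketch rather than a definitive answer: if the logs are UTF-8 (an assumption), scanning for the byte '\n' remains safe, because in UTF-8 the byte 0x0A never occurs inside a multi-byte sequence. What can go wrong is cutting a long line at a fixed byte count in the middle of a character. The hypothetical helper below (Utf8Lines, completeLength and decode are names invented here) trims such a prefix back to a character boundary and decodes with an explicit charset instead of the platform default.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for decoding (possibly truncated) UTF-8 line bytes.
class Utf8Lines {

    /** Returns the largest length <= len that does not end inside a UTF-8 sequence. */
    static int completeLength(byte[] b, int len) {
        int i = len;
        // Step back over at most three continuation bytes (bit pattern 10xxxxxx).
        while (i > 0 && len - i < 3 && (b[i - 1] & 0xC0) == 0x80) {
            i--;
        }
        if (i == 0) {
            return len; // no lead byte found; leave malformed input to the decoder
        }
        int lead = b[i - 1] & 0xFF;
        // Number of bytes the sequence starting at the lead byte should have.
        int need = lead < 0x80 ? 1 : lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : lead >= 0xC0 ? 2 : 1;
        int have = len - (i - 1);
        return have < need ? i - 1 : len; // drop the incomplete trailing sequence
    }

    /** Decodes the first len bytes as UTF-8, never splitting a character at the cut. */
    static String decode(byte[] b, int len) {
        return new String(b, 0, completeLength(b, len), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "a\u00e9".getBytes(StandardCharsets.UTF_8); // 'a' + two bytes for U+00E9
        System.out.println(decode(bytes, 2).length()); // cut falls inside U+00E9, so only "a" remains
        System.out.println(decode(bytes, 3).length()); // whole input decodes to two characters
    }
}
```

For encodings such as UTF-16, a byte-level search for '\n' would no longer be correct, so the indexing step itself would need to become charset-aware.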