Java: Reading the last n lines of a huge file

Posted on 2024-10-01 14:31:28

I want to read the last n lines of a very big file without reading the whole file into any buffer/memory area using Java.

I looked around the JDK APIs and Apache Commons I/O and am not able to locate one which is suitable for this purpose.

I was thinking of the way tail or less does it in UNIX. I don't think they load the entire file and then show its last few lines. There should be a similar way to do the same in Java.

Comments (15)

淡紫姑娘! 2024-10-08 14:31:28

I found the simplest way is to use ReversedLinesFileReader from the Apache Commons IO API.
It returns the lines of a file from bottom to top, and you can set n_lines to control how many lines are read.

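import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.input.ReversedLinesFileReader;

File file = new File("D:\\file_name.xml");
int n_lines = 10;
// try-with-resources closes the reader; pass the file's actual charset
try (ReversedLinesFileReader reader = new ReversedLinesFileReader(file, StandardCharsets.UTF_8)) {
    String line;
    // readLine() returns null once the start of the file has been reached
    for (int counter = 0; counter < n_lines && (line = reader.readLine()) != null; counter++) {
        System.out.println(line);
    }
}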
朕就是辣么酷 2024-10-08 14:31:28

If you use a RandomAccessFile, you can use length and seek to get to a specific point near the end of the file and then read forward from there.

If you find there weren't enough lines, back up from that point and try again. Once you've figured out where the Nth last line begins, you can seek to there and just read-and-print.

An initial best-guess assumption can be made based on your data properties. For example, if it's a text file, it's possible the line lengths won't exceed an average of 132 so, to get the last five lines, start 660 characters before the end. Then, if you were wrong, try again at 1320 (you can even use what you learned from the last 660 characters to adjust that - example: if those 660 characters were just three lines, the next try could be 660 / 3 * 5, plus maybe a bit extra just in case).
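
As a rough illustration of the guess-and-retry idea above, here is a minimal sketch. The class name GuessAndRetryTail, the method lastLines and its avgLineLength parameter are made up for this example. It also assumes the file is UTF-8 (or another encoding in which a 0x0A byte always means a line feed) and that the window stays small enough to fit in one byte array.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class GuessAndRetryTail {

    // Returns up to n trailing lines. avgLineLength is only a first guess
    // (hypothetical parameter); the window doubles until it is large enough.
    public static List<String> lastLines(Path file, int n, int avgLineLength) throws IOException {
        if (n <= 0) {
            return new ArrayList<>();
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            long window = Math.max(1L, (long) n * avgLineLength);
            while (true) {
                long start = Math.max(0L, length - window);
                byte[] buf = new byte[(int) (length - start)]; // assumes the window stays below 2 GB
                raf.seek(start);
                raf.readFully(buf);

                String text = new String(buf, StandardCharsets.UTF_8);
                List<String> lines = new ArrayList<>(Arrays.asList(text.split("\r?\n", -1)));
                // A trailing line terminator yields one empty element at the end.
                if (!lines.isEmpty() && lines.get(lines.size() - 1).isEmpty()) {
                    lines.remove(lines.size() - 1);
                }
                // Unless we started at byte 0, the first piece may be a partial line.
                if (start > 0 && !lines.isEmpty()) {
                    lines.remove(0);
                }
                if (lines.size() >= n || start == 0) {
                    int from = Math.max(0, lines.size() - n);
                    return new ArrayList<>(lines.subList(from, lines.size()));
                }
                window *= 2; // the guess was too small: back up further and retry
            }
        }
    }
}
```

With n = 5 and avgLineLength = 132 the first attempt starts 660 bytes before the end, matching the numbers above, and the window only grows if fewer than five complete lines were found. Plain doubling is used here for simplicity; the adaptive estimate described in the answer would work just as well.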

说好的呢 2024-10-08 14:31:28

RandomAccessFile is a good place to start, as described by the other answers. There is one important caveat though.

If your file is not encoded with a one-byte-per-character encoding, the readLine() method is not going to work for you. And readUTF() won't work in any circumstances. (It reads a string preceded by a two-byte length prefix, which a plain text file does not contain.)

Instead, you will need to make sure that you look for end-of-line markers in a way that respects the encoding's character boundaries. For fixed length encodings (e.g. flavors of UTF-16 or UTF-32) you need to extract characters starting from byte positions that are divisible by the character size in bytes. For variable length encodings (e.g. UTF-8), you need to search for a byte that must be the first byte of a character.

In the case of UTF-8, the first byte of a character will be 0xxxxxxx or 110xxxxx or 1110xxxx or 11110xxx. Anything else is either a continuation byte (10xxxxxx) or an illegal UTF-8 sequence. See The Unicode Standard, Version 5.2, Chapter 3.9, Table 3-7. This means, as the comment discussion points out, that any 0x0A and 0x0D bytes in a properly encoded UTF-8 stream will represent a LF or CR character. Thus, simply counting the 0x0A and 0x0D bytes is a valid implementation strategy (for UTF-8) if we can assume that the other kinds of Unicode line separator (U+2028, U+2029 and U+0085) are not used. If you can't assume that, the code will be more complicated.

Having identified a proper character boundary, you can then just call new String(...) passing the byte array, offset, count and encoding, and then repeatedly call String.lastIndexOf(...) to count end-of-lines.
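
The boundary check and the lastIndexOf counting described above could look roughly like the sketch below. Utf8TailBlock, decodeTail, startOfNthLastLine and the blockSize parameter are names invented for this example. It assumes well-formed UTF-8 and treats only LF (which also closes CR LF) as a line terminator.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public final class Utf8TailBlock {

    // True for bytes of the form 10xxxxxx, which can only continue a character.
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Decodes the last blockSize bytes (hypothetical parameter) of a UTF-8 file,
    // starting at the first character boundary inside that block.
    static String decodeTail(Path file, int blockSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            long start = Math.max(0L, length - blockSize);
            byte[] buf = new byte[(int) (length - start)];
            raf.seek(start);
            raf.readFully(buf);
            int offset = 0;
            while (offset < buf.length && isContinuationByte(buf[offset])) {
                offset++; // skip the tail of a character cut in half by the seek
            }
            return new String(buf, offset, buf.length - offset, StandardCharsets.UTF_8);
        }
    }

    // Returns the index in text where the n-th last line begins, or -1 if text
    // holds fewer than n complete lines (a larger block is needed).
    static int startOfNthLastLine(String text, int n) {
        if (n <= 0) {
            return text.length();
        }
        int pos = text.length();
        if (pos > 0 && text.charAt(pos - 1) == '\n') {
            pos--; // the final terminator belongs to the last line
        }
        for (int i = 0; i < n; i++) {
            pos = text.lastIndexOf('\n', pos - 1);
            if (pos < 0) {
                return -1;
            }
        }
        return pos + 1;
    }
}
```

A caller would take text.substring(startOfNthLastLine(text, n)) when the result is non-negative, and otherwise retry with a larger block. When the block already starts at byte 0 of the file, a result of -1 simply means the whole text is the answer.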

剑心龙吟 2024-10-08 14:31:28

ReversedLinesFileReader can be found in the Apache Commons IO Java library.

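    int n_lines = 1000;
    StringBuilder result = new StringBuilder();
    // try-with-resources closes the underlying file handle
    try (ReversedLinesFileReader reader = new ReversedLinesFileReader(new File(path), StandardCharsets.UTF_8)) {
        for (int i = 0; i < n_lines; i++) {
            String line = reader.readLine();
            if (line == null)
                break; // reached the start of the file
            // lines arrive last-first, so prepend to restore file order
            result.insert(0, line + System.lineSeparator());
        }
    }
    return result.toString();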
吾性傲以野 2024-10-08 14:31:28

I found RandomAccessFile and the other buffered reader classes too slow for me. Nothing can be faster than tail -<#lines>, so this was the best solution for me.

public String getLastNLogLines(File file, int nLines) {
    StringBuilder s = new StringBuilder();
    try {
        // Pass the arguments separately so a path containing spaces is not split
        Process p = new ProcessBuilder("tail", "-n", String.valueOf(nLines), file.getPath()).start();
        try (java.io.BufferedReader input = new java.io.BufferedReader(
                new java.io.InputStreamReader(p.getInputStream()))) {
            String line;
            // readLine() returns null once the end of the process output is reached
            while ((line = input.readLine()) != null) {
                s.append(line).append('\n');
            }
        }
    } catch (java.io.IOException e) {
        e.printStackTrace();
    }
    return s.toString();
}
看轻我的陪伴 2024-10-08 14:31:28
package com.uday;

import java.io.File;
import java.io.RandomAccessFile;

public class TailN {
    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();

        TailN tailN = new TailN();
        File file = new File("/Users/udakkuma/Documents/workspace/uday_cancel_feature/TestOOPS/src/file.txt");
        tailN.readFromLast(file);

        System.out.println("Execution Time : " + (System.currentTimeMillis() - startTime));

    }

    public void readFromLast(File file) throws Exception {
        int lines = 3;
        int readLines = 0;
        StringBuilder builder = new StringBuilder();
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
            // Offset of the last byte; -1 for an empty file, so the loop is skipped
            long lastByte = randomAccessFile.length() - 1;

            for (long pointer = lastByte; pointer >= 0; pointer--) {
                randomAccessFile.seek(pointer);
                // read from the end, one byte at a time (assumes a single-byte encoding)
                char c = (char) randomAccessFile.read();
                if (c == '\n' && pointer != lastByte) {
                    // a newline ends the previous line; the file's final newline is not counted
                    readLines++;
                    if (readLines == lines)
                        break;
                }
                builder.append(c);
            }
            // The bytes were collected back to front, so reverse them into file order
            builder.reverse();
            System.out.println(builder.toString());
        }

    }
}
月下客 2024-10-08 14:31:28

Here's one without an Apache dependency. It uses Java streams and is much faster than RandomAccessFile or Apache's ReversedLinesFileReader. Here are the results I got when reading the last 90,000 lines from a 100,000 line file:

This method: 50ms
Apache's ReversedLinesFileReader: 900ms
RandomAccessFile (reading in reverse): 1,200ms

Original source

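import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public static String[] getLastNLinesFromFile(String filePath, int numLines) throws IOException {
    try (Stream<String> stream = Files.lines(Paths.get(filePath))) {
        AtomicInteger offset = new AtomicInteger();
        // Fixed-size array used as a ring buffer: each line overwrites the oldest slot
        String[] lines = new String[numLines];
        stream.forEach(line -> {
            lines[offset.getAndIncrement() % numLines] = line;
        });
        // Walk the retained slots from oldest to newest
        List<String> list = IntStream.range(offset.get() < numLines ? 0 : offset.get() - numLines, offset.get())
                .mapToObj(idx -> lines[idx % numLines]).collect(Collectors.toList());
        return list.toArray(new String[0]);
    }
}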
万水千山粽是情ミ 2024-10-08 14:31:28

Use CircularFifoBuffer from Apache Commons Collections. This comes from an answer to a similar question, "How to read last 5 lines of a .txt file into java".

Note that in Apache Commons Collections 4 this class seems to have been renamed to CircularFifoQueue
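
The ring-buffer idea behind CircularFifoBuffer / CircularFifoQueue can be sketched without the Commons Collections dependency. The version below swaps in java.util.ArrayDeque from the JDK as a stand-in; it is not the Apache class. RingBufferTail and lastLines are names invented for this example, and UTF-8 is assumed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public final class RingBufferTail {

    // Streams the file once and keeps only the most recent n lines in memory.
    public static List<String> lastLines(Path file, int n) throws IOException {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Deque<String> window = new ArrayDeque<>(n);
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (window.size() == n) {
                    window.removeFirst(); // evict the oldest line, as a fixed-size FIFO would
                }
                window.addLast(line);
            }
        }
        return new ArrayList<>(window);
    }
}
```

This keeps memory use bounded by n lines, but it still reads the file sequentially from the beginning, so the run time grows with the file size rather than with n.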

A君 2024-10-08 14:31:28

A RandomAccessFile allows for seeking (http://download.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html). The File.length method will return the size of the file. The problem is determining the number of lines. For this, you can seek to the end of the file and read backwards until you have hit the right number of lines.

孤独岁月 2024-10-08 14:31:28

I had a similar problem, but I didn't understand the other solutions.

I used this. I hope it's simple code.

// String filePathName = (directory and file name).
File f = new File(filePathName);
long fileLength = f.length(); // Size of the file in bytes.
long fileLength_toRead = 0;
if (fileLength > 2000) {
    // My file content is a table; I know one row is about 100 bytes / characters.
    // I start reading 1000 bytes before the end of the file.
    // If you don't know the line length, use @paxdiablo's advice.
    fileLength_toRead = fileLength - 1000;
}
try (RandomAccessFile raf = new RandomAccessFile(filePathName, "r")) { // try-with-resources opens and closes the file.
    raf.seek(fileLength_toRead); // Reading begins at this byte.
    if (fileLength_toRead > 0) {
        raf.readLine(); // The first line read is usually incomplete, so discard it.
    }
    String rowInFile = raf.readLine();
    while (rowInFile != null) {
        // Here the lines read (rowInFile) can be added to a String[] array or an ArrayList<String>.
        // Later I can work with the rows from the array; the last row is sometimes empty, etc.
        rowInFile = raf.readLine();
    }
}
catch (IOException e) {
    // Ignored in this example.
}
地狱即天堂 2024-10-08 14:31:28

Here is working code for this.

private static void printLastNLines(String filePath, int n) {
    StringBuilder builder = new StringBuilder();
    // try-with-resources closes the file even if reading fails
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(filePath, "r")) {
        long pos = randomAccessFile.length() - 1;

        // Walk backwards one byte at a time (assumes a single-byte encoding)
        for (long i = pos; i >= 0; i--) {
            randomAccessFile.seek(i);
            char c = (char) randomAccessFile.read();
            if (c == '\n') {
                if (i == pos) {
                    continue; // skip the newline that terminates the last line
                }
                n--;
                if (n == 0) {
                    break;
                }
            }
            builder.append(c);
        }
        builder.reverse();
        System.out.println(builder.toString());
    } catch (IOException e) {
        // FileNotFoundException is a subclass of IOException
        e.printStackTrace();
    }
}
ㄖ落Θ余辉 2024-10-08 14:31:28

Here is the best way I've found to do it. Simple and pretty fast and memory efficient.

public static void tail(File src, OutputStream out, int maxLines) throws FileNotFoundException, IOException {
    String[] lines = new String[maxLines];
    long count = 0; // total number of lines read so far
    try (BufferedReader reader = new BufferedReader(new FileReader(src))) {
        // Ring buffer: each new line overwrites the oldest slot
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            lines[(int) (count % maxLines)] = line;
            count++;
        }
    }

    OutputStreamWriter writer = new OutputStreamWriter(out);
    // Oldest retained line first; also handles files shorter than maxLines
    for (long i = Math.max(0, count - maxLines); i < count; i++) {
        writer.write(lines[(int) (i % maxLines)]);
        writer.write("\n");
    }

    writer.flush();
}
岁月静好 2024-10-08 14:31:28

(See comment)

public String readFromLast(File file, int howMany) throws IOException {
    if (howMany <= 0) {
        return "";
    }
    int numLinesRead = 0;
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
         ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        // Offset of the last byte; -1 for an empty file, so the loop is skipped
        long lastByte = randomAccessFile.length() - 1;

        for (long pointer = lastByte; pointer >= 0; pointer--) {
            randomAccessFile.seek(pointer);
            byte b = (byte) randomAccessFile.read();
            // The newline that terminates the last line does not start a new one
            if (b == '\n' && pointer != lastByte) {
                numLinesRead++;
                if (numLinesRead == howMany)
                    break;
            }
            baos.write(b);
        }
        /*
         * The bytes were collected back to front, so reverse the array to
         * restore file order before decoding
         */
        byte[] a = baos.toByteArray();
        for (int start = 0, end = a.length - 1; start < end; start++, end--) {
            byte temp = a[end];
            a[end] = a[start];
            a[start] = temp;
        }
        // Decodes with the platform default charset, as the original did
        return new String(a).trim();
    } // End try-with-resources

} // End method
仅此而已 2024-10-08 14:31:28

I tried RandomAccessFile first, and it was tedious to read the file backwards, repositioning the file pointer on every read operation. So I tried @Luca's solution, and within a few minutes I had the last few lines of the file as a string in just two lines of code.

    InputStream inputStream = Runtime.getRuntime().exec("tail " + path.toFile()).getInputStream();
    String tail = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(System.lineSeparator()));
孤独患者 2024-10-08 14:31:28

The code is only 2 lines

     // Please specify correct Charset
     ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8);

     // read last 2 lines
     System.out.println(rlf.toString(2));

Gradle:

implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'

Maven:

   <dependency>
        <groupId>commons-io</groupId><artifactId>commons-io</artifactId><version>2.11.0</version>
   </dependency>