Java file-io large-files

Java: Read the last n lines of a huge file

Posted on 2024-10-01 14:31:28

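I want to read the last n lines of a very large file in Java without reading the whole file into any buffer or memory area.

I looked through the JDK APIs and Apache Commons IO, but could not find anything suitable for this purpose.

I am thinking of the way tail or less does it on UNIX. I don't think they load the whole file and then show its last few lines. There should be a similar way to do the same thing in Java.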

Comments (15)

淡紫姑娘！ 2024-10-08 14:31:28

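I found the simplest way is to use ReversedLinesFileReader from the Apache Commons IO API.
It gives you the lines of a file from bottom to top, and you can set n_lines to the number of lines you want.

import java.io.File;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.input.ReversedLinesFileReader;

File file = new File("D:\\file_name.xml");
int n_lines = 10;
int counter = 0;
// try-with-resources closes the reader; UTF-8 is an assumption, pass the file's actual charset
try (ReversedLinesFileReader object = new ReversedLinesFileReader(file, StandardCharsets.UTF_8)) {
    String line;
    // readLine() returns null once the start of the file has been reached
    while (counter < n_lines && (line = object.readLine()) != null) {
        System.out.println(line);
        counter++;
    }
}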

朕就是辣么酷 2024-10-08 14:31:28

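If you use a RandomAccessFile, you can use length() and seek() to get to a specific point near the end of the file and then read forward from there.

If you find there weren't enough lines, back up from that point and try again. Once you have figured out where the Nth-last line begins, you can seek there and simply read and print.

An initial best guess can be made from the properties of your data. For example, if it is a text file, line lengths probably won't exceed an average of 132 characters, so to get the last five lines, start 660 characters before the end. If that turns out to be wrong, try again at 1320. You can even use what you learned from the last 660 characters to adjust: if those 660 characters were just three lines, the next try could be 660 / 3 * 5, plus a bit extra just in case.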
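The guess-and-retry idea can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: the class name TailByGuess, the method lastLines and the guessedLineLength parameter are made up for illustration, and the code assumes a UTF-8 or ASCII file with \n or \r\n line endings whose tail fits comfortably in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TailByGuess {
    /** Returns up to n last lines; guessedLineLength is the assumed average line length in bytes. */
    static List<String> lastLines(String path, int n, int guessedLineLength) throws IOException {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            if (length == 0) {
                return result;
            }
            // First guess: n lines of the assumed average length.
            long window = Math.max(1L, (long) n * guessedLineLength);
            while (true) {
                long start = Math.max(0L, length - window);
                byte[] buf = new byte[Math.toIntExact(length - start)];
                raf.seek(start);
                raf.readFully(buf);

                // Ignore the terminator of the final line so it is not counted as an empty line.
                int end = buf.length;
                if (buf[end - 1] == '\n') end--;
                if (end > 0 && buf[end - 1] == '\r') end--;

                // Walk backwards until the n-th newline; the wanted lines start right after it.
                int from = -1;
                int found = 0;
                for (int i = end - 1; i >= 0; i--) {
                    if (buf[i] == '\n' && ++found == n) {
                        from = i + 1;
                        break;
                    }
                }
                if (from < 0 && start > 0) {
                    window *= 2; // guess was too small: back up further and retry
                    continue;
                }
                if (from < 0) {
                    from = 0; // whole file read and it has fewer than n lines
                }
                // Decoding starts right after a newline byte (or at offset 0), so it is on a character boundary.
                String text = new String(buf, from, end - from, StandardCharsets.UTF_8);
                for (String line : text.split("\r?\n", -1)) {
                    result.add(line);
                }
                return result;
            }
        }
    }
}
```

For example, TailByGuess.lastLines("app.log", 5, 132) (with a hypothetical file name) would start 660 bytes before the end and double the window until five complete lines are found or the start of the file is reached. Doubling is just one possible retry policy; the adaptive estimate described above would work as well.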

说好的呢 2024-10-08 14:31:28

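As other answers have said, RandomAccessFile is a good place to start. But there is one important caveat.

If your file is not encoded with a one-byte-per-character encoding, the readLine() method will not work for you. And readUTF() won't work in any circumstances. (It reads a string that is preceded by a length prefix...)

Instead, you need to look for the end-of-line markers in a way that respects the encoding's character boundaries. For fixed-length encodings (e.g. flavors of UTF-16 or UTF-32), you need to extract characters starting from byte positions that are divisible by the character size in bytes. For variable-length encodings (e.g. UTF-8), you need to search for a byte that must be the first byte of a character.

In the case of UTF-8, the first byte of a character will be 0xxxxxxx, 110xxxxx, 1110xxxx or 11110xxx. Anything else is either a continuation byte (second, third or fourth) or an illegal UTF-8 sequence. See The Unicode Standard, Version 5.2, Chapter 3.9, Table 3-7. As the comment discussion points out, this means that any 0x0A and 0x0D bytes in a properly encoded UTF-8 stream represent an LF or CR character. So, if we can assume that the other kinds of Unicode line separator (0x2028, 0x2029 and 0x0085) are not used, simply counting 0x0A and 0x0D bytes is a valid implementation strategy (for UTF-8). If you can't assume that, the code will be more complicated.

Having identified a proper character boundary, you can simply call new String(...), passing the byte array, offset, count and encoding, and then repeatedly call String.lastIndexOf(...) to count the end-of-lines.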
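The boundary check and the lastIndexOf loop can be sketched as below. This is only an illustration under stated assumptions: the class Utf8Tail and its method names are hypothetical, the input is a byte chunk already read from somewhere near the end of a UTF-8 file, and only LF or CRLF line endings are considered (not U+2028, U+2029 or U+0085).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Utf8Tail {
    /** True unless b is a 10xxxxxx continuation byte, i.e. b may start a UTF-8 character. */
    static boolean isLeadByte(byte b) {
        return (b & 0xC0) != 0x80;
    }

    /** Advances pos past any continuation bytes so that decoding starts on a character boundary. */
    static int alignToCharStart(byte[] chunk, int pos) {
        while (pos < chunk.length && !isLeadByte(chunk[pos])) {
            pos++;
        }
        return pos;
    }

    /**
     * Decodes the chunk from an aligned position and returns up to n trailing lines
     * that are known to be complete, i.e. preceded by a line feed inside the chunk.
     */
    static List<String> completeLastLines(byte[] chunk, int n) {
        int start = alignToCharStart(chunk, 0);
        String text = new String(chunk, start, chunk.length - start, StandardCharsets.UTF_8);

        int end = text.length();
        if (end > 0 && text.charAt(end - 1) == '\n') {
            end--; // the terminator of the final line does not open a new line
        }
        List<String> lines = new ArrayList<>();
        while (lines.size() < n) {
            int nl = text.lastIndexOf('\n', end - 1);
            if (nl < 0) {
                break; // what remains may be a partial line; the caller needs a larger chunk
            }
            String line = text.substring(nl + 1, end);
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1); // tolerate CRLF endings
            }
            lines.add(0, line); // found last-first, so prepend to keep file order
            end = nl;
        }
        return lines;
    }
}
```

If completeLastLines returns fewer than n lines, the caller would read a larger chunk further back in the file and try again. Note that isLeadByte only rules out continuation bytes; it does not validate that the stream is well-formed UTF-8.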

剑心龙吟 2024-10-08 14:31:28

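ReversedLinesFileReader can be found in the Apache Commons IO Java library.

    int n_lines = 1000;
    StringBuilder result = new StringBuilder();
    // try-with-resources closes the reader; UTF-8 is an assumption, use the file's actual charset
    try (ReversedLinesFileReader object = new ReversedLinesFileReader(new File(path), StandardCharsets.UTF_8)) {
        for (int i = 0; i < n_lines; i++) {
            String line = object.readLine();
            if (line == null)
                break;
            // lines arrive last-first, so prepend each one to restore file order
            result.insert(0, line + System.lineSeparator());
        }
    }
    return result.toString();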

吾性傲以野 2024-10-08 14:31:28

I found RandomAccessFile and the other buffered reader classes too slow for me. Nothing can be faster than tail -<#lines>, so this was the best solution for me.

public String getLastNLogLines(File file, int nLines) {
    StringBuilder s = new StringBuilder();
    try {
        // Pass the arguments separately so that paths containing spaces survive;
        // this requires a tail executable on the PATH (Unix-like systems)
        Process p = new ProcessBuilder("tail", "-n", String.valueOf(nLines), file.getPath()).start();
        try (java.io.BufferedReader input = new java.io.BufferedReader(new java.io.InputStreamReader(p.getInputStream()))) {
            String line;
            // Read the next line into the variable, then check for EOF,
            // which readLine() signals by returning null
            while ((line = input.readLine()) != null) {
                s.append(line).append('\n');
            }
        }
    } catch (java.io.IOException e) {
        e.printStackTrace();
    }
    return s.toString();
}

看轻我的陪伴 2024-10-08 14:31:28

package com.uday;

import java.io.File;
import java.io.RandomAccessFile;

public class TailN {
    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();

        TailN tailN = new TailN();
        File file = new File("/Users/udakkuma/Documents/workspace/uday_cancel_feature/TestOOPS/src/file.txt");
        tailN.readFromLast(file);

        System.out.println("Execution Time : " + (System.currentTimeMillis() - startTime));
    }

    public void readFromLast(File file) throws Exception {
        int lines = 3;
        int readLines = 0;
        StringBuilder builder = new StringBuilder();
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
            // Index of the last byte; -1 for an empty file, in which case the loop is skipped
            long lastByte = randomAccessFile.length() - 1;

            for (long pointer = lastByte; pointer >= 0; pointer--) {
                randomAccessFile.seek(pointer);
                // Read backwards one byte at a time. Casting a byte to char is only
                // correct for single-byte encodings such as ASCII or ISO-8859-1
                char c = (char) randomAccessFile.read();
                // A newline ends a line, except for the terminator of the final line
                if (c == '\n' && pointer != lastByte) {
                    readLines++;
                    if (readLines == lines)
                        break;
                }
                builder.append(c);
            }
            // The characters were collected last-first, so reverse them
            builder.reverse();
            System.out.println(builder.toString());
        }
    }
}

月下客 2024-10-08 14:31:28

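Here's one without an Apache dependency. It uses Java streams and is much faster than RandomAccessFile or Apache's ReversedLinesFileReader. Here are the results I got when reading the last 90,000 lines from a 100,000-line file:

This method: 50 ms
Apache's ReversedLinesFileReader: 900 ms
RandomAccessFile (reading in reverse): 1,200 ms

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public static String[] getLastNLinesFromFile(String filePath, int numLines) throws IOException {
    if (numLines <= 0) return new String[0]; // also avoids a modulo-by-zero below
    // Files.lines reads the whole file sequentially, but only numLines lines are kept in memory
    try (Stream<String> stream = Files.lines(Paths.get(filePath))) {
        AtomicInteger offset = new AtomicInteger();
        String[] lines = new String[numLines];
        stream.forEach(line -> {
            lines[offset.getAndIncrement() % numLines] = line;
        });
        List<String> list = IntStream.range(offset.get() < numLines ? 0 : offset.get() - numLines, offset.get())
                .mapToObj(idx -> lines[idx % numLines]).collect(Collectors.toList());
        return list.toArray(new String[0]);
    }
}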

万水千山粽是情ミ 2024-10-08 14:31:28

CircularFifoBuffer from Apache Commons. See the answer to a similar question: How to read the last 5 lines of a .txt file into Java.

Note that in Apache Commons Collections 4 this class seems to have been renamed to CircularFifoQueue.
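To illustrate the bounded-buffer idea without the Commons Collections dependency, here is a minimal sketch that swaps in java.util.ArrayDeque from the standard library and evicts the oldest line by hand (a circular FIFO queue would do that eviction automatically). The class name RingTail and the method lastLines are hypothetical. The sketch still reads the whole input from start to finish; it only limits how many lines are held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class RingTail {
    /** Reads the source to the end and returns its last n lines in file order. */
    static List<String> lastLines(Reader source, int n) throws IOException {
        Deque<String> window = new ArrayDeque<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (n <= 0) {
                    continue; // nothing to keep, but still drain the input
                }
                if (window.size() == n) {
                    window.removeFirst(); // evict the oldest line once the window is full
                }
                window.addLast(line);
            }
        }
        return new ArrayList<>(window);
    }
}
```

A call such as RingTail.lastLines(new java.io.FileReader(path), 5) would return the last five lines, where path is whatever file you want to tail.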

A君 2024-10-08 14:31:28

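RandomAccessFile allows seeking (http://download.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html). The File.length method returns the size of the file. The problem is determining the number of lines. For this, you can seek to the end of the file and read backwards until you have found the right number of lines.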
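One way to read backwards without a seek per byte is to scan fixed-size blocks from the end and count line feeds. The following is a sketch under assumptions, not a definitive implementation: the class BackwardScan, its method names and the 8192-byte block size are chosen for illustration, and the file is assumed to be UTF-8 or ASCII with \n line terminators.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class BackwardScan {
    /** Returns the byte offset at which the last n lines of the file begin. */
    static long startOfLastLines(RandomAccessFile raf, int n) throws IOException {
        long length = raf.length();
        if (n <= 0 || length == 0) {
            return length;
        }
        byte[] block = new byte[8192];
        long pos = length;
        int newlines = 0;
        while (pos > 0) {
            // Step one block towards the start of the file and read it in a single call.
            int size = (int) Math.min(block.length, pos);
            pos -= size;
            raf.seek(pos);
            raf.readFully(block, 0, size);
            for (int i = size - 1; i >= 0; i--) {
                if (block[i] != '\n') {
                    continue;
                }
                if (pos + i == length - 1) {
                    continue; // terminator of the final line, not a line boundary
                }
                if (++newlines == n) {
                    return pos + i + 1; // the wanted lines start right after this newline
                }
            }
        }
        return 0; // the file has fewer than n lines
    }

    /** Returns the last n lines as one string, terminators included. */
    static String tail(String path, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long start = startOfLastLines(raf, n);
            byte[] bytes = new byte[Math.toIntExact(raf.length() - start)];
            raf.seek(start);
            raf.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```

Only the blocks that contain the last n lines are read, so the cost depends on the size of the tail rather than on the size of the file.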

孤独岁月 2024-10-08 14:31:28

I had a similar problem, but I didn't understand the other solutions.

I used this. I hope it is simple code.

// String filePathName = (directory and file name).
File f = new File(filePathName);
long fileLength = f.length(); // Size of the file in bytes.
long fileLength_toRead = 0;
if (fileLength > 2000) {
    // My file content is a table; I know one row has about e.g. 100 bytes / characters.
    // I start reading 1000 bytes before the end of the file.
    // If you don't know the line length, use @paxdiablo's advice.
    fileLength_toRead = fileLength - 1000;
}
try (RandomAccessFile raf = new RandomAccessFile(filePathName, "r")) { // try-with-resources opens and closes the file.
    raf.seek(fileLength_toRead); // Reading begins at this byte.
    String rowInFile = raf.readLine();
    if (fileLength_toRead > 0) {
        // After seeking into the middle, the first line read is usually incomplete; skip it.
        rowInFile = raf.readLine();
    }
    while (rowInFile != null) {
        // Here the lines read (rowInFile) can be added to a String[] array or an ArrayList<String>.
        // Later I can work with the rows from the array; the last row is sometimes empty, etc.
        rowInFile = raf.readLine();
    }
}
catch (IOException e) {
    e.printStackTrace(); // Don't swallow I/O errors silently.
}

地狱即天堂 2024-10-08 14:31:28

Here is working code for this.

    private static void printLastNLines(String filePath, int n) {
        StringBuilder builder = new StringBuilder();
        // try-with-resources closes the file even if reading fails
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(filePath, "r")) {
            long pos = randomAccessFile.length() - 1;

            // Start at pos - 1: the final byte is assumed to be the last line's terminator
            for (long i = pos - 1; i >= 0; i--) {
                randomAccessFile.seek(i);
                // Casting a byte to char is only correct for single-byte encodings
                char c = (char) randomAccessFile.read();
                if (c == '\n') {
                    n--;
                    if (n == 0) {
                        break;
                    }
                }
                builder.append(c);
            }
            // The characters were collected last-first, so reverse them
            builder.reverse();
            System.out.println(builder.toString());
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException
            e.printStackTrace();
        }
    }

ㄖ落Θ余辉 2024-10-08 14:31:28

This is the best way I've found to do it. Simple, pretty fast and memory efficient.

public static void tail(File src, OutputStream out, int maxLines) throws FileNotFoundException, IOException {
    if (maxLines <= 0) {
        return;
    }
    String[] lines = new String[maxLines];
    long count = 0; // total number of lines read so far
    try (BufferedReader reader = new BufferedReader(new FileReader(src))) {
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            // Ring buffer: overwrite the oldest slot once the array is full
            lines[(int) (count++ % maxLines)] = line;
        }
    }

    OutputStreamWriter writer = new OutputStreamWriter(out);
    // Start at the oldest retained line; the file may hold fewer than maxLines lines
    for (long ndx = Math.max(0, count - maxLines); ndx < count; ndx++) {
        writer.write(lines[(int) (ndx % maxLines)]);
        writer.write("\n");
    }

    writer.flush();
}

岁月静好 2024-10-08 14:31:28

(See the recommendation.)

public String readFromLast(File file, int howMany) throws IOException {
    int numLinesRead = 0;
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            // Index of the last byte in the file
            long lastByte = file.length() - 1;
            /*
             * Set the pointer at the end of the file. If the file is empty, an IOException
             * will be thrown
             */
            randomAccessFile.seek(lastByte);

            for (long pointer = lastByte; pointer >= 0; pointer--) {
                randomAccessFile.seek(pointer);
                byte b = (byte) randomAccessFile.read();
                // The terminator of the final line does not start a new line, so skip it
                if (b == '\n' && pointer != lastByte) {
                    numLinesRead++;
                    if (numLinesRead == howMany)
                        break;
                }
                baos.write(b);
            }
            /*
             * The bytes were collected from last to first, so reverse the array
             * to restore the original order
             */
            byte[] a = baos.toByteArray();
            int start = 0;
            int mid = a.length / 2;
            int end = a.length - 1;

            while (start < mid) {
                byte temp = a[end];
                a[end] = a[start];
                a[start] = temp;
                start++;
                end--;
            }// End while
            // Decode explicitly; UTF-8 is an assumption, use the file's actual encoding
            return new String(a, StandardCharsets.UTF_8).trim();
        } // End inner try-with-resources
    } // End outer try-with-resources

} // End method

仅此而已 2024-10-08 14:31:28

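I tried RandomAccessFile first, and it was tedious to read the file backwards, repositioning the file pointer on every read operation. So I tried @Luca's solution, and within a few minutes I had the last few lines of the file as a string, in just two lines of code.

    InputStream inputStream = new ProcessBuilder("tail", path.toString()).start().getInputStream();
    String tail = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(System.lineSeparator()));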

孤独患者 2024-10-08 14:31:28

The code is only 2 lines

     // Please specify the correct Charset
     ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8);

     // read last 2 lines (close rlf afterwards, e.g. with try-with-resources)
     System.out.println(rlf.toString(2));

Gradle:

implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'

Maven:

   <dependency>
        <groupId>commons-io</groupId><artifactId>commons-io</artifactId><version>2.11.0</version>
   </dependency>
