How to read a file from line x to line y (using PHP)

Posted on 2024-12-04 11:42:05

I've searched all over the internet for a solution, but every answer I found neglects an important issue. The best solution I came across was on Stack Overflow:

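    $file = new SplFileObject('longFile.txt');
    // LimitIterator(iterator, offset, limit): skip the first 1000 lines, then yield at most 2000 lines
    $fileIterator = new LimitIterator($file, 1000, 2000);
    foreach ($fileIterator as $line) {
        echo $line, PHP_EOL;
    }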

But like the other approaches, this has to read from the beginning of the file to reach the offset line. Usually the cost is negligible, but for large files (say, millions of lines) it significantly slows the process down. The time grows monotonically with the offset: with an offset in the millions, processing takes a few seconds.

In databases (like MySQL), we index a table so that a row can be read without walking through the whole database. Is there a way to do the same thing for a file, using the line number as the key? I also wonder how flat-file databases like SQLite and Berkeley DB index their tables.

Comments (3)

蒲公英的约定 2024-12-11 11:42:05

There is no way to seek directly to a particular line, because a "line" is just a convention: a sequence of characters terminated by \n. The file itself knows nothing about this convention. So to get the Nth line, you have to scan the file character by character and count line breaks until you reach it.

As you mentioned, you can improve performance with a custom index (for example, a list mapping each line number to its byte offset), but to build it you still need to parse the file once.
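A minimal sketch of such a line-number-to-offset list, assuming the file does not change after it is scanned. The helper names `buildLineIndex` and `readLineAt` are hypothetical, and line numbers are zero-based here.

```php
<?php
// Hypothetical helper: scan the file once and record the byte offset
// at which each line starts. Index 0 is the first line.
function buildLineIndex(string $path): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $offsets = [];
    $position = 0;
    while (($line = fgets($handle)) !== false) {
        $offsets[] = $position;       // where this line starts
        $position += strlen($line);   // fgets() keeps the trailing "\n"
    }
    fclose($handle);
    return $offsets;
}

// Hypothetical helper: jump straight to one line using the index.
// Returns null when the line number is not in the index.
function readLineAt(string $path, array $offsets, int $lineNumber): ?string
{
    if (!isset($offsets[$lineNumber])) {
        return null;
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($handle, $offsets[$lineNumber]);
    $line = fgets($handle);
    fclose($handle);
    return $line === false ? null : rtrim($line, "\r\n");
}
```

Building the list still costs one full pass over the file, as noted above; only the lookups afterwards avoid the scan. The array also has to fit in memory, which may matter for files with many millions of lines.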

梦回旧景 2024-12-11 11:42:05

The conceptual problem here is that files are just strings of characters, and some of those characters happen to denote ends of lines. Because of that, it is impossible to know where lines begin and end without reading the file first.

If the file is going to be read repeatedly, scan it once, record the offset of each line in some kind of index, and then use fseek() and fread() to read exactly the lines you want.
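One way this could look when the index is kept on disk, so that later requests skip the scan entirely. This is a sketch under assumptions: the names `writeLineIndex` and `readLineRange` are hypothetical, each index entry is a fixed 8-byte offset (the `J` format of `pack()`, which needs a 64-bit PHP build), line numbers are zero-based, and the data file is not modified after indexing.

```php
<?php
// Hypothetical helper: write one 8-byte big-endian offset per line.
// Because every entry has the same width, entry N sits at byte N * 8.
function writeLineIndex(string $dataPath, string $indexPath): int
{
    $data = fopen($dataPath, 'rb');
    $index = fopen($indexPath, 'wb');
    if ($data === false || $index === false) {
        throw new RuntimeException('Cannot open data or index file');
    }
    $position = 0;
    $count = 0;
    while (($line = fgets($data)) !== false) {
        fwrite($index, pack('J', $position));
        $position += strlen($line);
        $count++;
    }
    fclose($data);
    fclose($index);
    return $count; // number of lines indexed
}

// Hypothetical helper: return lines $first..$last (zero-based, inclusive).
function readLineRange(string $dataPath, string $indexPath, int $first, int $last): array
{
    if ($first < 0) {
        throw new InvalidArgumentException('Line numbers start at 0');
    }
    $index = fopen($indexPath, 'rb');
    $data = fopen($dataPath, 'rb');
    if ($index === false || $data === false) {
        throw new RuntimeException('Cannot open data or index file');
    }
    // Two seeks: one into the index, one into the data file.
    fseek($index, $first * 8);
    $entry = fread($index, 8);
    fclose($index);

    $lines = [];
    if ($entry !== false && strlen($entry) === 8) {
        fseek($data, unpack('J', $entry)[1]);
        for ($n = $first; $n <= $last; $n++) {
            $line = fgets($data);
            if ($line === false) {
                break; // ran past the end of the file
            }
            $lines[] = rtrim($line, "\r\n");
        }
    }
    fclose($data);
    return $lines;
}
```

With this layout the cost of reaching line x no longer depends on x; only the lines actually requested are read. The index has to be rebuilt whenever the data file changes.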

As you mentioned, databases can do a similar job for you. So, instead of essentially creating your own database, you could read the file line by line, insert each line into a database table with a field storing the line number, and then fetch the lines you want with a query.
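A rough sketch of the database route, assuming the `pdo_sqlite` extension is available. The table name `file_lines`, its columns, and the helpers `importLines` and `fetchLines` are all made up for illustration; line numbers are 1-based here.

```php
<?php
// Hypothetical helper: copy every line of $path into a table keyed by
// line number. Returns the number of lines imported.
function importLines(PDO $db, string $path): int
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS file_lines (
            line_no INTEGER PRIMARY KEY,
            content TEXT NOT NULL
        )'
    );
    $insert = $db->prepare(
        'INSERT INTO file_lines (line_no, content) VALUES (?, ?)'
    );
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lineNo = 0;
    $db->beginTransaction(); // a single transaction for the bulk insert
    while (($line = fgets($handle)) !== false) {
        $lineNo++;
        $insert->bindValue(1, $lineNo, PDO::PARAM_INT);
        $insert->bindValue(2, rtrim($line, "\r\n"), PDO::PARAM_STR);
        $insert->execute();
    }
    $db->commit();
    fclose($handle);
    return $lineNo;
}

// Hypothetical helper: fetch lines $first..$last (1-based, inclusive).
function fetchLines(PDO $db, int $first, int $last): array
{
    $query = $db->prepare(
        'SELECT content FROM file_lines
         WHERE line_no BETWEEN ? AND ?
         ORDER BY line_no'
    );
    $query->bindValue(1, $first, PDO::PARAM_INT);
    $query->bindValue(2, $last, PDO::PARAM_INT);
    $query->execute();
    return $query->fetchAll(PDO::FETCH_COLUMN);
}
```

The range query can use the primary key, which SQLite stores in a B-tree, so the lookup does not walk the rows before `$first`. The import itself is still a full pass over the file.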

恍梦境° 2024-12-11 11:42:05
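<?php

    // Placeholder path; the original snippet left $file undefined.
    $file = 'longFile.txt';
    $strings = file_get_contents($file);

    $length = strlen($strings);

    for ($i = 0; $i < $length; $i++) {
        // The $strings{$i} syntax was removed in PHP 8.0; use square brackets.
        print $strings[$i];
    }

?>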

The code above reads the file contents into a string and then iterates over it one character at a time; how you make use of the characters is up to you.
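For a file with millions of lines, loading everything with `file_get_contents()` may use a lot of memory. A possible variant of the same character scan, sketched below, reads the file in fixed-size chunks and counts newline characters to find where a given line starts. The helper name `findLineOffset` and the default chunk size are assumptions for illustration, and line numbers are zero-based.

```php
<?php
// Hypothetical helper: return the byte offset at which zero-based line
// $target starts, or null if the file has no such line.
function findLineOffset(string $path, int $target, int $chunkSize = 65536): ?int
{
    if ($target < 0) {
        return null;
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $size = fstat($handle)['size'];
    $seen = 0;                            // newlines counted so far
    $base = 0;                            // file offset of the current chunk
    $result = $target === 0 ? 0 : null;   // line 0 starts at byte 0

    while ($result === null && !feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $pos = -1;
        while (($pos = strpos($chunk, "\n", $pos + 1)) !== false) {
            if (++$seen === $target) {
                $result = $base + $pos + 1; // first byte after the newline
                break;
            }
        }
        $base += strlen($chunk);
    }
    fclose($handle);

    // An offset equal to the file size means nothing follows the newline.
    return ($result !== null && $result < $size) ? $result : null;
}
```

The returned offset can be passed to `fseek()` before reading the line with `fgets()`. This still scans from the start of the file, so it reduces memory use but not the time needed to reach a distant line.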
