Reading a specific line from a large file in Perl

Posted on 2024-12-22 03:47:04

Is there a fast and memory-efficient way to read specific lines of a large file without loading the whole file into memory?

I wrote a Perl script that runs many forks, and I would like each of them to read specific lines from a file.

At the moment I'm using an external command:

sub getFileLine {
    my ( $filePath, $lineWanted ) = @_;
    $SIG{PIPE} = 'IGNORE';
    open( my $fh, '-|:utf8', "tail -q -n +$lineWanted \"$filePath\" | head -n 1" );
    my $line = <$fh>;
    close $fh;
    chomp( $line );
    return $line;
}

It's fast and it works, but maybe there's a more "Perl-ish" way that is just as fast and as memory-efficient?

As you know, creating a fork process in Perl duplicates the main process memory, so if the main process is using 10MB, the fork will use at least that much.

My goal is to keep the memory use of each forked process (and therefore of the main process, up to the point where it forks) as low as possible. That's why I don't want to load the whole file into memory.

Comments (3)

明媚如初 2024-12-29 03:47:04

Before you go further, it's important to understand how fork works. When you fork a process, the OS uses copy-on-write semantics to share the bulk of the parent's and child's memory; only the memory that differs between the parent and the child needs to be allocated separately.

For reading a single line of a file in Perl, here's a simple way:

open my $fh, '<', $filePath or die "$filePath: $!";
my $line;
while( <$fh> ) {
    if( $. == $lineWanted ) { 
        $line = $_;
        last;
    }
}

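This uses the special variable $., which holds the current line number of the most recently read filehandle. The loop stops as soon as the wanted line is found, so only one line is held in memory at a time.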
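To make the loop above a drop-in replacement for the question's getFileLine, it could be wrapped in a sub along these lines. This is only a sketch: the name get_file_line, the UTF-8 layer (mirroring the ':utf8' in the question) and the temporary demo file are assumptions for illustration, not part of the original answer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (name chosen for illustration).
# Returns line number $line_wanted (1-based) of $file_path without its
# trailing newline, or undef if the file has fewer lines.
sub get_file_line {
    my ( $file_path, $line_wanted ) = @_;
    open my $fh, '<:encoding(UTF-8)', $file_path or die "$file_path: $!";
    my $line;
    while ( my $current = <$fh> ) {
        if ( $. == $line_wanted ) {    # $. is the line number just read from $fh
            chomp( $line = $current );
            last;                      # stop reading as soon as the line is found
        }
    }
    close $fh;
    return $line;
}

# Demo on a small temporary file (assumed data, five lines).
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "line $_\n" for 1 .. 5;
close $tmp_fh;

print get_file_line( $tmp_path, 3 ), "\n";    # prints "line 3"
```

The cost is a sequential scan up to the wanted line, which is comparable to what tail -n +N has to do, but without spawning a shell and two external processes for each lookup.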

2024-12-29 03:47:04

Take a look at the Tie::File core module.
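A minimal sketch of how Tie::File might be used for this, assuming read-only access is enough. The helper name get_line_with_tie_file and the temporary demo file are made up for illustration. Tie::File presents the file as an array without loading it into memory, although it still has to read from the start of the file to locate a given record.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical helper (name chosen for illustration).
# Returns line number $line_wanted (1-based), or undef if it does not exist.
sub get_line_with_tie_file {
    my ( $file_path, $line_wanted ) = @_;
    tie my @lines, 'Tie::File', $file_path, mode => O_RDONLY
        or die "Cannot tie $file_path: $!";
    # The tied array is 0-based; records come back without the line ending.
    my $line = $lines[ $line_wanted - 1 ];
    untie @lines;
    return $line;
}

# Demo on a small temporary file (assumed data, five lines).
my ( $demo_fh, $demo_path ) = tempfile( UNLINK => 1 );
print {$demo_fh} "line $_\n" for 1 .. 5;
close $demo_fh;

print get_line_with_tie_file( $demo_path, 2 ), "\n";    # prints "line 2"
```

Opening with O_RDONLY avoids modifying the file by accident through the tied array, since by default Tie::File opens the file for reading and writing.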

最美不过初阳 2024-12-29 03:47:04

You don't need to fork. As you can imagine, reading a specific line from a file is a common enough operation that one of the 20k modules on CPAN does it already.

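File::ReadBackwards is memory-efficient and fast. Note that it reads a file line by line starting from the end, so it fits best when the line you want is counted from the end of the file.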
