How can I write to a gzip file from a Perl script without blocking?

Posted on 2024-10-09 17:49:09


I'm currently writing a script that takes a database as input and generates all valid combinations from the 10+ tables, following certain rules. Since the output is pretty darn huge, I'm dumping it through gzip into a file, like this:

open( my $OUT, '|-', 'gzip > file' ) or die "Cannot start gzip: $!";
for (@data) {
    my $line = calculate($_);
    print $OUT $line;
}
close $OUT or die "Error closing gzip pipe: $!";

Due to the nature of the beast, though, I end up having to make hundreds of thousands of small writes, one for each line. This means that between each calculation it waits for gzip to receive the data and finish compressing it. At least I think so; I might be wrong.

In case I'm right, though, I'm wondering how I can make this print asynchronous, i.e. have it fire the data at gzip and then go on processing the data.


Comments (3)

忱杏 2024-10-16 17:49:09


Give IO::Compress::Gzip a try. It accepts a filehandle to write to. You can set O_NONBLOCK on that filehandle.
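A minimal sketch of that approach. The `@data` array and `calculate()` are stand-ins for the names used in the question; everything else is the documented `IO::Compress::Gzip` interface. Note that compression here happens in-process, so this avoids the external `gzip` pipe entirely rather than making the pipe non-blocking:

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);

# Stand-ins for the question's data and calculate() (assumptions).
my @data = (1 .. 5);
sub calculate { return "line $_[0]\n" }

# Compress directly in this process instead of piping to external gzip.
my $z = IO::Compress::Gzip->new('file.gz')
    or die "IO::Compress::Gzip failed: $GzipError\n";

print $z calculate($_) for @data;
$z->close();
```

`new()` also accepts an already-open filehandle in place of a filename, which is where a flag like O_NONBLOCK would be set if you wanted to try that route.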

绅刃 2024-10-16 17:49:09


Pipes already use a buffer so that the writing program doesn't have to wait for the reading program. However, that buffer is usually fairly small (it's normally only 64KB on Linux) and not easily changed (it requires recompiling the kernel). If the standard buffer is not enough, the easiest thing to do is include a buffering program in the pipeline:

open( my $OUT, '|-', 'bfr | gzip > file' ) or die "Cannot start pipeline: $!";

bfr simply reads STDIN into an in-memory buffer, and writes to STDOUT as fast as the next program allows. The default is a 5MB buffer, but you can change that with the -b option (e.g. bfr -b10m for a 10MB buffer).

走走停停 2024-10-16 17:49:09


Naturally, I'd do it in a thread or with a fork, as you wish.
http://hell.jedicoder.net/?p=82
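One way to flesh out the thread suggestion is a writer thread fed by a `Thread::Queue`: the main thread keeps calculating while the writer drains the queue into the gzip pipe. This is a sketch, not the answerer's code; it assumes a threads-enabled Perl build, and `@data`/`calculate()` are stand-ins for the names from the question:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Stand-ins for the question's data and calculate() (assumptions).
my @data = (1 .. 1000);
sub calculate { return "line $_[0]\n" }

my $q = Thread::Queue->new();

# Writer thread: drains the queue into the gzip pipe, so the main
# thread never blocks on gzip directly.
my $writer = threads->create(sub {
    open( my $OUT, '|-', 'gzip > file.gz' ) or die "gzip: $!";
    while ( defined( my $line = $q->dequeue() ) ) {
        print $OUT $line;
    }
    close $OUT or die "close: $!";
});

# Main thread keeps computing while the writer catches up.
$q->enqueue( calculate($_) ) for @data;
$q->end();          # signal that no more items are coming
$writer->join();
```

The queue is unbounded by default, so enqueueing never blocks; if memory is a concern, `Thread::Queue`'s `limit` attribute can cap it.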
