How do I disable buffering of stdout redirected to a file in perl?
Here's a script that launches 10 processes, each writing 100,000 lines to its STDOUT, which is inherited from the parent:
#!/usr/bin/env perl
# buffering.pl
use 5.10.0;
use strict;
use warnings FATAL => "all";
use autodie;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);

$| = 1; # don't think this does anything with syswrite...

# start 10 jobs which write 100,000 lines each
for (1 .. 10) {
    $pm->start and next;
    for my $j (1 .. 100_000) {
        syswrite(\*STDOUT, "$j\n");
    }
    $pm->finish;
}
$pm->wait_all_children;
If I pipe to another process, all is well:
$ perl buffering.pl | wc -l
1000000
But if I redirect to a file on disk, the syswrites clobber each other.
$ perl buffering.pl > tmp.txt ; wc -l tmp.txt
457584 tmp.txt
What's more, if I open a separate write filehandle in each child process and write directly to tmp.txt:
#!/usr/bin/env perl
# buffering2.pl
use 5.10.0;
use strict;
use warnings FATAL => "all";
use autodie;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);

$| = 1;

for (1 .. 10) {
    $pm->start and next;
    open my $fh, '>', 'tmp.txt';
    for my $j (1 .. 100_000) {
        syswrite($fh, "$j\n");
    }
    close $fh;
    $pm->finish;
}
$pm->wait_all_children;
tmp.txt ends up with 100,000 lines (each child truncates the file and writes the same 100,000 lines over the top of the others):
$ perl buffering2.pl; wc -l tmp.txt
100000 tmp.txt
So redirection via '>' to a file on disk involves some sort of buffering, but piping to a process doesn't? What's the deal?
1 Answer
When you redirect the whole perl script you get one file descriptor (created by the shell when you do > tmp.txt and inherited as stdout by perl), which is dup'd to each child, so all of the children share a single kernel file offset. When you explicitly open in each child you get different file descriptors (not dups of the original), each with its own offset. You should be able to replicate the shell redirection case if you hoist open my $fh, '>', 'tmp.txt' out of your loop.

The pipe case works because you're talking to a pipe and not a file, and a pipe has no notion of an offset that can be inadvertently shared in the kernel, as described above.
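As a rough illustration of that point, here is a minimal sketch, not part of the answer itself, that hoists the open above the fork loop so every child inherits a dup of the same descriptor, just as the shell's > tmp.txt redirection does. The script name and the final sysseek check are my own additions for illustration:

#!/usr/bin/env perl
# hoisted.pl -- hypothetical name; sketch of hoisting the open out of the loop
use 5.10.0;
use strict;
use warnings FATAL => "all";
use autodie;
use Fcntl qw(SEEK_CUR);
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);

# Open once, before any fork: the kernel file offset behind $fh is now
# shared by the parent and every child (same open file description).
open my $fh, '>', 'tmp.txt';

for (1 .. 10) {
    $pm->start and next;
    for my $j (1 .. 100_000) {
        syswrite($fh, "$j\n");   # children race on the shared offset
    }
    $pm->finish;
}
$pm->wait_all_children;

# The parent never wrote a byte, yet its position has been advanced by the
# children's writes -- the offset lives in the shared descriptor, not in
# any per-process buffer.
printf "parent's file offset after the children finish: %d\n",
    sysseek($fh, 0, SEEK_CUR);
close $fh;

Counting the lines of tmp.txt after running this should give a short count comparable to the > tmp.txt redirection case, and the parent's offset comes back non-zero even though the parent never wrote, which is the shared-offset behaviour described above; a per-child open (or a pipe, which has no offset at all) avoids it.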