Exit code of a system() call with a single scalar argument in Perl

Posted on 2024-12-17 05:33:06

There is a system() call in a Perl script that runs several piped commands, passed as a single scalar argument. The call looks more or less like this:

system("zcat /foo.gz | grep '^.{6}X|Y|Z' | awk '{print $2,$3,$4,$6}' | bzip2 > /foo.processed.bz2");

The file in question (foo.gz) is quite large, about 2GB compressed. I guess that's why it was originally done via a system call.

Questions:

The problem now is that this system() call always seems to return 0, whether or not one of the commands in the pipeline fails. I assume this is because it gets invoked via sh -c '...'. Is that correct?

Is there a way to check whether a system() call was successful if only a single scalar argument is passed?

Is there a better way to process a large file like this, one that's equally or more efficient (mainly in terms of speed)?

Thanks for any hints, as I am not really familiar with Perl.

情话已封尘 2024-12-24 05:33:06

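Two things:

The value system() returns is the status of the last command in the pipeline, so you are getting the exit status of the bzip2 command.
The program is probably written this way because whoever wrote it didn't know any better. I've seen Perl programs shell out with system() to get a file's basename, to run find, or even to copy/rename/move files. All of these can be done faster and more easily inside the Perl program itself, and you avoid the whole Windows/Unix compatibility problem.

For this sort of thing it's always better to use Perl modules. In this case I'd bet a Perl module will be faster than the shell pipeline, and you'll have more control over the whole operation.

There is a whole family of modules called IO::Compress that can handle Zip and BZip2.

I use Archive::Zip, which is a great module, but you want the Bzip2 compression algorithm, which Archive::Zip can't handle.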
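As a rough illustration of the module-based approach, here is a minimal sketch that streams a gzip file line by line with IO::Uncompress::Gunzip and writes the matching lines through IO::Compress::Bzip2. The helper name filter_gz_to_bz2, the sample data, and the simplified pattern are made up for the demo. Whether this beats the shell pipeline on a 2GB file is something to measure rather than assume, since the pipeline runs its stages as separate processes.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);
use IO::Compress::Bzip2    qw($Bzip2Error);
use File::Temp             qw(tempdir);

# Stream a gzip file line by line, keep the lines matching $pattern,
# and write them bzip2-compressed. Returns the number of lines kept.
# (filter_gz_to_bz2 is a hypothetical helper name.)
sub filter_gz_to_bz2 {
    my ($infile, $outfile, $pattern) = @_;
    my $in = IO::Uncompress::Gunzip->new($infile)
        or die "Cannot read $infile: $GunzipError";
    my $out = IO::Compress::Bzip2->new($outfile)
        or die "Cannot write $outfile: $Bzip2Error";
    my $kept = 0;
    while (defined(my $line = $in->getline())) {
        next unless $line =~ $pattern;
        $out->print($line);
        $kept++;
    }
    $in->close();
    $out->close() or die "Cannot finish $outfile: $Bzip2Error";
    return $kept;
}

# Tiny demo on throwaway files (made-up data, not the real foo.gz).
my $dir    = tempdir(CLEANUP => 1);
my $sample = "aaaaaaX 1 2 3 4 5\nbbbbbbQ 1 2 3 4 5\n";
gzip(\$sample => "$dir/in.gz") or die "gzip failed: $GzipError";
my $kept = filter_gz_to_bz2("$dir/in.gz", "$dir/out.bz2", qr/^.{6}X/);
print "kept $kept line(s)\n";
```

Because every failure raises a die with the module's own error variable, a broken input or an unwritable output is reported directly instead of being hidden behind the status of the last pipeline stage.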

最佳男配角 2024-12-24 05:33:06

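system() returns whatever the /bin/sh shell returns. When several commands are piped together, the shell forks a new process for each command and returns the status code of the last command in the chain, in this case bzip2.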
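To make that behaviour visible, and to show one possible way around it, here is a small sketch. It assumes bash is installed, because `set -o pipefail` is a bash option (ksh and zsh have it too) that a plain /bin/sh may not support. The helper name run_pipeline is invented for the example. The exit status is taken from the high byte of `$?`, as described in the Perl documentation for system().

```perl
use strict;
use warnings;

# Run a shell pipeline and return its exit status.
# With $pipefail set, bash reports the last non-zero status of any stage
# instead of only the status of the final stage.
# (run_pipeline is a hypothetical helper; bash is assumed to be on PATH.)
sub run_pipeline {
    my ($pipeline, $pipefail) = @_;
    my @cmd = $pipefail
        ? ('bash', '-c', "set -o pipefail; $pipeline")
        : ('sh',   '-c', $pipeline);
    my $rc = system(@cmd);
    die "failed to start shell: $!" if $rc == -1;
    die sprintf("shell killed by signal %d", $? & 127) if $? & 127;
    return $? >> 8;    # the exit status lives in the high byte of $?
}

print run_pipeline('false | true', 0), "\n";    # 0: only the last stage counts
print run_pipeline('false | true', 1), "\n";    # 1: pipefail surfaces the failure
```

Applied to the original call, the same idea would mean running the zcat/grep/awk/bzip2 pipeline through bash with pipefail enabled and then checking the returned status. One caveat: grep exits with status 1 when no line matches, which pipefail would also report as a failure.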

初熏 2024-12-24 05:33:06

Based on your comments and answers, I'd do it like this now:

# If the input is gzipped, turn the name into a "command |" pipe for two-argument open.
$infile =~ s/(.*\.gz)\s*$/gzip -dc < $1|/;
open(OUTFH, "| /bin/bzip2 > $outfile") or die "Can't open $outfile: $!";
open(INFH, $infile) or die "Can't open $infile: $!";
while (my $line = <INFH>) {
    if ($line =~ /^.{6}X|Y|Z/) {
        # TODO: the awk part...
        print OUTFH $line;
    }
}
# close() on a pipe returns false if the child process exited non-zero.
close(INFH) or die "Reading $infile failed: $! (status $?)";
close(OUTFH) or die "bzip2 failed: $! (status $?)";

Please feel free to comment and vote up/down.
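The "awk part" left as a TODO could be filled in roughly as follows. This is only a sketch with invented helper names (keep_line, awk_fields). It also rests on an assumption about the filter: as written, /^.{6}X|Y|Z/ matches "six characters then X", or a Y anywhere, or a Z anywhere. If the intent was "the seventh character is X, Y or Z", a character class expresses that.

```perl
use strict;
use warnings;

# Decide whether a line passes the filter.
# ASSUMPTION: the intent is "7th character is X, Y or Z"; the original
# pattern /^.{6}X|Y|Z/ would also accept any line containing Y or Z.
sub keep_line {
    my ($line) = @_;
    return $line =~ /^.{6}[XYZ]/ ? 1 : 0;
}

# Rough equivalent of awk '{print $2,$3,$4,$6}': split on whitespace,
# pick fields 2, 3, 4 and 6, and join them with single spaces.
sub awk_fields {
    my ($line) = @_;
    my @f = split ' ', $line;    # awk-style split; leading blanks are ignored
    return join(' ', map { $_ // '' } @f[1, 2, 3, 5]);    # missing fields become ''
}

my $line = "abcdefX two three four five six\n";
print awk_fields($line), "\n" if keep_line($line);    # prints "two three four six"
```

Inside the loop, the print would then become something like `print OUTFH awk_fields($line), "\n"` guarded by `keep_line($line)`.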
