How to filter large amounts of data with IPC::Open2?

Posted on 2024-12-02 19:09:26

My task is to filter some data from a Perl script using an external utility (addr2line). The data set is quite large: I need to write a lot of data to the program's stdin and read a lot of data back from the program's stdout into my script.

Currently I do this with IPC::Open2, but I don't interleave reading and writing: I write all the requests first, then read all the results. Is this legal? Will Open2 buffer any amount of data in the pipe?

My code:

my $cmd="addr2line -e $prog_name ";
use IPC::Open2;
local (*Reader, *Writer);
my $pid = open2(\*Reader, \*Writer, $cmd);
for(@requests) {  # this array is HUGE, 100s of thousands of entries
    print Writer "$_\n";
}
close Writer;  
for(@requests) {
    $function_name = <Reader>;
    $filesource = <Reader>;
   #... store ..
}
close Reader;
waitpid($pid,0);

Answers (2)

欲拥i 2024-12-09 19:09:26

Yes, the way your program is written, you will run into pipe buffer capacity limits. Your input buffer (Reader) will fill up and block execution of your external program.
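The buffer in question is the kernel's pipe buffer, which has a fixed size, so Open2 cannot hold an arbitrary amount of data. As an illustration (a sketch using only core modules, not part of the original answer), this measures how many bytes an unread pipe accepts before a write would block. The exact figure is platform dependent; the Linux pipe(7) man page documents a default capacity of 65536 bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;                        # provides the ->blocking method
use Errno qw( EAGAIN EWOULDBLOCK );

# An anonymous pipe, like the ones IPC::Open2 creates; nobody reads from it.
pipe(my $read_end, my $write_end) or die "pipe: $!";

# Non-blocking, so a full pipe makes syswrite fail with EAGAIN
# instead of hanging this process forever.
$write_end->blocking(0);

my $capacity = 0;
my $chunk    = 'x' x 1024;
while (1) {
    my $n = syswrite($write_end, $chunk);
    if (!defined $n) {
        last if $! == EAGAIN || $! == EWOULDBLOCK;   # pipe is full
        die "syswrite: $!";
    }
    $capacity += $n;
}

print "pipe accepted $capacity bytes with no reader draining it\n";
```

Once both the request pipe and the result pipe are full in this way, the parent and addr2line are each waiting for the other, which is the deadlock described here.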

Interleaving reading and writing would help, since you would then be draining the input buffer at about the same rate that the external program fills it.
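A minimal sketch of interleaving in lockstep: write one request, then read its two answer lines before writing the next. This only works if the child answers each line before reading the next one and flushes its output immediately. The stand-in command forces that with `$| = 1`; whether your addr2line build flushes after every address is an assumption you should verify. `@cmd` and `@requests` are hypothetical placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use IPC::Open2 qw( open2 );

# Hypothetical stand-in for addr2line: answers each request line with two
# lines and flushes right away ($| = 1), which lockstep reading relies on.
my @cmd      = ($^X, '-ne', '$| = 1; print "func:$_", "file:$_"');
my @requests = map { sprintf "0x%x", $_ } 1 .. 20_000;   # hypothetical input

my $pid = open2(my $reader, my $writer, @cmd);
$writer->autoflush(1);    # push each request into the pipe immediately

my @results;
for my $request (@requests) {
    print {$writer} "$request\n";         # send one request ...
    my $function_name = <$reader>;        # ... then read its two answer lines
    my $filesource    = <$reader>;
    die "child exited early\n" unless defined $filesource;
    chomp($function_name, $filesource);
    push @results, [ $function_name, $filesource ];
}

close $writer;
close $reader;
waitpid($pid, 0);
printf "%d results\n", scalar @results;
```

Neither pipe ever holds more than one request or one answer, so buffer size no longer matters. The price is one round trip per request, which is slower than sending requests in batches.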

Another thing that would help is using files for interprocess communication instead of pipes or sockets (which is what IPC::Open2 uses). Then you would be limited only by the amount of free disk space. You could do it yourself, though Forks::Super uses files for IPC by default.

use Forks::Super 'open2';

...
my ($Reader,$Writer,$pid) = open2(@command);
for (@requests) { print $Writer "$_\n" }
close $Writer;
for (@requests) { ... read ... }
close $Reader;
waitpid $pid,0;
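Doing the file-based IPC yourself needs only core modules. This sketch writes all requests to a temporary file, runs the command with stdin and stdout redirected to files via fork/exec, and reads the results once the child has exited. `@cmd` is a hypothetical stand-in for `addr2line -e $prog_name`, and this is not necessarily how Forks::Super does it internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use POSIX ();

# Hypothetical stand-in for addr2line: two output lines per request.
my @cmd      = ($^X, '-ne', 'print "func:$_", "file:$_"');
my @requests = map { sprintf "0x%x", $_ } 1 .. 50_000;   # hypothetical input

# 1. Write every request to a temporary input file.
my ($in_fh, $in_name) = tempfile(UNLINK => 1);
print {$in_fh} "$_\n" for @requests;
close $in_fh or die "close: $!";

# 2. Run the child with stdin and stdout redirected to files instead of
#    pipes, so neither side can block on a full pipe buffer.
my (undef, $out_name) = tempfile(UNLINK => 1);
my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    open(STDIN,  '<', $in_name)  or POSIX::_exit(126);
    open(STDOUT, '>', $out_name) or POSIX::_exit(126);
    exec { $cmd[0] } @cmd or POSIX::_exit(127);   # _exit skips END blocks
}
waitpid($pid, 0);
die "child failed with status $?\n" if $?;

# 3. Read the answers back: two lines per request, in request order.
open(my $res_fh, '<', $out_name) or die "open $out_name: $!";
my @results;
for my $request (@requests) {
    my $function_name = <$res_fh>;
    my $filesource    = <$res_fh>;
    die "short output for $request\n" unless defined $filesource;
    chomp($function_name, $filesource);
    push @results, [ $function_name, $filesource ];
}
close $res_fh;
printf "%d results\n", scalar @results;
```

Because the child's output goes to disk, it never waits for the parent, so the parent can safely wait for the child to finish before reading anything.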

清风夜微凉 2024-12-09 19:09:26

Pipes have limited capacity, so your approach will deadlock:

  Parent                 Child
  ------                 -----
  ...                    ...
                         Wait for data in Writer
  Put data in Writer
                         Read data from Writer
                         Put data in Reader
                         Wait for data in Writer
  Put data in Writer
                         Read data from Writer
                         Put data in Reader
                           => Blocks because Reader is full
  Put data in Writer
  Put data in Writer
  ...
  Put data in Writer
  Put data in Writer
    => Blocks because Writer is full

One possible solution:

use strict;
use warnings;
use threads;
use IPC::Open2 qw( open2 );

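# $prog_name: path to the binary being symbolized (defined elsewhere).
my @cmd = ("addr2line", "-e", $prog_name);

local (*Reader, *Writer);
my $pid = open2(\*Reader, \*Writer, @cmd);

# Reader thread: drains the child's stdout while the main thread writes.
my $thread = async {
   # Close this thread's clone of Writer. Cloned handles share the same fd,
   # so the child may not see EOF until every thread has closed its copy.
   close Writer;

   for (;;) {
       my $function_name = <Reader>;
       last if !defined($function_name);
       my $filesource = <Reader>;
       #... store ..
   }

   close Reader;
};

{
   my @requests = ...;

   for(@requests) {  # this array is HUGE, 100s of thousands of entries
      print Writer "$_\n";
   }

   close Writer;   # child sees EOF, finishes, then closes its stdout
}

$thread->join();
waitpid($pid, 0);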

Alternatively, IPC::Run has tools that will make this easy too.
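A sketch of the IPC::Run route, assuming IPC::Run is installed from CPAN (it is not a core module). `run` takes the command as an array ref plus scalar refs for the child's stdin and stdout, and pumps both directions itself, so a full pipe cannot deadlock. `@cmd` is a hypothetical stand-in for addr2line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Run qw( run );    # CPAN module, not part of core Perl

# Hypothetical stand-in for addr2line: two output lines per request.
my @cmd      = ($^X, '-ne', 'print "func:$_", "file:$_"');
my @requests = map { sprintf "0x%x", $_ } 1 .. 50_000;   # hypothetical input

# $in is fed to the child's stdin; the child's stdout is collected in $out.
my $in  = join '', map { "$_\n" } @requests;
my $out = '';
run(\@cmd, \$in, \$out) or die "command failed: $?\n";

# Two lines of output per request.
my @lines = split /\n/, $out;
my @results;
while (my ($function_name, $filesource) = splice(@lines, 0, 2)) {
    push @results, [ $function_name, $filesource ];
}
printf "%d results\n", scalar @results;
```

The trade-off is memory: the whole input and the whole output are held as strings at once, which is usually acceptable for a few hundred thousand addresses.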

The unixy way would be to use IO::Select, but that's a real pain.
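For comparison, a sketch of the IO::Select approach with IPC::Open2, using only core modules: make the write end non-blocking and, in a single loop, write whenever the child can accept input and read whenever it has produced output. `@cmd` and `@requests` are hypothetical placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;                        # provides the ->blocking method
use IO::Select;
use IPC::Open2 qw( open2 );
use Errno qw( EAGAIN EWOULDBLOCK );

# Hypothetical stand-in for addr2line: two output lines per request.
my @cmd      = ($^X, '-ne', 'print "func:$_", "file:$_"');
my @requests = map { sprintf "0x%x", $_ } 1 .. 50_000;   # hypothetical input

my $pid = open2(my $from_child, my $to_child, @cmd);
$to_child->blocking(0);   # a full pipe makes syswrite return EAGAIN, not hang

my $pending  = join '', map { "$_\n" } @requests;   # bytes not yet sent
my $received = '';                                  # raw child output so far

my $readers = IO::Select->new($from_child);
my $writers = IO::Select->new($to_child);

# Loop until the child closes its stdout, servicing whichever side is ready.
while ($readers->count) {
    my ($can_read, $can_write) =
        IO::Select->select($readers, ($writers->count ? $writers : undef), undef);

    for my $fh (@{ $can_write || [] }) {
        my $n = syswrite($fh, $pending);
        if (!defined $n) {
            next if $! == EAGAIN || $! == EWOULDBLOCK;
            die "syswrite: $!";
        }
        substr($pending, 0, $n, '');
        if ($pending eq '') {          # everything sent: give the child EOF
            $writers->remove($fh);
            close $fh;
        }
    }

    for my $fh (@{ $can_read || [] }) {
        my $n = sysread($fh, $received, 65536, length $received);
        die "sysread: $!" unless defined $n;
        if ($n == 0) {                 # child closed its stdout
            $readers->remove($fh);
            close $fh;
        }
    }
}
waitpid($pid, 0);

# Two lines of output per request.
my @lines = split /\n/, $received;
my @results;
while (my ($function_name, $filesource) = splice(@lines, 0, 2)) {
    push @results, [ $function_name, $filesource ];
}
printf "%d results\n", scalar @results;
```

Only `sysread`/`syswrite` are used on the pipe handles, because buffered `print` and `<>` do not mix reliably with `select`. That bookkeeping is the "pain" mentioned above.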
