Am I using Parallel::Fork::BossWorkerAsync correctly?
Background:
I haven't had much experience with multi-process Perl scripts. I have a data cleanup process for the FooService that takes over 12 hours to complete, and when I investigated, I found that almost all of that time was spent waiting for FooClient to return data. I was looking into a multi-process way to do the task, and a coworker recommended Parallel::Fork::BossWorkerAsync over the plain fork() approach I had been using. I liked it because it cut my memory use dramatically.
Problem:
BossWorkerAsync looks pretty neat, the perldoc is great, and running it in no-write test mode works really well, bringing my execution time under an hour. My only problem is that the documentation doesn't really explain how shared data works with the "init_handler => \&x" constructor setting. I want each worker to have its own FooClient, just to avoid any sort of synchronization issues. I went with what I thought was correct, but I'm a bit paranoid about it and want to make sure I'm handling this in the most correct way.
Code:
```perl
use strict;
use warnings;
use Readonly;
use Parallel::Fork::BossWorkerAsync;

# The number of children to spawn; tune after performance testing
Readonly my $CHILDREN => 40;

# Each child will set its own client
my $client;

# process_keys() and send_work() are defined elsewhere in the script (not shown)
my $bw = Parallel::Fork::BossWorkerAsync->new(
    work_handler => \&process_keys,
    init_handler => \&setup_client,
    worker_count => $CHILDREN,
);

send_work($bw);
while ($bw->pending()) {
    my $ref = $bw->get_result();
    # Do stuff with the result
}
$bw->shut_down();
exit;

sub setup_client {
    $client = FooClient->new();
}
```
Am I correctly handling $client, which I don't want shared between workers? I kept the same approach I used in my fork() version, where I set $client after the fork(), but I'm worried that it's not the right way to do this.
Answer:
Yes, you're using the module, and the init_handler, correctly. The handler is called just after the fork, in each child, before it enters the blocking select loop, waiting for a job.
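The isolation this relies on can be sketched with core Perl alone. This is a minimal sketch, assuming only fork(), pipe() and waitpid(); it does not reproduce BossWorkerAsync's internals. The hash `{ owner_pid => $$ }` is a hypothetical stand-in for `FooClient->new()`. A file-scoped `my $client` is set inside each child after the fork, the way init_handler runs, and each child reports back which process built its client.

```perl
use strict;
use warnings;

# File-scoped lexical, like $client in the question. The parent never
# sets it; each child sets its own copy after fork().
my $client;

# Hypothetical stand-in for FooClient->new(): it just records the PID
# of the process that built it.
sub setup_client {
    $client = { owner_pid => $$ };
}

my @children;
for my $n (1 .. 3) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: initialize after the fork, as init_handler does.
        close $reader;
        setup_client();
        # Send the owner PID of this child's client back to the parent.
        print {$writer} "$client->{owner_pid}\n";
        close $writer;
        exit 0;
    }

    # Parent: keep only the read end of each pipe.
    close $writer;
    push @children, [ $pid, $reader ];
}

# Collect one report per child and reap it.
my @owners;
for my $child (@children) {
    my ($pid, $reader) = @$child;
    chomp(my $owner = <$reader>);
    close $reader;
    waitpid($pid, 0);
    push @owners, $owner;
}

print "client owners: @owners\n";
print "parent pid: $$, parent client: ",
    (defined $client ? "set" : "undef"), "\n";
```

Each reported owner PID should match the child that built it, and the parent's $client should stay undefined, so every worker holds its own client. The risky alternative would be creating the client before the fork, in which case every child inherits a copy of the same object, including any open socket underneath it.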
I'm the author of the module. I'm sorry it took me so long to notice this and respond. Glad to see the code is being used.
Cheers,
-joe