Am I using Parallel::Fork::BossWorkerAsync correctly?

Posted on 2024-12-14 07:19:18

Background:
I haven't had much experience with multi-process Perl scripts. I have a data cleanup process for the FooService that is taking over 12 hours to complete, and when I investigated, I found that almost all of that time was spent waiting for the FooClient to return me data. I was looking into a multi-process way to do the task, and a coworker recommended Parallel::Fork::BossWorkerAsync over the simple fork() I was doing before. I liked it since it lowered my memory use by a ton.

Problem:
BossWorkerAsync looks pretty neat, the perldoc is great, and running it in no-write test mode works really well, pushing my execution time under an hour. My only problem is that the documentation doesn't really explain how shared data works with the "init_handler => \&x" constructor setting. I want each worker to have its own FooClient, to avoid any sort of synchronization issues. I went with what I thought was correct, but I'm somewhat paranoid about it, and I want to make sure I'm handling this the right way.

Code:

use strict;
use warnings;
use Readonly;
use Parallel::Fork::BossWorkerAsync;

# The number of children to spawn, modify after performance testing
Readonly my $CHILDREN => 40;

# Each child will set their own client
my $client;

my $bw = Parallel::Fork::BossWorkerAsync->new(
    work_handler => \&process_keys,
    init_handler => \&setup_client,
    worker_count => $CHILDREN,
);

send_work($bw);

while ($bw->pending()) {
    my $ref = $bw->get_result();
    # Do stuff with the result
}

$bw->shut_down();
exit;

sub setup_client {
    $client = FooClient->new();
}

Am I handling the $client that I don't want shared correctly? I kept the same sort of deal I had with my fork() version, where I set the $client after the fork(), but I'm just worried that it's not the right way to do this.

Comments (1)

倾城月光淡如水﹏ 2024-12-21 07:19:18

Yes, you're using the module, and the init_handler, correctly. The handler is called just after the fork, in each child, before it enters the blocking select loop, waiting for a job.

I'm the author of the module. I'm sorry it took me so long to notice this, and respond. Glad to see the code is being used.

Cheers,
-joe
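
To illustrate why a handler that runs after the fork gives each child its own client, here is a minimal sketch using plain fork() and core Perl only. It does not use Parallel::Fork::BossWorkerAsync itself. DemoClient is a hypothetical stand-in for FooClient, and run_children is an illustrative helper, not part of any module. Each child assigns the file-scoped $client after forking, so the object exists only in that child's copy of memory.

```perl
use strict;
use warnings;

# Hypothetical stand-in for FooClient: remembers which process built it.
package DemoClient {
    sub new       { my ($class) = @_; return bless { owner_pid => $$ }, $class }
    sub owner_pid { return $_[0]{owner_pid} }
}

# File-scoped like $client in the question; stays undef in the parent.
my $client;

# Plays the role of the init_handler: runs inside the child, after the fork.
sub setup_client { $client = DemoClient->new() }

# Fork $count children one at a time. Each child builds its own client and
# reports "child_pid:client_owner_pid" back to the parent over a pipe.
sub run_children {
    my ($count) = @_;
    my @reports;
    for (1 .. $count) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: this assignment is invisible to the parent and siblings.
            close $reader;
            setup_client();
            print {$writer} "$$:" . $client->owner_pid() . "\n";
            close $writer;
            exit 0;
        }
        # Parent: read the child's report, then reap it.
        close $writer;
        my $line = <$reader>;
        close $reader;
        waitpid($pid, 0);
        chomp $line;
        push @reports, $line;
    }
    return @reports;
}

my @reports = run_children(3);
print "child report: $_\n" for @reports;
print "parent client is ", (defined $client ? "defined" : "undef"), "\n";
```

In every report the two PIDs match, meaning each client was constructed by the child that uses it, and the parent's $client is still undef at the end. Anything assigned before the fork would instead be copied into every child, which is the case to avoid for objects holding sockets or other connection state.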
