Am I using Parallel::Fork::BossWorkerAsync correctly?

Posted 2024-12-14 07:19:18

Background:
I don't have much experience with multi-process Perl scripts. I have a data cleanup process for the FooService that takes over 12 hours to complete, and when I investigated, I found that almost all of that time was spent waiting for the FooClient to return data. I was looking into a multi-process way to do the task, and a coworker recommended Parallel::Fork::BossWorkerAsync over the simple fork() I was using before. I like it because it lowered my memory use considerably.

Problem:
BossWorkerAsync looks pretty neat, the perldoc is great, and running it in no-write test mode works really well, bringing my execution time under an hour. My only problem is that the documentation doesn't really explain how shared data works with the "init_handler => \&x" constructor setting. I want each worker to have its own FooClient, to avoid any sort of synchronization issues. I went with what I thought was correct, but I'm a bit paranoid about it and want to make sure I'm handling this the right way.

Code:

use strict;
use warnings;
use Readonly;
use Parallel::Fork::BossWorkerAsync;

# The number of children to spawn, modify after performance testing
Readonly my $CHILDREN => 40;

# Each child will set their own client
my $client;

my $bw = Parallel::Fork::BossWorkerAsync->new(
    work_handler => \&process_keys,
    init_handler => \&setup_client,
    worker_count => $CHILDREN,
);

send_work($bw);

while ($bw->pending()) {
    my $ref = $bw->get_result();
    # Do stuff with the result
}

$bw->shut_down();
exit;

sub setup_client {
    $client = FooClient->new();
}

Am I correctly keeping $client from being shared between workers? I kept the same approach I had in my fork() version, where I set $client after the fork(), but I'm worried that it's not the right way to do this.
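
To make the assumption concrete, here is a minimal sketch of the behaviour I'm relying on, written with plain fork() and core Perl only. It is not BossWorkerAsync itself: it assumes init_handler runs inside each worker after the fork, and the string assigned to $client is a hypothetical stand-in for FooClient->new(). Each child assigns $client after forking and reports it back over a pipe; the parent's copy should stay untouched.

```perl
use strict;
use warnings;

# File-scoped lexical, like $client in the script above.
my $client;

my @reports;
for my $n (1 .. 3) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: assign after the fork (assumed to mirror what init_handler does).
        close $reader;
        $client = "client-for-worker-$n";    # hypothetical stand-in for FooClient->new()
        print {$writer} "$client\n";
        close $writer;
        exit 0;
    }

    # Parent: read what the child reported, then reap it.
    close $writer;
    chomp(my $line = <$reader>);
    close $reader;
    waitpid($pid, 0);
    push @reports, $line;
}

print "child reported: $_\n" for @reports;
print "parent \$client is ", (defined $client ? $client : 'undef'), "\n";
```

If that assumption holds for BossWorkerAsync, each worker would end up with its own FooClient in the same way, and the boss process would never see one.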
