Gearman with multiple servers and PHP workers
I'm having a problem with Gearman workers running on multiple servers that I can't seem to solve.
The problem occurs when a worker server is taken offline, as opposed to only the worker process being cancelled. When that happens, all the other worker processes error and fail.
Example with just one client and two workers:
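Client:
$client = new GearmanClient();
$client->addServer('192.168.1.200');
$client->addServer('192.168.1.201');
// Blocks until a worker returns the result; $arrData is built elsewhere.
// (do() is deprecated as of pecl/gearman 1.0 in favour of doNormal().)
$job = $client->do('generate_tile', serialize($arrData));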
Worker:
$worker = new GearmanWorker();
$worker->addServer('192.168.1.200');
$worker->addServer('192.168.1.201');
$worker->addFunction('generate_tile', 'generate_tile');

// work() waits for and runs one job, returning false on failure.
while (true) {
    if (!$worker->work()) {
        // Log the libgearman return code and message, then keep waiting.
        echo "Error: " . $worker->returnCode() . ': ' . $worker->error() . "\n";
    }
}

function generate_tile($job) { ... }
The worker code runs on two separate servers. When both servers are up and running, both workers execute jobs as expected. When one of the worker processes is cancelled, the other worker executes all jobs as expected.
However, when the server whose worker process was cancelled is shut down and taken completely offline, requests to the client script hang and the remaining worker process does not pick up any jobs.
I get the following set of errors from the remaining worker process:
Error: 46: gearman_con_wait:timeout reached
Error: 46: gearman_con_wait:timeout reached
Error: 4: gearman_con_flush:write:110
Error: 46: gearman_con_wait:timeout reached
Error: 4: gearman_con_flush:write:113
Error: 4: gearman_con_flush:write:113
Error: 4: gearman_con_flush:write:113
....
When I start the other server back up, without starting the worker process on it, the remaining worker process immediately springs back to life and executes any outstanding jobs.
It seems clear to me that I need some code in the worker process to cope with servers that may be offline, but I cannot see how to do this.
Many thanks,
Andy
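Below is a minimal sketch of the worker-side check asked about above. It assumes it is acceptable to probe each job server with a plain TCP connect at startup and register only the servers that answer. `reachableServers` is a hypothetical helper. 4730 is gearmand's default port. The probe only shows that something accepts connections on that port. It does nothing about a server that goes down after the worker has started.

```php
<?php
/**
 * Hypothetical helper: return only the job servers that currently accept a
 * TCP connection. Entries are "host" or "host:port" (IPv4 or hostname);
 * a missing port defaults to 4730, gearmand's default.
 */
function reachableServers(array $servers, float $timeoutSeconds = 1.0): array
{
    $up = [];
    foreach ($servers as $server) {
        $parts = explode(':', $server, 2);
        $host = $parts[0];
        $port = isset($parts[1]) ? (int) $parts[1] : 4730;
        // @ suppresses the warning fsockopen() emits when the connect fails.
        $conn = @fsockopen($host, $port, $errno, $errstr, $timeoutSeconds);
        if ($conn !== false) {
            fclose($conn);  // only checking reachability, not talking Gearman
            $up[] = $server;
        }
    }
    return $up;
}
```

At startup the worker could pass the result to `$worker->addServers(implode(',', $up))`, and exit if `$up` is empty. Combined with periodically restarting workers, a dead server would then drop out of the list on the next restart.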
Answers (5)
Our tests with multiple Gearman servers show that if the last server in the list (192.168.1.201 in your case) is taken down, the workers stop executing in the way you describe. (Also, the workers take jobs from the last server first: they only process jobs on .200 when there are no jobs on .201.)
This seems to be a bug in the linked list in the Gearman server. It has reportedly been fixed several times, but the bug persists in every available version of Gearman. Sorry, I know that's not a solution, but we had the same problem and didn't find one. (If someone can provide a working solution for this problem, I'm willing to offer a large bounty.)
Further to @Darhazer's comment above: we found that too, and solved it like this:
We run 6 to 10 workers at any time, and expire them after they've completed x requests.
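The workaround in the answer above keeps a pool of workers, each of which exits after a fixed number of jobs. It could look roughly like the sketch below. The sketch assumes a process supervisor (supervisord, systemd or similar) restarts each worker when it exits. `runUntilExpired` is a hypothetical name. The consecutive-error limit is an addition that is not part of the answer.

```php
<?php
/**
 * Hypothetical helper: call $workOnce until $maxJobs jobs have succeeded,
 * or until $maxConsecutiveErrors calls in a row have failed.
 * Returns 'expired' or 'errors' so the caller can log why it stopped
 * before exiting and letting a supervisor start a fresh process.
 */
function runUntilExpired(callable $workOnce, int $maxJobs, int $maxConsecutiveErrors = 10): string
{
    $done = 0;
    $errors = 0;
    while ($done < $maxJobs) {
        if ($workOnce()) {
            $done++;
            $errors = 0;  // any success resets the failure streak
        } elseif (++$errors >= $maxConsecutiveErrors) {
            return 'errors';
        }
    }
    return 'expired';
}
```

In the worker, the loop from the question would become something like `runUntilExpired(function () use ($worker) { return $worker->work(); }, $maxJobs); exit(0);`. Here `$maxJobs` is the answer's unspecified "x". The error limit is there because, in the question's log, `work()` keeps returning false while a server is down. Without it, a stuck worker would never reach its job quota and restart.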
I use this class, which keeps track of which jobs work on which servers. It hasn't been thoroughly tested; I only just wrote it. I've pasted an edited version, so there may be a typo or some such, but otherwise it appears to solve the issue.
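Here is a rough illustration of that idea, not the answerer's class: a hypothetical `ServerHealth` tracker. It records, per job function, which servers recently failed and skips them for a cooldown period. The 30-second default is an arbitrary assumption. Time is passed in explicitly so the logic can be tested; real code would pass `microtime(true)`.

```php
<?php
/**
 * Hypothetical tracker: remembers which servers recently failed for which
 * job function and leaves them out of the usable list for a cooldown.
 */
final class ServerHealth
{
    private float $cooldown;
    /** @var array<string, float> "function|server" => time of last failure */
    private array $failedAt = [];

    public function __construct(float $cooldownSeconds = 30.0)
    {
        $this->cooldown = $cooldownSeconds;
    }

    public function markFailed(string $function, string $server, float $now): void
    {
        $this->failedAt[$function . '|' . $server] = $now;
    }

    public function markOk(string $function, string $server): void
    {
        unset($this->failedAt[$function . '|' . $server]);
    }

    /** Servers not in cooldown for $function, in their original order. */
    public function usable(string $function, array $servers, float $now): array
    {
        $out = [];
        foreach ($servers as $server) {
            $t = $this->failedAt[$function . '|' . $server] ?? null;
            if ($t === null || $now - $t >= $this->cooldown) {
                $out[] = $server;
            }
        }
        return $out;
    }
}
```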
Since addServer() in the Gearman client doesn't work properly, this code picks a job server at random and, if that fails, tries the next one. This way you can also balance the load.
Solution tested and working ok.
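Below is a minimal sketch of the random-server-with-fallback approach from the answer above; it is not the answerer's own code. `submitWithFallback` and `gearmanSubmitter` are hypothetical names. Each attempt uses a fresh `GearmanClient` with a single server and a client-side timeout (`setTimeout()`, in milliseconds). The aim is that a dead server fails instead of hanging. Calling the returned callback requires the pecl gearman extension.

```php
<?php
/**
 * Hypothetical helper: try job servers in random order until one succeeds.
 * $submit receives one server and returns the job result, or null on failure.
 */
function submitWithFallback(array $servers, callable $submit): string
{
    shuffle($servers);  // random start spreads load; $servers is a local copy
    foreach ($servers as $server) {
        $result = $submit($server);
        if ($result !== null) {
            return $result;
        }
    }
    throw new RuntimeException('No job server completed the job');
}

/**
 * Builds a $submit callback that runs one job on a single Gearman server
 * with a client-side timeout, treating any failure as "try the next server".
 */
function gearmanSubmitter(string $function, string $workload, int $timeoutMs): callable
{
    return function (string $server) use ($function, $workload, $timeoutMs): ?string {
        try {
            $client = new GearmanClient();
            $client->addServer($server);
            $client->setTimeout($timeoutMs);
            $result = $client->doNormal($function, $workload);
            return $client->returnCode() === GEARMAN_SUCCESS ? $result : null;
        } catch (GearmanException $e) {
            return null;  // treat client errors like an unreachable server
        }
    };
}
```

Usage would be something like `submitWithFallback(['192.168.1.200', '192.168.1.201'], gearmanSubmitter('generate_tile', serialize($arrData), 5000))`, with the 5-second timeout an arbitrary choice. One caveat: a job that times out on the client side may still be queued on that server and run later. The job should therefore be safe to execute twice.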