How can I clear stuck/stale Resque workers?
As you can see from the attached image, I've got a couple of workers that seem to be stuck. Those processes shouldn't take longer than a couple of seconds.
I'm not sure why they won't clear or how to manually remove them.
I'm on Heroku using Resque with Redis-to-Go and HireFire to automatically scale workers.
None of these solutions worked for me; I would still see this in redis-web:
Finally, this worked for me to clear all the workers:
In your console:
Otherwise you can try to fake them as being done to remove them, with:
EDIT
A lot of people have been upvoting this answer and I feel that it's important that people try hagope's solution which unregisters workers off a queue, whereas the above code deletes queues. If you're happy to fake them, then cool.
You probably have the resque gem installed, so you can open the console and get the current workers.
It returns a list of workers.
Pick a worker and call prune_dead_workers on it, for example the first one.
Adding to answer by hagope, I wanted to be able to only unregister workers that had been running for a certain amount of time. The code below will only unregister workers running for over 300 seconds (5 minutes).
I have an ongoing collection of Resque related Rake tasks that I have also added this to: https://gist.github.com/ewherrmann/8809350
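The age check described above could be sketched roughly like this. The original snippet is not shown here, so this is only an illustration: the helper names stale_workers and unregister_stale_workers are hypothetical, and it assumes each worker responds to processing (returning a hash with a 'run_at' timestamp string while a job is running, or an empty hash when idle) and to unregister_worker, as Resque::Worker objects do in the versions I am aware of.

```ruby
require 'time'

# Hypothetical helper: returns the workers whose current job started more
# than max_age seconds before `now`. Idle workers (no 'run_at') are skipped.
def stale_workers(workers, max_age: 300, now: Time.now)
  workers.select do |worker|
    run_at = worker.processing['run_at']
    run_at && (now - Time.parse(run_at.to_s)) > max_age
  end
end

# Hypothetical helper: unregisters only the stale workers and returns them.
def unregister_stale_workers(workers, max_age: 300, now: Time.now)
  stale = stale_workers(workers, max_age: max_age, now: now)
  stale.each(&:unregister_worker)
  stale
end

# Usage from a console with Resque loaded (assumption: Resque.workers
# returns the registered worker objects):
#   unregister_stale_workers(Resque.workers, max_age: 300)
```

Keeping the selection separate from the unregistering makes it easy to print the stale list first and check it before removing anything.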
Run this command wherever you ran the command to start the server.
You should see something like this:
Make a note of the PID (process ID); in my example it is 92102.
Then you can quit the process in one of two ways.
Gracefully, use
QUIT 92102
Forcefully, use
TERM 92102
* I'm not sure of the syntax; it's either
QUIT 92102
or
QUIT -92102
Let me know if you have any trouble.
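On the syntax question: from a shell, the usual form is kill -QUIT 92102 (signal name first, then the PID). The same steps can be sketched in Ruby as below. This is a rough illustration, not the answer's original commands: the helper names resque_pids and signal_workers are hypothetical, and the parser assumes ps output in "pid command" form where Resque worker process titles start with "resque". As far as I know from the Resque README, QUIT lets the current job finish before the worker exits, while TERM kills the child immediately.

```ruby
# Hypothetical helper: extracts PIDs of Resque processes from text in the
# format produced by `ps -e -o pid,command`. Only lines whose command
# begins with "resque" are matched; the header line is ignored.
def resque_pids(ps_output)
  ps_output.each_line.map do |line|
    match = line.match(/\A\s*(\d+)\s+resque\b/)
    match && match[1].to_i
  end.compact
end

# Hypothetical helper: sends `signal` to each PID using Process.kill.
# 'QUIT' asks for a graceful exit, 'TERM' for an immediate one.
def signal_workers(pids, signal = 'QUIT')
  pids.each { |pid| Process.kill(signal, pid) }
end

# Usage on the machine where the workers run:
#   pids = resque_pids(`ps -e -o pid,command`)
#   signal_workers(pids, 'QUIT')
```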
I just did:
Got the list of workers.
... where n is the zero based index of the unwanted worker.
I had a similar problem: Redis had saved the DB to disk with invalid (non-running) workers in it, so they reappeared each time Redis/resque was started.
Fix this using:
Make sure you restart Redis and your Resque workers.
Started working on https://github.com/shaiguitar/resque_stuck_queue/ recently. It's not a solution to how to fix stuck workers but it addresses the issue of resque hanging/being stuck, so I figured it could be helpful for people on this thread. From README:
"If resque doesn't run jobs within a certain timeframe, it will trigger a pre-defined handler of your choice. You can use this to send an email, pager duty, add more resque workers, restart resque, send you a txt...whatever suits you."
Been used in production and works pretty well for me thus far.
Here's how you can purge them from Redis by hostname. This happens to me when I decommission a server and workers do not exit gracefully.
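A minimal sketch of the hostname-based purge, since the original snippet is not shown here. It assumes worker ids follow the "hostname:pid:queues" form that Resque uses for Resque::Worker#to_s in the versions I know of; the helper names workers_on_host and unregister_workers_on_host are hypothetical.

```ruby
# Hypothetical helper: selects workers whose id starts with the given
# hostname. Comparing the whole first segment avoids "web-1" also
# matching "web-10".
def workers_on_host(workers, hostname)
  workers.select { |worker| worker.to_s.split(':', 2).first == hostname }
end

# Hypothetical helper: unregisters every worker registered from hostname
# and returns the workers that were removed.
def unregister_workers_on_host(workers, hostname)
  workers_on_host(workers, hostname).each(&:unregister_worker)
end

# Usage from a console with Resque loaded (the hostname is an example):
#   unregister_workers_on_host(Resque.workers, 'decommissioned-host')
```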
I ran into this issue and started down the path of implementing a lot of the suggestions here. However, I discovered that the root cause was the version of the redis-rb gem I was using, 3.3.0. Downgrading to redis-rb 3.2.2 prevented these workers from getting stuck in the first place.
I've cleared them out from redis-cli directly. Luckily redistogo.com allows access from environments outside heroku.
Get dead worker ID from the list. Mine was
Run this command in redis directly.
You can monitor redis db to see what it's doing behind the scenes.
Second last line deletes the worker.
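To make the manual cleanup more concrete, here is a rough sketch that builds the redis-cli commands for one worker id. The key layout is an assumption based on my reading of Resque 1.x (a "workers" set plus per-worker "worker:<id>", "worker:<id>:started" and "stat:*" keys under the default "resque" namespace), not something taken from the answer, so inspect your own keys before deleting anything. The helper name worker_cleanup_commands is hypothetical.

```ruby
# Hypothetical helper: returns the redis-cli commands that would remove one
# worker's bookkeeping, ASSUMING the Resque 1.x key layout described above.
# Keys and ids are double-quoted because worker ids can contain characters
# such as "*" or ",".
def worker_cleanup_commands(worker_id, namespace: 'resque')
  [
    %(SREM "#{namespace}:workers" "#{worker_id}"),
    %(DEL "#{namespace}:worker:#{worker_id}"),
    %(DEL "#{namespace}:worker:#{worker_id}:started"),
    %(DEL "#{namespace}:stat:processed:#{worker_id}"),
    %(DEL "#{namespace}:stat:failed:#{worker_id}")
  ]
end

# Usage: print the commands, review them, then paste into redis-cli.
#   puts worker_cleanup_commands('myhost:1234:default')
```

Newer Resque versions may keep additional per-worker data (for example heartbeats), so unregistering through the Resque API is probably safer when a console is available.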
In resque 2.0.0, here's one way that seems to work to remove only workers that are apparently actually dead:
I am not an expert in what's going on; it's possible there's a better way to do this, or that this will have problems. I'm just trying to figure this out too.
This seems to remove from the resque worker list any workers that haven't sent a "heartbeat" in much longer than expected.
If the phantom worker was in a "running" state, then a new entry corresponding to the phantom job will be created in the "failed" job queue.
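The heartbeat idea can be illustrated with a small standalone sketch. This is not Resque's implementation: the helper name expired_heartbeat_ids is hypothetical, and the 300-second default is an assumption based on my understanding that Resque sends a heartbeat every 60 seconds and prunes after five missed intervals. I believe recent Resque versions expose Resque::Worker.all_workers_with_expired_heartbeats for the real thing, but check your version.

```ruby
# Hypothetical helper: given a hash of worker id => time of last heartbeat,
# returns the ids whose heartbeat is older than prune_interval seconds.
def expired_heartbeat_ids(heartbeats, now: Time.now, prune_interval: 300)
  heartbeats.select { |_id, last_beat| now - last_beat > prune_interval }.keys
end

# Usage with made-up data:
#   beats = { 'host:1:default' => Time.now - 900, 'host:2:default' => Time.now }
#   expired_heartbeat_ids(beats)
```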
I had stuck/stale resque workers here too, or should I say 'jobs', because the worker is actually still there and running fine; it's the forked process that is stuck.
I chose the brutal solution of killing, via a bash script, any forked process that has been "Processing" for more than 5 minutes. The worker then just spawns the next job in the queue, and everything keeps on going.
Have a look at my script here: https://gist.github.com/jobwat/5712437
If you are using newer versions of Resque, you'll need to use the following command as the internal APIs have changed...
This avoids the problem as long as you have a resque version newer than 1.26.0:
Keep in mind that it does not let the currently running job finish.
If you use Docker, you can also use this command:
<id> is the worker id.