Detecting and diagnosing workers that crash silently
I'm running Celery 2, daemonized as described at http://ask.github.com/celery/cookbook/daemonizing.html, with RabbitMQ.
From time to time a silent crash happens; the only thing I see in celeryd.log is:
[2010-12-24 14:14:31,323: INFO/PoolWorker-1414] process shutting down
[2010-12-24 14:14:31,323: INFO/PoolWorker-1414] process exiting with exitcode 0
[2010-12-24 14:14:31,331: INFO/PoolWorker-1415] child process calling self.run()
[2010-12-24 14:14:48,673: INFO/MainProcess] Got task from broker: airsale.search.xxx.get_search_results[01bf5d36-7c0e-4f8a-af69-750ef1b24abc]
[2010-12-24 14:14:48,761: INFO/MainProcess] Got task from broker: airsale.search.xxx.get_search_results[2d5f9952-d493-4de4-9752-0eee1776147d]
[2010-12-24 14:14:48,861: INFO/MainProcess] Got task from broker: airsale.search.xxx.get_search_results[0c77c1ec-df6c-4e34-875c-44909fbf8b9f]
[2010-12-24 14:14:48,961: INFO/MainProcess] Got task from broker: airsale.search.xxx.get_search_results[3d83dd54-0be8-4cf9-9cd6-81e070d97170]
[2010-12-24 14:14:49,061: INFO/MainProcess] Got task from broker: airsale.search.xxx.get_search_results[2dd29e70-e085-4fd1-a7ef-12d06b21644c]
..........
After that there are only "Got task from broker" lines, without any task processing.
ps -C celeryd
shows that the celery nodes are still running.
If I do: /etc/init.d/celeryd restart
the number of celeryd processes doubles. It seems the old processes are no longer controlled by the daemon script.
- How can I detect why task processing stops, even though tasks are still received from the broker?
- Why are the old celeryd processes not killed by /etc/init.d/celeryd restart?
The queue workers are stalled, so the main fix is to set a time limit for each task, so that the worker child process is restarted whenever a task exceeds that limit.
Add the following to your task:
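The code that originally followed is not included in this copy. As a sketch of one way to do it, assuming the per-task time_limit/soft_time_limit options are available in your Celery 2.x version (run_search is a hypothetical stand-in for the real search code):

```python
from celery.task import task
from celery.exceptions import SoftTimeLimitExceeded

# soft limit (60 s): SoftTimeLimitExceeded is raised inside the task,
# giving it a chance to clean up; hard limit (120 s): the worker child
# is killed and replaced by a fresh pool process.
@task(time_limit=120, soft_time_limit=60)
def get_search_results(query):
    try:
        return run_search(query)  # hypothetical stand-in for the real work
    except SoftTimeLimitExceeded:
        return None  # clean up and give up before the hard kill
```

Because a hard-killed child is replaced by the pool, a single hung task can no longer stall the whole worker.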
In your settings.py, add the following:
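The settings snippet is likewise missing here. In Celery 2.x the global equivalents are the CELERYD_TASK_TIME_LIMIT and CELERYD_TASK_SOFT_TIME_LIMIT settings; the values below are illustrative:

```python
# settings.py -- global fallbacks for tasks that set no limit of their own
CELERYD_TASK_TIME_LIMIT = 300        # hard limit in seconds: kill and replace the child
CELERYD_TASK_SOFT_TIME_LIMIT = 240   # soft limit: raise SoftTimeLimitExceeded first
```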