Why does RabbitMQ keep breaking from a corrupt persister log file?
I'm running Celery in a Django app with RabbitMQ as the message broker. However, RabbitMQ keeps breaking down like so. First is the error I get from Django. The trace is mostly unimportant, because I know what is causing the error, as you will see.
Traceback (most recent call last):
...
File "/usr/local/lib/python2.6/dist-packages/amqplib/client_0_8/transport.py", line 85, in __init__
raise socket.error, msg
error: [Errno 111] Connection refused
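Errno 111 means that nothing accepted the TCP connection, which is consistent with the broker process not running at all. As a minimal sketch for confirming that before digging into Celery or Django, the check below tries to open a plain TCP connection. It assumes the broker runs on localhost and uses the default AMQP port 5672; the function name and its parameters are illustrative, not part of amqplib or Celery.

```python
import socket


def broker_is_listening(host="localhost", port=5672, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    host and port are assumptions (local broker, default AMQP port);
    adjust them to match the actual broker settings.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers "connection refused" (Errno 111) as well as timeouts.
        return False
    conn.close()
    return True
```

Calling `broker_is_listening()` only shows that a process is listening on the port. It does not prove that the listener is a healthy RabbitMQ node.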
I know that this is due to a corrupt rabbit_persister.log file, because after I kill all processes tied to RabbitMQ and run "sudo rabbitmq-server start", I get the following crash:
...
starting queue recovery ...done
starting persister ...BOOT ERROR: FAILED
Reason: {{badmatch,{error,{{{badmatch,eof},
[{rabbit_persister,internal_load_snapshot,2},
{rabbit_persister,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]},
{child,undefined,rabbit_persister,
{rabbit_persister,start_link,[]},
transient,100,worker,
[rabbit_persister]}}}},
[{rabbit_sup,start_child,2},
{rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
{rabbit,run_boot_step,1},
{rabbit,'-start/2-lc$^0/1-0-',1},
{rabbit,start,2},
{application_master,start_it_old,4}]}
Erlang has closed
My current fix: Every time this happens, I rename the corresponding rabbit_persister.log file to something else (rabbit_persister.log.bak) and am able to restart RabbitMQ with success. But the problem keeps occurring, and I can't tell why. Any ideas?
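The rename workaround described here could be scripted roughly as follows. This is only a sketch: the location of rabbit_persister.log is not given in the post, so the path is passed in by the caller, and the function name and timestamped suffix are illustrative choices. The timestamp keeps each corrupt copy instead of overwriting the previous .bak file, which may be useful if the files are later needed for a bug report.

```python
import os
import time


def set_aside_log(log_path):
    """Rename log_path to log_path.<timestamp>.bak and return the new path.

    log_path is supplied by the caller; the post does not say where the
    persister log lives on disk.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = "%s.%s.bak" % (log_path, stamp)
    os.rename(log_path, backup_path)
    return backup_path
```

This automates the workaround only. It does nothing about whatever is corrupting the file in the first place.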
Also, as a disclaimer, I have no experience with Erlang; I'm only using RabbitMQ because it's the broker favored by Celery.
Thanks in advance, this problem is really annoying me because I keep doing the same fix over and over.
Comments (2)
The persister is RabbitMQ's internal message database. That "log" is presumably like a database log and deleting it will cause you to lose messages. I guess it's getting corrupted by unclean broker shutdowns, but that's a bit beside the point.
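If unclean shutdowns are indeed the trigger, one thing worth trying is to stop the broker through `rabbitmqctl stop` rather than killing its processes. The sketch below wraps that command. It assumes `rabbitmqctl` is on the PATH and that the script runs with sufficient privileges; the function name and the injectable `run` parameter are illustrative and exist only so the wrapper can be exercised without a broker.

```python
import subprocess

# rabbitmqctl's "stop" command asks the broker node to shut down.
STOP_COMMAND = ["rabbitmqctl", "stop"]


def stop_broker(run=subprocess.call):
    """Ask the broker to shut down cleanly; return True on exit status 0.

    run is any callable that takes the command list and returns an exit
    status. It defaults to subprocess.call.
    """
    return run(STOP_COMMAND) == 0
```

A False result means the command itself failed, for example because the node was already down, and says nothing about the state of the persister log.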
It's interesting that you're getting an error in the rabbit_persister module. The last version of RabbitMQ that has that file is 2.2.0, so I'd strongly advise you to upgrade. The best version is always the latest, which you can get by using the RabbitMQ APT repository. In particular, the persister has seen a fairly large number of fixes in the versions after 2.2.0, so there's a good chance your problem has already been resolved.
If you still see the problem after upgrading, you should report it on the RabbitMQ Discuss mailing list. The developers (of both Celery and RabbitMQ) make a point of fixing any problems reported there.
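To act on the upgrade advice, the installed version string (for example, as reported by `rabbitmqctl status`) can be compared against 2.2.0, the release named above. The sketch below assumes a purely numeric dotted version such as "2.2.0"; the function names are illustrative. Comparing integer tuples rather than raw strings matters because, as strings, "2.10.0" sorts before "2.2.0".

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2.2.0' into (2, 2, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(installed, last_with_persister="2.2.0"):
    """Return True if installed is at or below the given release.

    The default, 2.2.0, is the release the answer names as the last one
    containing rabbit_persister.
    """
    return parse_version(installed) <= parse_version(last_with_persister)
```

For instance, `needs_upgrade("2.2.0")` is True and `needs_upgrade("2.7.1")` is False.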
A. Because you are running an old version of RabbitMQ earlier than 2.7.1
B. Because RabbitMQ doesn't have enough RAM. You need to run RabbitMQ on a server all by itself and give that server enough RAM so that the RAM is 2.5 times the largest possible size of your persisted message log.
You might be able to fix this without any software changes just by adding more RAM and killing other services on the box.
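The sizing rule in point B can be turned into a quick calculation. The sketch below takes the 2.5 multiplier from this answer as given; the function names are illustrative, and the log path and available RAM are supplied by the caller. The file's current size is only a lower bound on its "largest possible size", so treat a passing check as necessary rather than sufficient.

```python
import os

RAM_FACTOR = 2.5  # multiplier stated in the answer above


def recommended_ram_bytes(max_log_bytes, factor=RAM_FACTOR):
    """RAM suggested by the rule of thumb for a log of max_log_bytes."""
    return int(max_log_bytes * factor)


def log_fits_in_ram(log_path, ram_bytes, factor=RAM_FACTOR):
    """Check the rule of thumb against the log's current size on disk."""
    current_size = os.path.getsize(log_path)
    return recommended_ram_bytes(current_size, factor) <= ram_bytes
```

For example, a persisted log that can grow to 400 MB would call for about 1000 MB of RAM under this rule.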
Another approach is to build your own RabbitMQ from source and include the toke extension, which persists messages using Tokyo Cabinet. Make sure you are using a local hard drive and not an NFS partition, because Tokyo Cabinet has corruption issues with NFS. And, of course, use version 2.7.1 for this. Depending on your message content, you might also benefit from Tokyo Cabinet's compression settings to reduce the read/write activity of persisted messages.