nova-compute service on one compute node is unstable

Posted 2021-11-24 04:50:47

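The nova-compute service on one compute node is unstable. When the dashboard's "Compute Services" panel shows this node's compute service as "down", no new VMs can be created on the node. Yet when I log in to the node, the service is running normally and has a PID. Once this happens, the only way to create VMs on the node again is to restart the nova-compute service. After a restart it usually works for about 10 hours, and then VM creation starts failing again.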
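For context on how a running process can still show as "down": with nova's default database servicegroup driver, a service is reported as up only if its last heartbeat (the `updated_at` timestamp of its row in the `services` table) is no older than `service_down_time` (60 seconds by default). Each service refreshes that heartbeat every `report_interval` (10 seconds by default). On a compute node the heartbeat normally travels over RPC to nova-conductor, so if messaging stalls, the heartbeat stops while the process keeps running. The sketch below models only that time comparison; `is_service_up` is a hypothetical name, not nova's actual API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper modelling the idea behind nova's DB servicegroup check:
# a service counts as "up" only if its last heartbeat is recent enough.
SERVICE_DOWN_TIME = 60  # seconds; nova's default for service_down_time
REPORT_INTERVAL = 10    # seconds; nova's default for report_interval


def is_service_up(last_heartbeat: datetime, now: datetime,
                  service_down_time: int = SERVICE_DOWN_TIME) -> bool:
    """Return True if the last heartbeat is within service_down_time of now."""
    elapsed = (now - last_heartbeat).total_seconds()
    return abs(elapsed) <= service_down_time


if __name__ == "__main__":
    now = datetime(2015, 7, 23, 7, 4, 0, tzinfo=timezone.utc)
    # Heartbeats arriving every REPORT_INTERVAL keep the service "up".
    print(is_service_up(now - timedelta(seconds=REPORT_INTERVAL), now))
    # If heartbeats stop (e.g. RPC replies time out), the service is reported
    # "down" once service_down_time has passed, even though the process runs.
    print(is_service_up(now - timedelta(seconds=90), now))
```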

The log contains the following errors:

2015-07-23 07:01:51.549 12683 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID aefecca0b79147d4a99d4edc66c7e2e5.
2015-07-23 07:02:00.784 12683 ERROR nova.servicegroup.drivers.db [-] model server went away
2015-07-23 07:02:51.778 12683 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager._run_pending_deletes: Timed out waiting for a reply to message ID 925284b7efb24c01817f5846a0a8e7c9.
2015-07-23 07:03:07.758 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 110] Connection timed out
2015-07-23 07:03:07.763 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 15.12.52.37:5672 is unreachable: [Errno 110] Connection timed out. Trying again in 1 seconds.
2015-07-23 07:03:11.510 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 113] EHOSTUNREACH
2015-07-23 07:03:11.513 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 15.12.52.37:5672 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 2 seconds.
2015-07-23 07:03:14.510 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 113] EHOSTUNREACH
2015-07-23 07:03:14.512 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 15.12.52.37:5672 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 2 seconds.
2015-07-23 07:03:17.438 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-07-23 07:03:17.442 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server 15.12.52.37:5672 closed the connection. Check login credentials: Socket closed
2015-07-23 07:04:19.074 12683 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager._heal_instance_info_cache: Timed out waiting for a reply to message ID c359b12760394e23bf6472bc7f856543.
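Grouping the ERROR lines by logger makes it easier to see whether the RPC timeouts and the "model server went away" heartbeat failure coincide with the AMQP connection errors. This is a minimal sketch that assumes the line layout shown above (timestamp, PID, level, logger, `[-]`, message); `count_errors_by_logger` is a hypothetical helper, and the sample is a subset of the lines above.

```python
import re
from collections import Counter

# Assumes the layout of the log lines above:
# "<date> <time> <pid> <LEVEL> <logger> [-] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<pid>\d+) (?P<level>[A-Z]+) "
    r"(?P<logger>\S+) \[[^\]]*\] (?P<msg>.*)$"
)


def count_errors_by_logger(lines):
    """Count ERROR lines per logger name; non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") == "ERROR":
            counts[m.group("logger")] += 1
    return counts


SAMPLE = """\
2015-07-23 07:01:51.549 12683 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID aefecca0b79147d4a99d4edc66c7e2e5.
2015-07-23 07:02:00.784 12683 ERROR nova.servicegroup.drivers.db [-] model server went away
2015-07-23 07:03:07.763 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 15.12.52.37:5672 is unreachable: [Errno 110] Connection timed out. Trying again in 1 seconds.
2015-07-23 07:03:11.513 12683 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 15.12.52.37:5672 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 2 seconds.
"""

if __name__ == "__main__":
    for logger, n in count_errors_by_logger(SAMPLE.splitlines()).most_common():
        print(f"{n:3d}  {logger}")
```

On the node itself, the same function can be fed the lines of the nova-compute log file instead of `SAMPLE`.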

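The node also loses packets, though the loss rate is low: pinging any other node from this one drops some packets, while no other node shows this behavior. All other nodes work normally.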
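Because the loss is intermittent, a few pings say little about the path nova-compute actually uses. Repeatedly opening TCP connections to the RabbitMQ port from the affected node gives a rough failure count for that path. A minimal sketch using only the Python standard library; `probe_tcp` and its parameters are hypothetical, chosen for illustration.

```python
import socket
import time


def probe_tcp(host: str, port: int, attempts: int = 20,
              timeout: float = 2.0, interval: float = 0.5):
    """Try `attempts` TCP connects; return (failure count, error messages)."""
    failures = 0
    errors = []
    for i in range(attempts):
        try:
            # create_connection raises OSError (timeouts included) on failure;
            # the connection is closed immediately on success.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError as exc:
            failures += 1
            errors.append(f"attempt {i}: {exc}")
        if interval and i < attempts - 1:
            time.sleep(interval)
    return failures, errors
```

For example, calling `probe_tcp("15.12.52.37", 5672)` (the AMQP endpoint from the log) on the problem node and on a healthy node lets you compare their failure counts over the same period.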

The RabbitMQ service is running normally.
