No traffic from ELB to one of the Auto Scaling instances

Posted 2025-01-05 16:07:30


We use Auto Scaling and it works pretty well for us, but this morning something went wrong.
CPU utilization of one of the instances dropped to about 0% for some reason, which pushed the rest of the instances in the same Availability Zone to 100% CPU utilization. The group didn't scale up, because the average CPU utilization across all instances was about 70%, while the trigger only starts a new instance when 80% is hit. The ELB instance health check is used as well, but this 0% instance was reported as healthy.

Is it possible to configure Auto Scaling to remove such instances?
We don't want to set up any custom cron jobs for checkups.

Auto Scaling Issue


Comments (1)

櫻之舞 2025-01-12 16:07:30


Update 2

Is it possible to configure Auto Scaling to remove such Instances?

Yes, see below - according to your comments you have done this correctly already.

We don't want to setup any custom cronjobs for check ups.

Given that your configuration is apparently correct (which implies an issue with Auto Scaling and/or ELB itself), I'm afraid it is not possible to avoid a custom solution: either actively shut down unused instances, or use as-set-instance-health as already suggested in my initial answer below. The former is also suggested by tribalcrossing's answer to "ELB-Unhealthy instances taken OOS then removed from ELB automatically", which seems to address your situation:

We run a cronjob that's fired every 5 minutes to scan all of the
servers in an ELB to check to see if it's been up for more than 5
minutes AND is unhealthy. When we find one, we shut it down. We've
had issues of "dead" instances stuck in ELB and throwing off
monitoring metrics that trigger autoscaling actions, and that cronjob
has solved the problem for us.
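
The selection rule in that quote (up for more than 5 minutes AND unhealthy) can be sketched as a small filter. This is only an illustration: the `InstanceReport` type, its field names, and the instance IDs are hypothetical, and a real job would have to fetch launch times and ELB states from AWS and then terminate the matching instances, which is left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class InstanceReport:
    """One row of (hypothetical) monitoring data for a registered instance."""
    instance_id: str
    launch_time: datetime  # when the instance was started
    elb_state: str         # "InService" or "OutOfService", as reported by the ELB


def find_dead_instances(reports, now, min_uptime=timedelta(minutes=5)):
    """Return IDs of instances that are up longer than min_uptime AND unhealthy."""
    dead = []
    for report in reports:
        old_enough = now - report.launch_time > min_uptime
        unhealthy = report.elb_state != "InService"
        if old_enough and unhealthy:
            dead.append(report.instance_id)
    return dead


if __name__ == "__main__":
    now = datetime(2025, 1, 12, 16, 0, tzinfo=timezone.utc)
    reports = [
        InstanceReport("i-aaa", now - timedelta(hours=3), "InService"),
        InstanceReport("i-bbb", now - timedelta(hours=3), "OutOfService"),
        # Still booting: unhealthy, but too young to be shut down.
        InstanceReport("i-ccc", now - timedelta(minutes=2), "OutOfService"),
    ]
    print(find_dead_instances(reports, now))
```

The uptime condition matters: without it, the job would also shut down freshly launched instances that simply have not passed their first health checks yet.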


Update 1

ELB Instance health check is used as well, but this %0 Instance was
healthy.

Which health indicator are you referring to, and how did you conclude that the instance was healthy?

It is important to realize that Auto Scaling and ELB measure instance health differently; see alighafour's response to "Autoscaling not reacting to unhealthy instances":

ELB checks at the application layer while autoscaling checks at the
machine layer.

This difference is further detailed in the AWS team's response to the linked question "ELB-Unhealthy instances taken OOS then removed from ELB automatically" (which actually addresses the inverse issue):

Autoscaling is looking at instance health - they'll take an instance
down if the data shows that the instance is not healthy. They'll take
it out of the ELB at that time and then shut down the instance.

ELB, on the other hand, is doing an application health check by
reading in a file or doing a connection to a port. If the application
fails a certain number of these checks, the instance continues to run,
but the ELB won't send it any new traffic. The ELB continues to
perform the health check - if the application instance becomes healthy
again, it'll start routing traffic to it. ELB doesn't remove the
instances from the ELB registration - it simply stops sending it
traffic until it's healthy again. [emphasis mine]

Conclusion

It looks like the aforementioned scenario might indeed apply to your situation: ELB stopped sending traffic to your instance because the ELB health check failed, while the Auto Scaling health check didn't see a problem with the instance as such. This might happen, for example, if the ELB health check probes a web page served by Apache, which fails to respond for whatever reason (e.g. an Apache crash).
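
An instance that receives no traffic would also explain why the trigger never fired: its near-zero reading drags the group average down. The following sketch shows the arithmetic with made-up readings (the five values and the helper names are assumptions for illustration; only the 80% threshold and the roughly 70% average come from the question).

```python
def average(values):
    """Arithmetic mean of a non-empty list of CPU percentages."""
    return sum(values) / len(values)


def should_scale_out(cpu_percentages, threshold=80.0):
    """Single-trigger rule: scale out when the group average reaches the threshold."""
    return average(cpu_percentages) >= threshold


# Hypothetical readings: one idle instance, two overloaded ones in the same
# Availability Zone, and two moderately loaded ones elsewhere.
readings = [0, 100, 100, 80, 70]

print(average(readings))               # 70.0
print(should_scale_out(readings))      # False
# The same load measured without the idle instance diluting the average:
print(average(readings[1:]))           # 87.5
print(should_scale_out(readings[1:]))  # True
```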

Solution

You need to configure the Auto Scaling group to base its health decision on both the EC2 health status and the ELB health status, as outlined in section "Creating a Health Check for Elastic Load Balancing" within "Maintaining Current Scaling Level":

By default, Auto Scaling uses the Amazon EC2 health status for all
Auto-Scaling-managed instances. To also use the Elastic Load
Balancer's health check, set the HealthCheckType property of the group
to ELB:

% as-update-autoscaling-group myGroup --health-check-type ELB

With this configuration in place, an instance is also considered unhealthy as soon as the ELB health check fails, and it will be replaced accordingly.
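
As a rough mental model of what the HealthCheckType setting changes, here is a simplified sketch. It is not AWS code: the function name and its boolean inputs are hypothetical, and grace periods and other details of the real service are ignored.

```python
def is_unhealthy(ec2_ok, elb_ok, health_check_type="EC2"):
    """Simplified model of the verdict Auto Scaling reaches for one instance.

    ec2_ok: the EC2 (machine-level) status is fine.
    elb_ok: the ELB (application-level) health check passes.
    """
    if health_check_type == "EC2":
        # Default: only the machine-level status counts.
        return not ec2_ok
    if health_check_type == "ELB":
        # Either a failing machine or a failing ELB check marks it unhealthy.
        return not ec2_ok or not elb_ok
    raise ValueError(f"unknown health check type: {health_check_type}")


# A running machine whose application no longer answers the ELB probe:
print(is_unhealthy(ec2_ok=True, elb_ok=False, health_check_type="EC2"))  # False
print(is_unhealthy(ec2_ok=True, elb_ok=False, health_check_type="ELB"))  # True
```

With the default type, the machine in this example stays in the group indefinitely; with ELB, it becomes a candidate for replacement.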


Initial Answer

Is it possible to have multiple triggers for one Auto Scaling Group?

Unfortunately not; see e.g. the AWS team's response to "How to set Multiple Triggers in Template":

Unfortunately, the Auto Scaling service only allows 1 trigger per Auto
Scaling group and so we do not support having multiple triggers for
the same group within a template at this time.

An alternative approach could be to implement a custom solution via as-set-instance-health, as mentioned in section "Custom Health Check" within "Maintaining Current Scaling Level":

If you have your own health check system, you can integrate it with
Auto Scaling. Use SetInstanceHealth to send the instance's health
information directly from your system to Auto Scaling.
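
A custom check could report its findings along these lines. This is a minimal sketch, assuming the client passed in is boto3's Auto Scaling client (`boto3.client("autoscaling")`), whose `set_instance_health` call wraps the SetInstanceHealth API. The `report_unhealthy` helper, the `RecordingClient` stand-in, and the instance ID are hypothetical and only there so the example runs without AWS credentials.

```python
def report_unhealthy(autoscaling_client, instance_ids, respect_grace_period=True):
    """Mark each given instance as Unhealthy in Auto Scaling; return the IDs reported."""
    reported = []
    for instance_id in instance_ids:
        autoscaling_client.set_instance_health(
            InstanceId=instance_id,
            HealthStatus="Unhealthy",
            ShouldRespectGracePeriod=respect_grace_period,
        )
        reported.append(instance_id)
    return reported


class RecordingClient:
    """Stand-in for the real client: records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def set_instance_health(self, **kwargs):
        self.calls.append(kwargs)


if __name__ == "__main__":
    client = RecordingClient()
    report_unhealthy(client, ["i-bbb"])
    print(client.calls)
```

Passing the client in as a parameter keeps the health-check logic testable; in production the same function would receive the real client instead of the stand-in.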
