How to gracefully shut down or remove AWS instances from an ELB group

I have a cloud of server instances running at Amazon, using their load balancer to distribute the traffic. Now I am looking for a good way to gracefully scale the network down without causing connection errors on the browser's side.

As far as I know, any open connections to an instance are abruptly terminated when it is removed from the load balancer.

I would like a way to notify my instance, say, one minute before it gets shut down, or to have the load balancer stop sending traffic to the dying instance without terminating its existing connections.

My app is Node.js based and runs on Ubuntu. I also run some special software on it, so I would prefer not to use one of the many PaaS offerings that host Node.js.

Thanks for any hints.

蓝色星空 2024-12-15 12:04:26

I know this is an old question, but it should be noted that Amazon has since added support for connection draining: when an instance is removed from the load balancer, it first completes the requests that were in progress, and no new requests are routed to it. You can also supply a timeout for these requests, meaning any request that runs longer than the timeout window will be terminated after all.

To enable this behaviour, go to the Instances tab of your load balancer and change the Connection Draining setting.
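
For reference, the same setting can also be changed from the AWS CLI; a minimal sketch for a Classic ELB ("my-elb" and the 300-second timeout are placeholder values):

# Enable connection draining with a 300 s timeout on a Classic ELB.
aws elb modify-load-balancer-attributes \
    --load-balancer-name my-elb \
    --load-balancer-attributes '{"ConnectionDraining":{"Enabled":true,"Timeout":300}}'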

°如果伤别离去 2024-12-15 12:04:26

This idea uses the ELB's capability to detect an unhealthy node and remove it from the pool, BUT it relies on the ELB behaving as expected under the assumptions below. This is something I've been meaning to test for myself but haven't had the time yet; I'll update the answer when I do.

Process Overview

The following logic could be wrapped and run at the time the node needs to be shut down.

  1. Block new HTTP connections to nodeX but continue to allow existing connections.
  2. Wait for existing connections to drain, either by monitoring existing connections to your application or by allowing a "safe" amount of time.
  3. Initiate a shutdown of the nodeX EC2 instance using the EC2 API directly or abstracted scripts.

"Safe" is defined by your application and may not be determinable for some applications; a rough sketch of such a wrapper follows.

Assumptions that need to be tested

We know that ELB removes unhealthy instances from its pool. I would expect this to be graceful, so that:

  1. A new connection to a recently closed port is gracefully redirected to the next node in the pool.
  2. When a node is marked bad, the connections already established to that node are unaffected.

Possible test cases:

  • Fire HTTP connections at the ELB (e.g. from a curl script), logging the
    results while one of the node's HTTP ports is scripted to open and close.
    You would need to experiment to find an amount of time that reliably
    lets the ELB detect the state change. (A probe sketch follows this list.)
  • Maintain a long HTTP session (e.g. a file download) while blocking new
    HTTP connections; the long session should hopefully continue.
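
A minimal probe sketch (the URL is a placeholder; it logs a timestamped HTTP status every half-second so errors or gaps stand out while the node's port is opened and closed):

#!/bin/bash
# Hypothetical ELB probe; placeholder URL.
lburl="http://ELBURL.REGION.elb.amazonaws.com/"
while true; do
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$lburl")
    echo "$(date '+%H:%M:%S') $code"
    sleep 0.5
done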

1. How to block HTTP Connections

Use a local firewall on nodeX to block new sessions but continue to allow established sessions.

For example, with iptables (matching only SYN packets, so established sessions continue to be allowed):

iptables -A INPUT -j DROP -p tcp --syn --destination-port <web service port>
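
If the shutdown is aborted, the same rule can be deleted again by replacing -A with -D (this mirrors the rule above; the port placeholder is unchanged):

iptables -D INPUT -j DROP -p tcp --syn --destination-port <web service port>
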
携君以终年 2024-12-15 12:04:26

The recommended way for distributing traffic from your ELB is to have an equal number of instances across multiple availability zones. For example:

ELB

  • Instance 1 (us-east-a)
  • Instance 2 (us-east-a)
  • Instance 3 (us-east-b)
  • Instance 4 (us-east-b)

There are now two ELB API operations of interest that allow you to detach instances programmatically (or via the control panel):

  1. Deregister an instance
  2. Disable an availability zone (which subsequently disables the instances within that zone)
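
A sketch of both operations with the AWS CLI (load balancer name, instance ID, and zone are placeholder values):

# 1. Deregister a single instance from the load balancer.
aws elb deregister-instances-from-load-balancer \
    --load-balancer-name my-elb --instances i-XXXXXXX

# 2. Disable a whole availability zone for the load balancer.
aws elb disable-availability-zones-for-load-balancer \
    --load-balancer-name my-elb --availability-zones us-east-1b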

The ELB Developer Guide has a section that describes the effects of disabling an availability zone. A note in that section is of particular interest:

Your load balancer always distributes traffic to all the enabled
Availability Zones. If all the instances in an Availability Zone are
deregistered or unhealthy before that Availability Zone is disabled
for the load balancer, all requests sent to that Availability Zone
will fail until you call DisableAvailabilityZonesForLoadBalancer for
that Availability Zone.

What's interesting about the above note is that it could imply that if you call DisableAvailabilityZonesForLoadBalancer, the ELB could instantly start sending requests only to the remaining enabled zones, possibly resulting in a zero-downtime experience while you perform maintenance on the servers in the disabled Availability Zone.

The above 'theory' needs detailed testing or acknowledgement from an Amazon cloud engineer.

煮酒 2024-12-15 12:04:26

There have already been a number of responses here, and some of them give good advice. But I think that in general your design is flawed. No matter how perfectly you design your shutdown procedure to make sure a client's connection is closed before shutting down a server, you're still vulnerable:

  1. The server could lose power.
  2. A hardware failure could cause the server to fail.
  3. A connection could be closed by a network issue.
  4. The client could lose its internet or Wi-Fi connection.

I could go on, but my point is this: instead of designing the system to always work correctly, design it to handle failures. If you design a system that can handle a server losing power at any time, you've created a very robust system. This isn't a problem with the ELB; it's a problem with your current system architecture.

卸妝后依然美 2024-12-15 12:04:26

I can't comment because of my low reputation, so here are some snippets I crafted that might be useful to someone out there. They use the aws CLI tool to check when an instance has been drained of connections.

You need an EC2 instance behind an ELB, running the Python server below.

# Minimal Flask server: "/" for health checks, "/wait/<secs>" holds a connection open.
from flask import Flask
import time

app = Flask(__name__)

@app.route("/")
def index():
    return "ok\n"

@app.route("/wait/<int:secs>")
def wait(secs):
    # Sleep to simulate a long-running request.
    time.sleep(secs)
    return str(secs) + "\n"

if __name__ == "__main__":
    app.run(
        host='0.0.0.0',
        debug=True)

Then run the following script from a local workstation against the ELB.

#!/bin/bash

which jq > /dev/null || {
    echo "Get jq from http://stedolan.github.com/jq"
    exit 1
}

# Fill in the following vars
lbname="ELBNAME"
lburl="http://ELBURL.REGION.elb.amazonaws.com/wait/30"
instanceid="i-XXXXXXX"

getState () {
    aws elb describe-instance-health \
        --load-balancer-name $lbname \
        --instances $instanceid | jq '.InstanceStates[0].State' -r
}

register () {
    aws elb register-instances-with-load-balancer \
        --load-balancer-name $lbname \
        --instances $instanceid | jq .
}

deregister () {
    aws elb deregister-instances-from-load-balancer \
        --load-balancer-name $lbname \
        --instances $instanceid | jq .
}

waitUntil () {
    echo -n "Wait until state is $1"
    while [ "$(getState)" != "$1" ]; do
        echo -n "."
        sleep 1
    done
    echo
}

# Actual dance:
# make sure the instance is registered, then watch it until it is deregistered.

if [ "$(getState)" == "OutOfService" ]; then
    register > /dev/null
fi

waitUntil "InService"

# Start a 30-second request, then deregister the instance while it is in flight.
curl $lburl &
sleep 1

deregister > /dev/null

waitUntil "OutOfService"
毁虫ゝ 2024-12-15 12:04:26

A caveat not discussed in the existing answers is that ELBs also use DNS records with 60-second TTLs to balance load between multiple ELB nodes (each having one or more of your instances attached to it).

This means that if you have instances in two different availability zones, you probably have two IP addresses for your ELB, with a 60-second TTL on their A records. When you remove the final instance from such an availability zone, your clients "might" still use the old IP address for at least a minute, and faulty DNS resolvers might behave much worse.

Another case where an ELB holds multiple IPs and has the same problem is when a single availability zone contains so many instances that one ELB server cannot handle them. In that case the ELB will also spin up another server and add its IP to the list of A records, again with a 60-second TTL.
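
A quick way to observe this (the hostname is a placeholder; re-run the query while adding or removing instances and watch the set of A records change):

dig +short ELBURL.REGION.elb.amazonaws.com A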
