Envoy - failover to another control plane instance

Posted on 2025-02-09 03:29:53

In our setup, Envoy consumes dynamic configuration from the control plane via gRPC. The control plane cluster uses STRICT_DNS discovery:

- name: cplane
  connect_timeout: 5s
  type: STRICT_DNS
  load_assignment:
    cluster_name: cplane
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: control-plane-fqdn
              port_value: 1234

The control-plane-fqdn DNS record resolves to multiple instances, and Envoy connects to any one of them. The question is: what failover mechanism does Envoy use? From my observations, failover to another instance (after shutting down the one Envoy is connected to) takes anywhere from 5 to 50 seconds. What is the reason for this spread, and is it possible to make the failover time more deterministic?

Comments (1)

为你鎻心 2025-02-16 03:29:53

I eventually found the answer. It is the exponential backoff mechanism defined here [1]. The parameter RetryMaxDelayMs is not configurable and is hard-coded to 30 seconds. Moreover, when the DNS record has two IPs and one of them becomes unreachable, Envoy tries to reconnect to each IP in a round-robin manner, with an exponential backoff between attempts. So if the secondary IP is not ready to accept connections immediately, the exponential backoff may add another 30 to 40 seconds before Envoy reconnects.
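To see why the observed failover time varies so much, here is a rough Python sketch of the behaviour described above: round-robin reconnect attempts with a jittered exponential backoff capped at 30 seconds. It is a toy model, not Envoy's code. The function name `simulate_failover`, the 500 ms initial delay, and the "full jitter" formula are assumptions made for illustration. Only the 30-second cap comes from the answer. The model also ignores the 5s connect_timeout spent on each failed attempt.

```python
import random


def simulate_failover(endpoints, ready_at_ms, base_ms=500, max_ms=30_000,
                      seed=0, max_attempts=100):
    """Toy model of reconnecting with jittered exponential backoff.

    endpoints:   names tried in round-robin order
    ready_at_ms: name -> time (ms) from which the endpoint accepts
                 connections, or None if it never does
    base_ms:     assumed initial backoff ceiling (hypothetical value)
    max_ms:      backoff cap (30 s, per the answer above)

    Returns (elapsed_ms, endpoint) for the first successful attempt.
    """
    rng = random.Random(seed)
    elapsed = 0.0
    ceiling = base_ms
    for attempt in range(max_attempts):
        target = endpoints[attempt % len(endpoints)]
        ready = ready_at_ms.get(target)
        if ready is not None and elapsed >= ready:
            return elapsed, target
        # Failed attempt: wait a random time up to the current ceiling,
        # then double the ceiling, never exceeding the cap.
        elapsed += rng.uniform(0, ceiling)
        ceiling = min(ceiling * 2, max_ms)
    raise RuntimeError("no endpoint became reachable")


if __name__ == "__main__":
    # "a" was shut down; "b" only starts accepting connections after 10 s.
    for seed in range(5):
        elapsed, target = simulate_failover(
            ["a", "b"], {"a": None, "b": 10_000}, seed=seed)
        print(f"seed={seed}: reconnected to {target} after {elapsed / 1000:.1f}s")
```

Running it with different seeds gives quite different reconnect times for the same outage, because each missed attempt pushes the next one further out by a random, growing delay. If the surviving instance accepts connections immediately, the model reconnects within the first backoff interval.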
