Pub/Sub pull request count drops drastically on Kubernetes pods on GCP

Posted 2025-01-14 01:19:09

I have ~5M messages (~7GB total) backlogged on my GCP Pub/Sub subscription and want to pull as many of them as possible. I am using synchronous pull with the settings below, waiting 3 minutes to pile up messages and then sending them to another database.

    defaultSettings := &pubsub.ReceiveSettings{
        MaxExtension:           10 * time.Minute, // keep extending ack deadlines for up to 10 minutes
        MaxOutstandingMessages: 100000,           // hold at most 100k unacked messages at a time
        MaxOutstandingBytes:    128e6,            // 128 MB of unacked message data at a time
        NumGoroutines:          1,                // goroutines used for pulling
        Synchronous:            true,             // use unary Pull requests instead of StreamingPull
    }
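
For context, this is roughly how each 3-minute round is driven (a simplified sketch, not the exact production code; the project ID, subscription ID, and the final database write are placeholders):

    package main

    import (
        "context"
        "log"
        "sync"
        "time"

        "cloud.google.com/go/pubsub"
    )

    func main() {
        ctx := context.Background()
        client, err := pubsub.NewClient(ctx, "my-project") // placeholder project ID
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        sub := client.Subscription("my-subscription") // placeholder subscription ID
        sub.ReceiveSettings = pubsub.ReceiveSettings{ // same settings as above
            MaxExtension:           10 * time.Minute,
            MaxOutstandingMessages: 100000,
            MaxOutstandingBytes:    128e6,
            NumGoroutines:          1,
            Synchronous:            true,
        }

        // Pull for 3 minutes, pile up whatever arrives, then write the batch out.
        roundCtx, cancel := context.WithTimeout(ctx, 3*time.Minute)
        defer cancel()

        var mu sync.Mutex
        var batch []*pubsub.Message
        err = sub.Receive(roundCtx, func(_ context.Context, m *pubsub.Message) {
            mu.Lock()
            batch = append(batch, m)
            mu.Unlock()
            m.Ack()
        })
        if err != nil {
            log.Println("receive:", err)
        }
        log.Printf("pulled %d messages this round", len(batch))
        // the batch is written to the other database here
    }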

The problem is that with around 5 pods on my Kubernetes cluster, each pod is able to pull nearly ~90k messages in almost every round (3-minute period). However, when I increase the number of pods to 20, each pod can still retrieve ~90k messages in the first or second round, but after a while the pull request count drastically drops and each pod receives only ~1k-5k messages per round. I have investigated the Go library's synchronous pull mechanism and know that until messages are successfully acked you cannot request new ones, so the pull request count may drop to avoid exceeding MaxOutstandingMessages. But I am scaling my pods down to zero and starting fresh pods while there are still millions of unacked messages in the subscription, and they still get a very low number of messages in 3 minutes, whether there are 5 or 20 pods. After around 20-30 minutes they again receive ~90k messages each, and then after a while drop back to very low levels (checking from the metrics page). Another interesting thing is that while my fresh pods receive very few messages, my local computer connected to the same subscription gets ~90k messages in each round.

I have read the Pub/Sub quotas and limits page; the bandwidth quotas are extremely high (240,000,000 kB per minute, i.e. 4 GB/s, in large regions). I have tried a lot of things but can't understand why the pull request count drops massively when I start fresh pods. Is there some connection or bandwidth limitation for Kubernetes cluster nodes on GCP or on the Pub/Sub side? Receiving messages in high volume is critical for my task.

Comments (1)

静谧 2025-01-21 01:19:09

If you are using synchronous pull, I suggest switching to StreamingPull for Pub/Sub usage at your scale.

Note that to achieve low message delivery latency with synchronous pull, it is important to have many simultaneously outstanding pull requests. As the throughput of the topic increases, more pull requests are necessary. In general, asynchronous pull is preferable for latency-sensitive applications.
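
A minimal sketch of what switching to streaming (asynchronous) pull looks like with the Go client; the project and subscription names are placeholders, and the concurrency numbers are only a starting point to tune:

    package main

    import (
        "context"
        "log"
        "time"

        "cloud.google.com/go/pubsub"
    )

    func main() {
        ctx := context.Background()
        client, err := pubsub.NewClient(ctx, "my-project") // placeholder project ID
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        sub := client.Subscription("my-subscription") // placeholder subscription ID
        sub.ReceiveSettings = pubsub.ReceiveSettings{
            MaxExtension:           10 * time.Minute,
            MaxOutstandingMessages: 100000,
            MaxOutstandingBytes:    128e6,
            NumGoroutines:          4,     // several StreamingPull streams per pod; tune for your load
            Synchronous:            false, // StreamingPull: messages arrive over long-lived streams
        }

        // Receive keeps the streams open and invokes the callback concurrently
        // until the context is cancelled.
        err = sub.Receive(ctx, func(_ context.Context, m *pubsub.Message) {
            // process / forward the message, then ack it
            m.Ack()
        })
        if err != nil {
            log.Fatal(err)
        }
    }

Because the streams stay open, each pod keeps receiving messages continuously instead of waiting on individual pull requests.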

It is expected that, for a high throughput scenario and synchronous pull, there should always be many idle requests.

A synchronous pull request establishes a connection to one specific server (process). A high throughput topic is handled by many servers. Messages coming in will go to only a few servers, from 3 to 5. Those servers should have an idle process already connected, to be able to quickly forward messages.
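
To make the "many outstanding requests" point concrete, here is a rough sketch of keeping several synchronous pull requests open at once using the lower-level apiv1 client; the import paths, subscription name, and counts are assumptions to adapt:

    package main

    import (
        "context"
        "log"
        "sync"

        pubsubv1 "cloud.google.com/go/pubsub/apiv1"
        pubsubpb "cloud.google.com/go/pubsub/apiv1/pubsubpb"
    )

    func main() {
        ctx := context.Background()
        client, err := pubsubv1.NewSubscriberClient(ctx)
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        const subName = "projects/my-project/subscriptions/my-subscription" // placeholder

        // Keep 10 pull requests outstanding at all times: each goroutine blocks
        // on Pull, acks what it received, and immediately issues the next request.
        var wg sync.WaitGroup
        for i := 0; i < 10; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for {
                    resp, err := client.Pull(ctx, &pubsubpb.PullRequest{
                        Subscription: subName,
                        MaxMessages:  1000,
                    })
                    if err != nil {
                        log.Println("pull:", err)
                        return
                    }
                    if len(resp.ReceivedMessages) == 0 {
                        continue
                    }
                    ackIDs := make([]string, 0, len(resp.ReceivedMessages))
                    for _, rm := range resp.ReceivedMessages {
                        // process rm.Message.Data here, then collect the ack ID
                        ackIDs = append(ackIDs, rm.AckId)
                    }
                    if err := client.Acknowledge(ctx, &pubsubpb.AcknowledgeRequest{
                        Subscription: subName,
                        AckIds:       ackIDs,
                    }); err != nil {
                        log.Println("ack:", err)
                    }
                }
            }()
        }
        wg.Wait()
    }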

This behavior conflicts with CPU-based scaling: idle connections don't cause CPU load. At a minimum, there should be well over 10 threads per pod for CPU-based scaling to work.

Also, you can use a Horizontal Pod Autoscaler (HPA) configured for the GKE pods consuming from Pub/Sub. With the HPA, you can configure scaling on CPU usage.

My last recommendation would be to consider Dataflow for your workload, consuming from Pub/Sub.
