Same consumer group across two different Kafka Connect clusters (S3 sink connector)
I'm migrating Kafka connectors from an ECS cluster to a new cluster running on Kubernetes. I successfully migrated the Postgres source connectors by deleting them and recreating them on the exact same replication slots. They keep writing to the same topics in the same Kafka cluster, and the S3 connector in the old cluster continues to read from those topics and write records into S3. Everything works as usual.
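For context, here is roughly what the recreated source connector call looks like (this assumes Debezium's Postgres connector; the host, names, and slot below are hypothetical placeholders, and connection credentials are elided):

    # Hypothetical Debezium Postgres source connector. The key line for the
    # migration is slot.name, which matches the slot the old connector used.
    curl -X PUT http://connect-k8s:8083/connectors/pg-source/config \
      -H "Content-Type: application/json" \
      -d '{
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "pg.internal",
        "database.dbname": "appdb",
        "slot.name": "app_slot",
        "topic.prefix": "pg"
      }'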
But now, to move the AWS S3 sink connectors, I first created a non-critical S3 connector in the new cluster with the same name as the one in the old cluster. I was going to wait a few minutes before deleting the old one, to avoid missing data. To my surprise, it looks like (based on the UI provided by akhq.io) the worker running that new S3 connector joined the existing consumer group. I was fully expecting to end up with duplicated data. Based on the Confluent doc,
All Workers in the cluster use the same three internal topics to share connector configurations, offset data, and status updates. For this reason all distributed worker configurations in the same Connect cluster must have matching config.storage.topic, offset.storage.topic, and status.storage.topic properties.
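For context, my understanding of those settings: each Connect cluster is defined in the worker configuration by its own group.id and storage topics, something like the sketch below (all names are made-up placeholders):

    # connect-distributed.properties for the old (ECS) Connect cluster
    group.id=connect-cluster-ecs                # identifies the Connect worker cluster
    config.storage.topic=connect-configs-ecs
    offset.storage.topic=connect-offsets-ecs
    status.storage.topic=connect-status-ecs

    # connect-distributed.properties for the new (Kubernetes) Connect cluster
    group.id=connect-cluster-k8s                # different worker group = separate Connect cluster
    config.storage.topic=connect-configs-k8s
    offset.storage.topic=connect-offsets-k8s
    status.storage.topic=connect-status-k8s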
So from that phrase "same Connect cluster", I thought having the same consumer group id only worked within the same Connect cluster. But from my observation, it seems like you can have multiple consumers in different Connect clusters belonging to the same consumer group?
Based on this article, __consumer_offsets is used by consumers, and unlike the other hidden "offset"-related topics, it doesn't have any cluster name designation.
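That matches how sink connectors name their consumer groups: by default the group id is connect-<connector name> (unless overridden with consumer.override.group.id), with nothing in it identifying which Connect cluster the connector runs in. As a sketch, with a hypothetical connector named my-s3-sink, the group can be inspected directly:

    # Describe the consumer group of a sink connector named "my-s3-sink".
    # The group id is derived from the connector name only, not the Connect cluster.
    kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
      --describe --group connect-my-s3-sink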
Does that mean I could simply create the S3 sink connectors in the new Kubernetes cluster and then delete the ones in the ECS cluster, without duplicating or missing data (as long as they have the same name, and therefore the same consumer group)? I'm not sure if this is the pattern people usually use.
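If that holds, the whole move would boil down to something like this sketch against the Connect REST API (the hosts and connector name are placeholders, and the config body is elided):

    # 1. Create the connector on the new (Kubernetes) Connect cluster, same name.
    curl -X PUT http://connect-k8s:8083/connectors/my-s3-sink/config \
      -H "Content-Type: application/json" -d @s3-sink-config.json

    # 2. Confirm its task is RUNNING and it has joined connect-my-s3-sink.
    curl http://connect-k8s:8083/connectors/my-s3-sink/status

    # 3. Delete the old connector on the ECS Connect cluster.
    curl -X DELETE http://connect-ecs:8083/connectors/my-s3-sink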
1 Answer
I'm not familiar with running a Kafka Connect cluster, but I understand that it is a cluster of workers running connectors, independent of the Kafka cluster itself.
In that case, since the connectors are using the same Kafka cluster and you are just moving them from ECS to k8s, it should work as you describe. Both the consumer offsets and the internal Kafka Connect offsets are stored in the Kafka cluster, so it doesn't really matter where the connectors run as long as they connect to the same Kafka cluster. They should restart from the same position, or behave as additional replicas of the same connector, regardless of where they are running.
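If you want to double-check that, both kinds of offset state are visible in the Kafka cluster itself. A quick sketch (the topic and group names below are common defaults and may differ in your setup):

    # Sink connector progress lives in the ordinary consumer group machinery:
    kafka-consumer-groups.sh --bootstrap-server kafka:9092 --list | grep '^connect-'

    # Source connector progress lives in the Connect cluster's offset.storage.topic
    # (often named something like "connect-offsets"):
    kafka-console-consumer.sh --bootstrap-server kafka:9092 \
      --topic connect-offsets --from-beginning --property print.key=true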