Modeling a Kafka cluster
I have an API endpoint that accepts events with a specific user ID and some other data. I want those events broadcasted to some external locations and I wanted to explore using Kafka as a solution for that.
I have the following requirements:
- Events with the same UserID should be delivered in order to the external locations.
- Events should be persisted.
- If a single external location is failing, that shouldn't delay delivery to other locations.
Initially, from some reading I did, it felt like I want to have N consumers where N is the number of external locations I want to broadcast to. That should fulfill requirement (3). I also probably want one producer, my API, that will push events to my Kafka cluster. Requirement (2) should come in automatically with Kafka.
I was more confused regarding how to model the internal Kafka cluster side of things. Again, from the reading I did, it sounds like it's a bad practice to have millions of topics, so having a single topic for each userID is not an option. The other option I read about is having one partition for each userID (let's say M partitions). That would allow requirement (1) to happen out of the box, if I understand correctly. But that would also mean I have M brokers, is that correct? That also sounds unreasonable.
What would be the best way to fulfill all requirements? As a start, I plan on hosting this with a local Kafka cluster.
Comments (1)
You are correct that one topic per user is not ideal.
Partition count does not depend on broker count (a single broker can host many partitions), so partitioning by user ID is a better design than one topic per user.
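To illustrate why a fixed number of partitions is enough for per-user ordering, here is a minimal sketch that uses only the Python standard library and does not talk to Kafka. It assumes events are keyed by user ID and hashed onto a hypothetical `NUM_PARTITIONS` partitions. The `partition_for` and `route` helpers are made up for this example, and `zlib.crc32` is only a stand-in for the hash Kafka's default partitioner applies to the record key.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 6  # hypothetical value; chosen independently of broker count


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user ID to a partition (stand-in hash, not Kafka's own)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(events, num_partitions: int = NUM_PARTITIONS):
    """Append each (user_id, sequence) event to its partition's log."""
    logs = defaultdict(list)
    for user_id, sequence in events:
        logs[partition_for(user_id, num_partitions)].append((user_id, sequence))
    return logs


# Interleaved events from three users, in arrival order.
events = [
    ("alice", 1), ("bob", 1), ("alice", 2),
    ("carol", 1), ("bob", 2), ("alice", 3),
]
logs = route(events)

for partition, log in sorted(logs.items()):
    print(partition, log)
```

Every event for a given user lands in the same partition, and a partition is an ordered log, so that user's events keep their relative order even though many users share each partition. With a real producer, the equivalent is to send each record with the user ID as its key.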
Keeping one failing external location from delaying the others is standard consumer-group behavior, not a matter of topic/partition design.
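The following toy model sketches that behavior, assuming each external location reads with its own consumer group. It is not Kafka client code: `PartitionLog`, `deliver`, and the group names are hypothetical, and the only idea borrowed from Kafka is that each group tracks its own committed offset on the same log.

```python
class PartitionLog:
    """Toy append-only log with one committed offset per consumer group."""

    def __init__(self):
        self.records = []
        self.committed = {}  # group id -> next offset to read

    def append(self, record):
        self.records.append(record)

    def deliver(self, group, send):
        """Send unread records in order; stop at the first failure."""
        offset = self.committed.get(group, 0)
        while offset < len(self.records):
            try:
                send(self.records[offset])
            except ConnectionError:
                break  # leave the offset alone so a later retry resumes here
            offset += 1
        self.committed[group] = offset


def failing_send(record):
    """Simulates an external location that is currently down."""
    raise ConnectionError("destination B is down")


log = PartitionLog()
for n in range(3):
    log.append(("alice", n))

received_a = []
log.deliver("destination-a", received_a.append)  # healthy location
log.deliver("destination-b", failing_send)       # failing location

print(log.committed)
```

The healthy group reads to the end of the log while the failing group stays at its last committed offset. Because the records are persisted, the failing location can catch up in order once it recovers, without ever having held up the other.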