How do you manage local storage for sharded microservices?
Let's assume there is a single consumer group (from Kafka's perspective). The consumer group consists of 20 replicas of a Service instance. All work is balanced among those 20 instances based on some property (a UUID). Each instance manages its own storage/state/reads, which in turn contain only the data belonging to that shard. So there are 20 separate storages, one per replica.

But what happens when those Services are scaled up or down? How would the remaining 10 Services manage to get all the data that previously belonged to the other instances? I assume each service could emit so-called "state events" (stream-table duality?) and another instance could then take over responsibility for a new part of the overall data based on such a stream. But that is still a lot of work: such a stream may consist of millions of items (even if compacted). There must be a more efficient way to achieve this. And what if we scale up? The group leader must now somehow tell the respective instances to drop part of their data. I have read some books/posts on the matter, but I couldn't find any concrete information on how this is managed.
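For reference, Kafka itself already exposes a hook for exactly this hand-over moment: the consumer group rebalance. Below is a minimal sketch, assuming the shard key (the UUID) maps to a Kafka partition; the topic name `entity-events`, the group id `sharded-service`, and the `dropLocalState`/`restoreLocalState` helpers are made-up placeholders for whatever local store the service actually uses.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ShardedConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "sharded-service");           // the single consumer group
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // On every scale up/down Kafka tells each instance which partitions (shards)
        // it gained or lost; the instance reacts by rebuilding or discarding the
        // corresponding slice of its local store.
        consumer.subscribe(List.of("entity-events"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // These shards now belong to another instance: local copies can be dropped.
                partitions.forEach(tp -> dropLocalState(tp.partition()));
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // These shards are now ours: rebuild state, e.g. from a compacted changelog topic.
                partitions.forEach(tp -> restoreLocalState(tp.partition()));
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                // apply the event to the local store for record.partition()
            }
        }
    }

    // Hypothetical hooks into whatever local storage the service uses.
    static void dropLocalState(int partition)    { /* delete data for this shard */ }
    static void restoreLocalState(int partition) { /* replay changelog for this shard */ }
}
```

With compacted changelog topics, the "restore" step only has to replay the latest value per key for the newly assigned partitions rather than the full event history.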
1 comment
Unclear why this is tagged apache-kafka, since sharding isn't a Kafka term. Kafka Streams, however, can handle the distribution of state stores across separate instances using the KTable API. When instances are scaled up or down, the data becomes temporarily inaccessible while the state is rebuilt. Different instances can query each other via "Interactive Queries".
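A minimal sketch of the setup described above, assuming a recent Kafka Streams version (2.7+); the application id, topic name `entity-events`, store name `entity-store`, and host/port values are illustrative placeholders.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyQueryMetadata;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class ShardedStateApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sharded-service");    // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Each instance advertises its own endpoint so peers can route queries to it.
        props.put(StreamsConfig.APPLICATION_SERVER_CONFIG, "this-host:8080"); // placeholder

        StreamsBuilder builder = new StreamsBuilder();
        // The topic is materialized as a KTable backed by a local state store;
        // Kafka Streams moves store partitions between instances on rebalance.
        KTable<String, String> table = builder.table(
                "entity-events",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("entity-store"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // (in real code, wait until the instance reaches the RUNNING state before querying)

        // Interactive Queries: read a key from the slice of the store held locally...
        ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType(
                        "entity-store", QueryableStoreTypes.<String, String>keyValueStore()));
        String localValue = store.get("some-uuid");

        // ...or find out which instance owns a given key and forward the request there
        // (the hop between instances, e.g. over HTTP, is left to the application).
        KeyQueryMetadata owner = streams.queryMetadataForKey(
                "entity-store", "some-uuid", Serdes.String().serializer());
        System.out.println("some-uuid -> " + localValue + ", owned by " + owner.activeHost());
    }
}
```

The changelog topic behind `entity-store` is compacted, so a newly assigned instance only replays the latest value per key; configuring `num.standby.replicas` keeps warm copies on other instances and shortens the window in which the data is inaccessible.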