驱逐KAFKA钥匙值状态商店的数据
我正在使用kafkastreams执行聚合,这实际上将我的所有汇总记录都保留到密钥价值状态存储中,而我正在生成的特定键,以唯一地识别该汇总。 我没有使用任何Kafka窗口进行此聚合。 因此,从本质上讲,这种方法将继续聆听输入数据,从而继续汇总。 现在,基于密钥,我需要应用不同的逻辑以从StateStore搜索并下游将数据移动。
Kafka的KeyValuestateStore为我提供了4种方法,即prefixScan,range and get。 其中根据我生成的密钥,我发现我只能使用全部并获得。
- 如果我使用Get,Kafka将在完整的StateStore内部进行内部扫描,并给我特定键的数据,因此,如果我有键列表,它将在列表中的键数量上迭代完整的stateStore。
- 如果我设法为搜索键创建正则拨号,则可以在Java逻辑中使用所有()并在StateStore的所有数据上迭代,并搜索我的正则态度并下游移动。 但同样,这将是整个StateStore的手动迭代。
PS在任何时间点,我的StateStore将至少包含十亿个记录。
有人可以建议使用键搜索到Kafka KeyValue StateStore检索数据的最佳(性能)可能的方法。 或任何替代方法的替代方法。
更新: 驱逐从StateStore驱逐数据后,我没有删除它,而是希望用ex驱逐的标志更新它。 只有通过对StateStore的读/写入访问才能是可能的,该访问仅通过管道可用,因为交互式查询只能访问对StateStore的读取访问。这就是我对kafka限制的知识。如果否则,请提供帮助。
I am performing aggregation using kafkaStreams which actually keeps all my aggregated records into a keyValue state store against a specific key which i am generating to uniquely identify that aggregation.
I am not using any kafka window for this aggregation.
so essentially this method will keep of listening to input data and thus keep on aggregating.
Now based on the key, i need to apply different logic to search from the stateStore and move my data downstream.
Kafka's KeyValueStateStore gives me 4 methods viz, all, prefixScan, range and get.
Of which based on the key i am generating, i find i can only use all and get.
- if i use get, kafka will internally scan over the complete statestore and give me data for the specific key, so if i have a list of keys, it will iterate over complete statestore for the number of keys in the list.
- If I manage to create a regex for my search key, i can use all() and iterate over all data in statestore in a java logic and search for my regex and move downstream.
but again it will be a manual iteration over the complete statestore.
P.S. at any point in time my statestore will contain at least a billion records.
Can someone please suggest the best (performance wise) possible way to retrieve data using a key search into kafka keyValue stateStore.
or any alternative to the approach is appreciated.
Update:
After eviction of data from statestore, i am not deleting it but wish to update it with a flag stating evicted or not.
Which can only be possible by having a read/write access to the statestore which is again only available through pipeline as interactive queries give only a read access to the statestore. This is what my knowledge of Kafka limits to. Please help if otherwise.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我认为您应该使用Apache Spark流
我不确定是否可以在Kafka中完成此操作
I think you should use Apache Spark streaming to use this
I am not sure if this can be done in Kafka