Evicting data from a Kafka key-value state store

Published 2025-02-09 17:55:30

I am performing an aggregation using Kafka Streams, which keeps all my aggregated records in a key-value state store against a specific key that I generate to uniquely identify each aggregation.
I am not using any Kafka window for this aggregation,
so essentially this topology will keep listening to input data and thus keep on aggregating.
Now, based on the key, I need to apply different logic to search the state store and move my data downstream.
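For context, here is a minimal sketch of the kind of non-windowed aggregation described above, assuming String keys and values; the topic names, application id, and the store name `aggregation-store` are placeholders, and the concatenating aggregator just stands in for the real aggregation logic:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

import java.util.Properties;

public class AggregationTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Non-windowed aggregation: every record is folded into the running
        // aggregate for its (generated) key and kept in a named key-value store.
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               .aggregate(
                   () -> "",                                     // initializer
                   (key, value, aggregate) -> aggregate + value, // stand-in aggregation logic
                   Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("aggregation-store")
                       .withKeySerde(Serdes.String())
                       .withValueSerde(Serdes.String()))         // named so it can be queried later
               .toStream()
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "aggregation-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```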

Kafka's KeyValueStateStore gives me four methods, viz. all, prefixScan, range and get.
Of these, based on the key I am generating, I find I can only use all and get.

  1. If I use get, Kafka will internally scan over the complete state store and give me data for the specific key, so if I have a list of keys, it will iterate over the complete state store once per key in the list.
  2. If I manage to create a regex for my search key, I can use all() and iterate over all the data in the state store in Java logic, search for my regex, and move matches downstream.
    But again, this would be a manual iteration over the complete state store (a lookup sketch follows this list).
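For reference, a rough sketch of what two of the lookup styles could look like through interactive queries (read-only access), assuming a store named `aggregation-store` with String keys and values; prefixScan (available since Kafka 2.8) needs a serializer for the prefix and is only useful if the generated keys share a searchable prefix:

```java
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

import java.util.List;

public class StoreLookups {

    private static ReadOnlyKeyValueStore<String, String> store(KafkaStreams streams) {
        return streams.store(StoreQueryParameters.fromNameAndType(
            "aggregation-store", QueryableStoreTypes.keyValueStore()));
    }

    // Point lookups: one get() per key in the list.
    static void lookupByKeys(KafkaStreams streams, List<String> keys) {
        ReadOnlyKeyValueStore<String, String> store = store(streams);
        for (String key : keys) {
            String value = store.get(key);   // single-key lookup
            if (value != null) {
                // process / move downstream
            }
        }
    }

    // Prefix scan: only touches keys starting with the given prefix,
    // instead of iterating the whole store with all().
    static void lookupByPrefix(KafkaStreams streams, String prefix) {
        ReadOnlyKeyValueStore<String, String> store = store(streams);
        try (KeyValueIterator<String, String> it =
                 store.prefixScan(prefix, new StringSerializer())) {
            while (it.hasNext()) {
                KeyValue<String, String> entry = it.next();
                // entry.key starts with the prefix; process / move downstream
            }
        }
    }
}
```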

P.S. At any point in time my state store will contain at least a billion records.

Can someone please suggest the best (performance-wise) possible way to retrieve data from a Kafka key-value state store using a key search?
Any alternative to this approach is also appreciated.

Update:
After eviction of data from the state store, I am not deleting it but wish to update it with a flag stating whether it has been evicted or not.
This is only possible with read/write access to the state store, which is in turn only available from within the topology, since interactive queries give only read access to the state store. This is where my knowledge of Kafka ends; please help if it is otherwise.
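For what it's worth, a minimal sketch of that read/write path, assuming the store is named `aggregation-store` and is connected to this processor in the topology; the `EVICTED|` prefix is purely illustrative, a real value format would carry a proper evicted flag:

```java
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

public class EvictionFlagProcessor implements Processor<String, String, String, String> {

    // Illustrative marker only; a structured value (JSON/Avro) with an
    // "evicted" boolean field would be more robust in practice.
    private static final String EVICTED_PREFIX = "EVICTED|";

    private KeyValueStore<String, String> store;
    private ProcessorContext<String, String> context;

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
        // Read/write handle to the store; assumes the topology attached
        // "aggregation-store" to this processor.
        this.store = context.getStateStore("aggregation-store");
    }

    @Override
    public void process(Record<String, String> record) {
        String current = store.get(record.key());
        if (current != null && !current.startsWith(EVICTED_PREFIX)) {
            // Forward downstream, then mark the entry as evicted instead of deleting it.
            context.forward(new Record<>(record.key(), current, record.timestamp()));
            store.put(record.key(), EVICTED_PREFIX + current);
        }
    }
}
```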

Comments (1)

一身仙ぐ女味 2025-02-16 17:55:30

I think you should use Apache Spark Streaming for this:

  1. Read data from Kafka through Spark Streaming.
  2. Perform the aggregations/transformations in Spark.
  3. Push the sanitized data into the desired downstream topics (see the sketch after this list).
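A rough sketch of those three steps with Spark Structured Streaming; the bootstrap servers, topic names, checkpoint location, and the per-key count are placeholders for the real aggregation logic:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class SparkAggregation {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-aggregation")
                .getOrCreate();

        // 1. Read data from Kafka.
        Dataset<Row> input = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "input-topic")
                .load();

        // 2. Perform the aggregation (a simple count per key as a stand-in).
        Dataset<Row> aggregated = input
                .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
                .groupBy("key")
                .count();

        // 3. Push the results to a downstream topic (the Kafka sink expects key/value columns).
        StreamingQuery query = aggregated
                .selectExpr("key", "CAST(count AS STRING) AS value")
                .writeStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("topic", "output-topic")
                .option("checkpointLocation", "/tmp/spark-checkpoint")
                .outputMode("update")
                .start();

        query.awaitTermination();
    }
}
```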

I am not sure if this can be done in Kafka.
