Is there a way to broadcast by keyBy?
I use Flink version 1.14.3.
I have a large data set (about 4 GB) that I want to broadcast to a KeyedBroadcastProcessFunction. If I broadcast the raw data to every node, it will take up a lot of memory and perform poorly. So I want to know: is there some way to use the same keySelector rule in the process function and in the broadcast, so that I can keyBy the broadcast data and have each key go only to the node that handles that key?
The very definition of broadcast is that everything is sent to every downstream node.
If instead you have two streams that you want to key-partition into the same key space, so that you can join them on that key, you can do that. Instead of a KeyedBroadcastProcessFunction, you will use a KeyedCoProcessFunction. See the RidesAndFares exercise from the Apache Flink training for a complete example of this pattern.
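To make the memory argument concrete, here is a small stdlib-only Java simulation. It is not Flink code. It contrasts broadcasting with keying both streams by the same rule, and the point it shows is that with keyBy each subtask holds only its share of the reference data yet still finds every key it needs locally. Several things are assumptions for illustration: the class and method names, the sample "key=value" data, and the routing function subtaskFor. Flink itself maps keys to key groups using a murmur hash of hashCode(), but the property that matters is the same: the target subtask is a pure function of the key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class KeyPartitioningSketch {

    // Hypothetical, simplified routing (NOT Flink's real key-group algorithm):
    // the only property relied on is that equal keys map to the same subtask.
    static int subtaskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Broadcast: every subtask receives a full copy of all records.
    static <T> List<List<T>> broadcast(List<T> records, int parallelism) {
        List<List<T>> perSubtask = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            perSubtask.add(new ArrayList<>(records));
        }
        return perSubtask;
    }

    // keyBy: each record goes to exactly one subtask, chosen from its key.
    static <T, K> List<List<T>> keyBy(List<T> records, Function<T, K> keySelector, int parallelism) {
        List<List<T>> perSubtask = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            perSubtask.add(new ArrayList<>());
        }
        for (T record : records) {
            perSubtask.get(subtaskFor(keySelector.apply(record), parallelism)).add(record);
        }
        return perSubtask;
    }

    // Keys both streams into the same key space, then joins each event only
    // against the reference entries held by its own subtask (no broadcast).
    static List<String> joinPerSubtask(List<String> reference, List<String> events, int parallelism) {
        List<List<String>> refParts = keyBy(reference, r -> r.split("=")[0], parallelism);
        List<List<String>> eventParts = keyBy(events, e -> e, parallelism);
        List<String> joined = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            // Stand-in for subtask i's keyed state: only its share of the reference data.
            Map<String, String> localState = new HashMap<>();
            for (String r : refParts.get(i)) {
                String[] kv = r.split("=");
                localState.put(kv[0], kv[1]);
            }
            for (String e : eventParts.get(i)) {
                String value = localState.get(e);
                if (value != null) {
                    joined.add(e + ":" + value);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        // Hypothetical data: "key=value" reference records and key-only events.
        List<String> reference = List.of("a=1", "b=2", "c=3", "d=4", "e=5", "f=6", "g=7", "h=8");
        List<String> events = List.of("a", "c", "h", "a");

        int broadcastCopies = broadcast(reference, parallelism).stream().mapToInt(List::size).sum();
        int keyedCopies = keyBy(reference, r -> r.split("=")[0], parallelism).stream().mapToInt(List::size).sum();
        System.out.println("reference records held with broadcast: " + broadcastCopies); // 32
        System.out.println("reference records held with keyBy:     " + keyedCopies);     // 8
        System.out.println("joined: " + joinPerSubtask(reference, events, parallelism)); // [h:8, a:1, a:1, c:3]
    }
}
```

In Flink itself the same idea is expressed by keying both streams with selectors that extract the same key, connecting them with KeyedStream.connect(...), and passing a KeyedCoProcessFunction to process(...). The reference entry can then be stored in keyed state (for example a ValueState) in processElement2 and read in processElement1, so each parallel instance only stores the keys assigned to it.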