Apache Flink: Best Practices
I have a few questions regarding Apache Flink.
I have reference data stored in multiple relational databases, which I can fetch via a REST API. The data is static and only needs to be loaded once, and it is meant to be shared by all Flink operators. What should I do here? Can I just load it within my Flink job?
Does it make sense to set the Flink parallelism higher than the number of Kafka partitions? What do we gain here? I am assuming Flink will automatically read the data from the partitions, then rebalance and redistribute it to more threads for computation. So the gain is mainly in the computation part, while the sourcing speed cannot be improved because it is strictly bound by how many partitions you have in Kafka.
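To make the second question concrete, here is a minimal sketch of what happens to the source side when parallelism exceeds the partition count. It relies on one fact, that each Kafka partition is read by a single source subtask at a time, and one assumption, that partitions are handed out round-robin. The function names `assign_partitions` and `idle_subtasks` are hypothetical and are not part of any Flink API, and the real assignment logic in the Flink Kafka connector may differ in detail. Whether records are redistributed to downstream operators depends on the job graph, which this sketch does not model.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, parallelism: int) -> Dict[int, List[int]]:
    """Spread partition ids over source subtasks.

    Round-robin is an illustrative assumption, not Flink's documented algorithm.
    """
    assignment: Dict[int, List[int]] = {subtask: [] for subtask in range(parallelism)}
    for partition in range(num_partitions):
        assignment[partition % parallelism].append(partition)
    return assignment


def idle_subtasks(assignment: Dict[int, List[int]]) -> List[int]:
    """Return the subtasks that received no partition and therefore read nothing."""
    return [subtask for subtask, parts in assignment.items() if not parts]


# 4 Kafka partitions, source parallelism of 6 (hypothetical numbers)
assignment = assign_partitions(num_partitions=4, parallelism=6)
idle = idle_subtasks(assignment)
print(assignment)
print(idle)
```

With these numbers, two of the six source subtasks end up without a partition, which matches the intuition in the question that read throughput is capped by the partition count and any gain would have to come from the computation stages.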