Pause/resume during migration
I'm using Apache Flink to propagate updates from a given set of Kafka topics into an Elasticsearch cluster.
The problem I'm facing is that sometimes the Elasticsearch cluster evolves and I have to (1) modify the mappings, (2) copy over the data... and by the time I point the Flink jobs to the new alias/index, plenty of updates have already made it to the old index.
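Concretely, the migration amounts to something like the following (a rough sketch; the cluster address, the items_v1/items_v2 index names, and the example mapping field are placeholders, not my real setup):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MigrationSketch {
    static final HttpClient HTTP = HttpClient.newHttpClient();
    static final String ES = "http://es:9200"; // placeholder cluster address

    static String call(String method, String path, String json) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(ES + path))
                .header("Content-Type", "application/json")
                .method(method, HttpRequest.BodyPublishers.ofString(json))
                .build();
        return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // (1) Create the new index with the updated mappings
        //     ("updated_at" is just an example field).
        call("PUT", "/items_v2",
             "{\"mappings\":{\"properties\":{\"updated_at\":{\"type\":\"date\"}}}}");

        // (2) Copy the existing documents over. While this runs, the Flink
        //     job is still writing to items_v1 -- those are exactly the
        //     updates that never reach items_v2.
        call("POST", "/_reindex",
             "{\"source\":{\"index\":\"items_v1\"},\"dest\":{\"index\":\"items_v2\"}}");

        // (3) Only after that can the Flink job be pointed at items_v2.
    }
}
```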
So I wonder what's the best way to approach this. I can have downtime, but I would like to avoid it if possible. I was trying to make the Flink jobs slow down or pause the (Kafka) input sources until the migration finishes, but I didn't find any endpoint for this.
The Flink jobs run in application mode.
If anyone can shed some light on how to accomplish this (pause/resume the jobs via an API or something similar), I would really appreciate the input. The only constraint I have is around stopping the applications (as in stopping/killing pods): it's possible, but too troublesome due to access restrictions on the Kubernetes clusters.
Comments (1)
I'd probably look into stopping the job with a savepoint using the Flink REST API: https://nightlies.apache.org/flink/flink-docs-stable/docs/ops/rest_api/#jobs-jobid-stop
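A stop-with-savepoint call against that endpoint looks roughly like this (a minimal sketch using plain java.net.http; the JobManager address, job id, and savepoint directory are placeholders you'd replace with your own):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StopWithSavepoint {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String jobManager = "http://flink-jobmanager:8081"; // placeholder
        String jobId = args[0];                             // e.g. taken from GET /jobs

        // POST /jobs/:jobid/stop triggers a stop-with-savepoint and returns
        // a trigger id ("request-id") that can be polled for completion.
        HttpRequest stop = HttpRequest.newBuilder(
                        URI.create(jobManager + "/jobs/" + jobId + "/stop"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"targetDirectory\":\"s3://savepoints/migration\",\"drain\":false}"))
                .build();
        String trigger = http.send(stop, HttpResponse.BodyHandlers.ofString()).body();
        String triggerId = trigger.split("\"")[3]; // crude parse of {"request-id":"..."}

        // Poll GET /jobs/:jobid/savepoints/:triggerid until the savepoint is
        // COMPLETED; the final response carries the savepoint path.
        HttpRequest poll = HttpRequest.newBuilder(
                        URI.create(jobManager + "/jobs/" + jobId + "/savepoints/" + triggerId))
                .build();
        for (;;) {
            String status = http.send(poll, HttpResponse.BodyHandlers.ofString()).body();
            System.out.println(status);
            if (status.contains("COMPLETED")) break;
            Thread.sleep(1000);
        }
    }
}
```

Once the migration is done, redeploy the application with `execution.savepoint.path` pointing at the returned savepoint path and the job resumes from exactly where it stopped, so no Kafka records are skipped.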
If that Flink app is pretty big and has lots of state, you can also try to just stop sending data to the input Kafka topic if you don't want to stop it (assuming that it can properly write with the new mappings and indices after you've made the required ES cluster changes, without any change in the Flink job). It is a bit of overhead, but you could have different topics for your producers and Flink sources, and have another simple Flink job mirror data from one topic (where producers produce to) to the other (where Flink consumes from). When you want to stop writing to ES, just stop that mirror job using the REST API. To avoid writing a new Flink job you could use MirrorMaker or similar, but to stop it you may have to kill its pod.
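A minimal version of that mirror job could look like this (a sketch assuming the newer KafkaSource/KafkaSink connector API; the broker address and the "updates-in"/"updates-flink" topic names are placeholders, and this value-only version drops record keys and headers):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TopicMirrorJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Producers write to "updates-in"; the main ES-writing job consumes
        // "updates-flink". Stopping this job pauses the flow between them.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("updates-in")
                .setGroupId("topic-mirror")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("updates-flink")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        // Value-only pass-through: keys and headers are dropped, which is
        // fine only if the downstream job doesn't rely on them.
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "mirror-source")
           .sinkTo(sink);
        env.execute("kafka-topic-mirror");
    }
}
```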
Or another option is architecting the Elasticsearch indexes so they can support your cluster evolution without having to stop the Flink app. It is hard to know exactly what you'd need to change, but by writing into aliases and playing with the write index flag you may be able to achieve what you want. I've done this in the past, but it is true that if your mappings change a lot it may be hard to do.
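For instance, with both physical indices behind the same alias, flipping the `is_write_index` flag atomically moves writes to the new index while the Flink sink keeps targeting the alias the whole time (again a sketch; the cluster address and the items/items_v1/items_v2 names are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FlipWriteIndex {
    public static void main(String[] args) throws Exception {
        // Both indices stay behind the "items" alias; only the one with
        // is_write_index=true receives writes. Elasticsearch applies all
        // actions in one _aliases request atomically.
        String body =
              "{\"actions\":["
            + "{\"add\":{\"index\":\"items_v1\",\"alias\":\"items\",\"is_write_index\":false}},"
            + "{\"add\":{\"index\":\"items_v2\",\"alias\":\"items\",\"is_write_index\":true}}"
            + "]}";
        HttpRequest req = HttpRequest.newBuilder(URI.create("http://es:9200/_aliases"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        System.out.println(HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

One caveat: searches through an alias that spans both indices will hit both of them, so you'd typically keep a separate read alias or remove the old index from the alias once the backfill is done.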