Pause/resume during migration

Posted 2025-02-09 14:33:33

I'm using Apache Flink to propagate updates from a given set of Kafka topics into an Elasticsearch cluster.

The problem I'm facing is that sometimes the Elasticsearch cluster evolves and I have to (1) modify the mappings, (2) copy over the data... and by the time I point the Flink jobs to the new alias/index, plenty of updates have already made it into the old index.
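
For context, the two migration steps above are the standard Elasticsearch flow. A minimal sketch of them, assuming hypothetical index names `items-v1`/`items-v2` and a local cluster:

```python
import requests

ES = "http://localhost:9200"  # assumed cluster address

# (1) Create the new index with the modified mappings.
requests.put(f"{ES}/items-v2", json={
    "mappings": {
        "properties": {
            # hypothetical field whose type changed in the new mapping
            "price": {"type": "double"},
        }
    }
}).raise_for_status()

# (2) Copy the existing data over with the _reindex API.
requests.post(f"{ES}/_reindex", json={
    "source": {"index": "items-v1"},
    "dest": {"index": "items-v2"},
}).raise_for_status()
```

While step (2) runs, the Flink job keeps writing to the old index, which is exactly the window where updates get lost.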

So I wonder what's the best way to approach this. I can have downtime, but I would like to avoid it if possible. I was trying to make the Flink jobs slow down or pause the (Kafka) input sources until the migration finishes, but I didn't find any endpoint for this.

The Flink jobs run in application mode.

If anyone can shed some light on how to accomplish this (pause/resume the jobs via an API or something similar), I would really appreciate the input. The only constraint I have is around stopping the applications (as in stopping/killing pods): it's possible, but too troublesome due to access constraints on the Kubernetes clusters.

Comments (1)

暮色兮凉城 2025-02-16 14:33:33

I'd probably look into stopping the job with a savepoint using the Flink REST API: https://nightlies.apache.org/flink/flink-docs-stable/docs/ops/rest_api/#jobs-jobid-stop
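
As a rough sketch (assuming the JobManager's REST endpoint is reachable at localhost:8081 and you already know the job ID), stop-with-savepoint is an asynchronous operation: you POST to the stop endpoint and then poll the returned trigger ID until the savepoint completes:

```python
import time
import requests

FLINK = "http://localhost:8081"   # assumed JobManager REST address
JOB_ID = "<your-job-id>"          # e.g. from GET /jobs

# Trigger a stop-with-savepoint; drain=False is a plain suspend
# (no MAX_WATERMARK is emitted before stopping).
resp = requests.post(f"{FLINK}/jobs/{JOB_ID}/stop", json={
    "targetDirectory": "s3://my-bucket/savepoints",  # hypothetical location
    "drain": False,
})
resp.raise_for_status()
trigger_id = resp.json()["request-id"]

# Poll until the savepoint completes, then grab its path for the restart.
while True:
    status = requests.get(
        f"{FLINK}/jobs/{JOB_ID}/savepoints/{trigger_id}").json()
    if status["status"]["id"] == "COMPLETED":
        print("savepoint at:", status["operation"]["location"])
        break
    time.sleep(2)
```

Since the job runs in application mode, "resume" then means redeploying the application with `execution.savepoint.path` set to that savepoint location.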

If that Flink app is pretty big and has lots of state and you don't want to stop it, you can also try to just stop sending data to the input Kafka topic (assuming the job can write correctly with the new mappings and indexes once you've made the required ES cluster changes, without any change to the Flink job). It is a bit of overhead, but you could have different topics for your producers and your Flink sources, and have another simple Flink job mirror data from one topic (where producers produce to) to the other (where Flink consumes from). When you want to stop writing to ES, just stop that mirror job using the REST API. To avoid writing a new Flink job you could use MirrorMaker or similar, but to stop it you may have to kill its pod.
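
If you go the two-topic route, the mirror doesn't even have to be a Flink job or MirrorMaker; any consume-and-forward process works as a stand-in. A minimal sketch with the `confluent-kafka` Python client, assuming hypothetical topic names and a local broker:

```python
from confluent_kafka import Consumer, Producer

BROKERS = "localhost:9092"                       # assumed broker address
SOURCE, TARGET = "updates-in", "updates-flink"   # hypothetical topic names

consumer = Consumer({
    "bootstrap.servers": BROKERS,
    "group.id": "topic-mirror",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
producer = Producer({"bootstrap.servers": BROKERS})
consumer.subscribe([SOURCE])

try:
    # Stop this process to pause the flow into the topic Flink reads from;
    # producers keep writing to SOURCE and nothing is lost, only delayed.
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        # Preserve key and value so Flink sees the stream unchanged.
        producer.produce(TARGET, key=msg.key(), value=msg.value())
        producer.poll(0)        # serve delivery callbacks
        consumer.commit(msg)    # commit only after forwarding
except KeyboardInterrupt:
    pass
finally:
    producer.flush()
    consumer.close()
```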

Or another option is architecting the Elasticsearch indexes so they can support your cluster evolution without having to stop the Flink app. It is hard to know exactly what you'd need to change, but by writing into aliases and playing with the write index flag you may be able to achieve what you want. I've done this in the past, but it is true that if your mappings change a lot it may be hard to do.
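
To make the alias idea concrete: if the Flink sink writes to an alias rather than a concrete index, the cutover is a single atomic `_aliases` call that moves the write flag from the old index to the new one. A hedged sketch, reusing the hypothetical `items-v1`/`items-v2` names:

```python
import requests

ES = "http://localhost:9200"  # assumed cluster address

# Atomically flip the write index behind the alias "items".
# Flink keeps writing to the alias the whole time and never restarts.
requests.post(f"{ES}/_aliases", json={
    "actions": [
        {"add": {"index": "items-v1", "alias": "items",
                 "is_write_index": False}},
        {"add": {"index": "items-v2", "alias": "items",
                 "is_write_index": True}},
    ]
}).raise_for_status()
```

Because both actions are applied in one request, there is no window where writes land in neither (or both) indexes; the remaining caveat is updates that reached `items-v1` after the `_reindex` snapshot, which is why this works best combined with briefly pausing the input as described above.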
