取消确认 apache beam 管道中的一些发布/订阅消息

发布于 2025-01-20 11:23:06 字数 99 浏览 7 评论 0原文

目前,我们有一个用例,我们希望在满足某些条件后稍后处理一些消息。 是否可以取消确认 apache beam 管道中的某些发布/订阅消息,这些消息将在可见性超时后可用,我们可以稍后处理?

Currently we have a use case where we want to process some messages at later point of time, after some conditions met.
Is it possible to unacknowledge some pub/sub messages in apache beam pipeline which will be later available after visibility time out which we can process later?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

听风吹 2025-01-27 11:23:06

您无法使用 Apache Beam 解开消息。当消息在管道中正确摄取时,它们会被自动确认。

您可以将它们保留在管道中并重新处理它们,直到满足条件。但您可能会出现拥塞,或者过度使用数据流资源。最好先清理消息,例如在云功能上,当消息无效时取消确认消息,并在目标 PubSub 主题中发布有效消息。

You can't unack the message with Apache beam. When the message are correctly ingested in the pipeline, they are acked automatically.

You can keep them in the pipeline and reprocess them until the conditions are met. But you could have a congestion, or an overusage of Dataflow resources for nothing. It could be better to clean the message before, on a Cloud Functions for instance, that unack the message when they aren't valid, and publish in a target PubSub topic the valid messages.

晨敛清荷 2025-01-27 11:23:06

作为 @guillaume 建议的替代方案,您还可以将“稍后处理”消息(以原始格式)存储在 BigQuery 或 Cloud Bigtable 等存储介质中。所有消息都将由管道确认,然后可以在管道内部完成隔离,其中“有效”消息照常处理,而“无效”消息则保留在存储中以供将来处理。

一旦满足处理条件,就可以从存储介质中检索“无效”消息并进行处理,然后可以将其从存储中删除。如果“无效”消息将在消息保留期(7 天)之后得到处理,这可能是一个可行的解决方案。

上述工作流程的灵感来自于Google Cloud 博客的此部分。我认为“无效”消息是“坏”数据。

As an alternative to @guillaume's suggestion, you can also store the "to-be-processed-later" messages (in raw format) in storage mediums such as BigQuery or Cloud Bigtable. All the messages will be acked by the pipeline and then the segregation can be done inside the pipeline where the "valid" messages are processed as usual while the "invalid" messages are preserved in storage for future processing.

Once the processing conditions are satisfied, the "invalid" messages can be retrieved from the storage medium and processed after which they can be deleted from storage. This could be a viable solution if the "invalid" messages will be processed after the message retention period which is 7 days.

The above workflow is inspired by this section of the Google Cloud blog. I considered the "invalid" messages to be "bad" data.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文