Run jobs in a single Dataflow pipeline so they fail/succeed independently of each other
I am trying to load data in Avro format from GCS to BigQuery using a single pipeline. For instance, I am trying to load 10 tables, which means 10 parallel jobs in a single pipeline.
Now if the 3rd job fails, all the subsequent jobs fail. How can I make the other jobs run independently of the failure/success of any one of them?
Answers (1)
You cannot isolate different steps within a single Dataflow pipeline without implementing custom logic, for example a custom DoFn/ParDo that catches per-element errors and routes them to a side output (see the first sketch below). Some I/O connectors, such as the BigQuery connector, offer a way to send failed requests to a dead-letter queue in certain write modes (second sketch below), but this might not give you what you want. If you want full isolation, you should run separate jobs and combine them into a workflow using an orchestration framework such as Apache Airflow (third sketch below).
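A minimal sketch of the custom DoFn approach: per-element failures are caught and tagged to a side output instead of crashing the step. `parse_record`, the bucket path, and the output tags are illustrative placeholders, not part of the original answer.

```python
import apache_beam as beam
from apache_beam import pvalue


def parse_record(record):
    # Hypothetical per-element logic; replace with real parsing/validation.
    if record is None:
        raise ValueError('empty record')
    return record


class SafeParse(beam.DoFn):
    """Catch per-element failures and tag them instead of failing the step."""

    def process(self, element):
        try:
            yield parse_record(element)
        except Exception as exc:
            # Route the bad element to a 'failed' side output.
            yield pvalue.TaggedOutput('failed', (element, str(exc)))


with beam.Pipeline() as p:
    results = (
        p
        | 'Read' >> beam.io.ReadFromAvro('gs://my-bucket/table_3/*.avro')
        | 'Parse' >> beam.ParDo(SafeParse()).with_outputs('failed', main='ok')
    )
    # results.ok flows on to the BigQuery write; results.failed can be
    # persisted to GCS or logged without failing the pipeline.
```

Note that this only isolates element-level errors inside one step; a failure of the whole step (for example, a missing table) still fails the pipeline.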
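For the BigQuery dead-letter behavior, here is a hedged sketch using the Beam Python `WriteToBigQuery` transform with streaming inserts: rejected rows come back as a `PCollection` rather than failing the job. The table name and schema are placeholders, and the dict-style access to failed rows depends on your Beam version.

```python
import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQueryWriteFn
from apache_beam.io.gcp.bigquery_tools import RetryStrategy

with beam.Pipeline() as p:
    rows = p | 'Create' >> beam.Create([{'id': 1}, {'id': 2}])

    write_result = rows | 'WriteToBQ' >> beam.io.WriteToBigQuery(
        'my-project:my_dataset.table_3',  # placeholder table
        schema='id:INTEGER',              # placeholder schema
        method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
        insert_retry_strategy=RetryStrategy.RETRY_NEVER,
    )

    # Rows that BigQuery rejected are emitted here instead of failing the
    # job; write them somewhere durable for later inspection.
    failed = write_result[BigQueryWriteFn.FAILED_ROWS]
```

This only applies to the streaming-inserts write path; batch load jobs do not expose failed rows this way, which is why it might not cover your use case.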
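And a sketch of the full-isolation route with Airflow: one independent task per table, with no dependencies declared between them, so one failed load does not block the rest. Since BigQuery can load Avro from GCS natively, each task here uses a plain BigQuery load job via `GCSToBigQueryOperator` rather than a Dataflow job; launching one Dataflow job per table with the Dataflow operators from the same Google provider package would isolate failures the same way. The DAG id, bucket, and table names are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id='gcs_avro_to_bq_loads',
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # trigger manually; set a real schedule as needed
    catchup=False,
) as dag:
    # Ten sibling tasks with no edges between them: Airflow runs them in
    # parallel, and a failure in one leaves the other nine untouched.
    for i in range(1, 11):
        GCSToBigQueryOperator(
            task_id=f'load_table_{i}',
            bucket='my-bucket',                            # placeholder
            source_objects=[f'exports/table_{i}/*.avro'],  # placeholder
            destination_project_dataset_table=f'my_project.my_dataset.table_{i}',
            source_format='AVRO',
            write_disposition='WRITE_TRUNCATE',
        )
```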