Can an Apache Beam pipeline be used for batch orchestration?
I am new to the Apache Beam ecosystem and am trying to work out whether an Apache Beam pipeline is a good fit for batch orchestration.
My definition of a batch is as follows:
Batch ==> a set of jobs,
Job ==> can have one or more sub-jobs.
There may be dependencies among jobs/sub-jobs.
Can I map my custom batch onto an Apache Beam pipeline?

2 Answers
Apache Beam is a unified model for developing both batch and streaming pipelines, which can be run on Dataflow. You can create and deploy your pipeline using Dataflow. Beam pipelines are portable, so you can use any of the available runners according to your requirements.
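To illustrate that portability, here is a minimal sketch of a Beam pipeline; the data and transform names are illustrative, and swapping `DirectRunner` for `DataflowRunner` (plus the required GCP options) would run the same code on Dataflow:

```python
# A minimal sketch of runner portability; the elements and transforms
# below are illustrative only.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The runner is chosen via pipeline options at launch time, so the same
# pipeline code runs locally (DirectRunner) or on Dataflow (DataflowRunner).
options = PipelineOptions(runner="DirectRunner")

with beam.Pipeline(options=options) as p:
    (
        p
        | "Create" >> beam.Create([1, 2, 3])
        | "Square" >> beam.Map(lambda x: x * x)
        | "Print" >> beam.Map(print)
    )
```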
Cloud Composer can be used for batch orchestration as per your requirement. Cloud Composer is built on Apache Airflow. Apache Beam and Apache Airflow can be used together, since Airflow can be used to trigger Beam jobs. Since you run custom jobs, you can configure Beam and Airflow together for batch orchestration.
Airflow is meant to perform orchestration and pipeline dependency management, while Beam is used to build data pipelines that are executed by data processing systems.
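To make that concrete, here is a hedged sketch of how your batch definition (a batch of jobs with dependencies between jobs/sub-jobs) could map onto an Airflow DAG in which each task triggers a Beam pipeline. It assumes Airflow 2.x with the apache-airflow-providers-apache-beam provider installed; the DAG id, task ids, and gs:// paths are hypothetical placeholders:

```python
# A sketch, not a definitive implementation: assumes Airflow 2.x with the
# apache-airflow-providers-apache-beam provider installed. All ids and
# gs:// paths below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.beam.operators.beam import (
    BeamRunPythonPipelineOperator,
)

with DAG(
    dag_id="custom_batch",            # the "batch": a set of jobs
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,           # trigger manually or on your own cadence
) as dag:
    # Each task is one job/sub-job, implemented as a Beam pipeline.
    job_a = BeamRunPythonPipelineOperator(
        task_id="job_a",
        py_file="gs://my-bucket/pipelines/job_a.py",        # placeholder
    )
    sub_job_a1 = BeamRunPythonPipelineOperator(
        task_id="sub_job_a1",
        py_file="gs://my-bucket/pipelines/sub_job_a1.py",   # placeholder
    )
    sub_job_a2 = BeamRunPythonPipelineOperator(
        task_id="sub_job_a2",
        py_file="gs://my-bucket/pipelines/sub_job_a2.py",   # placeholder
    )

    # Airflow expresses the job/sub-job dependencies:
    # both sub-jobs run only after job_a succeeds.
    job_a >> [sub_job_a1, sub_job_a2]
```

The key design point is the division of labor: the dependency graph lives in Airflow, while each node's actual data processing lives in a Beam pipeline.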
I believe Composer might be more suited for what you're trying to make. From there, you can launch Dataflow jobs from your environment using Airflow operators (for example, if you're using Python, you can use the DataflowCreatePythonJobOperator).
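For reference, a minimal sketch of that operator inside a Composer/Airflow DAG; it assumes the apache-airflow-providers-google package is available (as it is in Cloud Composer), and the bucket paths, job name, and region are hypothetical placeholders:

```python
# A minimal sketch, assuming the apache-airflow-providers-google package
# is installed. Paths, job name, and region are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowCreatePythonJobOperator,
)

with DAG(
    dag_id="launch_dataflow_job",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
) as dag:
    run_beam_on_dataflow = DataflowCreatePythonJobOperator(
        task_id="run_beam_on_dataflow",
        py_file="gs://my-bucket/pipelines/my_pipeline.py",  # placeholder
        job_name="my-beam-job",                             # placeholder
        options={"temp_location": "gs://my-bucket/tmp/"},   # placeholder
        location="us-central1",                             # placeholder
    )
```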