Amazon recently launched EMR Serverless, and I want to repurpose my existing data pipeline orchestration, which uses AWS Step Functions. The pipeline has steps that create an EMR cluster, run some Lambda functions, submit Spark jobs (mostly Scala jobs using spark-submit), and finally terminate the cluster. All of these steps are of the sync type (for example, arn:aws:states:::elasticmapreduce:addStep.sync).
There are documentation and GitHub samples that describe submitting jobs from orchestration frameworks such as Airflow, but nothing that describes how to use AWS Step Functions with EMR Serverless. Any help in this regard is appreciated.
Primarily, I am interested in repurposing the task state of type arn:aws:states:::elasticmapreduce:addStep.sync, which takes parameters such as ClusterId, but in the case of EMR Serverless there is no such ID.
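For reference, a task state of the kind described above might look roughly like the sketch below, written as a Python dict that serializes to Amazon States Language JSON. The step name, main class, S3 path, input path `$.ClusterId`, and the `TerminateCluster` next state are all hypothetical placeholders, not taken from the actual pipeline.

```python
import json

# Sketch of an existing EMR (non-serverless) task state.
# All names, paths, and the "Next" target are hypothetical placeholders.
add_step_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
    "Parameters": {
        # ClusterId comes from the state input produced by the create-cluster step
        "ClusterId.$": "$.ClusterId",
        "Step": {
            "Name": "example-spark-step",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--class", "com.example.Main",
                    "s3://example-bucket/jobs/example.jar",
                ],
            },
        },
    },
    "Next": "TerminateCluster",
}

print(json.dumps(add_step_state, indent=2))
```

The ClusterId parameter is the part that has no direct counterpart in EMR Serverless, which works with applications and job runs instead of clusters.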
In summary, is there an equivalent of "Call Amazon EMR with Step Functions" for EMR Serverless?
Comments (1)
Currently, there is no direct integration of EMR Serverless with Step Functions. A possible solution is to put a Lambda function in between that uses the AWS SDK to create EMR Serverless applications and submit jobs. You would also need an additional Lambda function that acts as a poller and tracks whether each job succeeded (needed when jobs depend on one another), because an EMR job is very likely to outrun Lambda's 15-minute runtime limit.
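A minimal sketch of the submit-and-poll approach is shown below. It assumes the caller passes in a boto3 EMR Serverless client (created with `boto3.client("emr-serverless")`) and uses its `start_job_run` and `get_job_run` calls. The helper names `submit_job` and `poll_job` and every argument value are hypothetical; the Step Functions state machine would call one Lambda that submits, then loop over a Wait state and a second Lambda that polls.

```python
# States after which the job run will not change any more.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def submit_job(client, application_id, execution_role_arn, entry_point,
               entry_point_args=None, spark_submit_params=None):
    """Submit a Spark job to an EMR Serverless application and return its job run ID.

    `client` is assumed to be boto3.client("emr-serverless").
    """
    spark_submit = {"entryPoint": entry_point}
    if entry_point_args:
        spark_submit["entryPointArguments"] = list(entry_point_args)
    if spark_submit_params:
        spark_submit["sparkSubmitParameters"] = spark_submit_params

    response = client.start_job_run(
        applicationId=application_id,
        executionRoleArn=execution_role_arn,
        jobDriver={"sparkSubmit": spark_submit},
    )
    return response["jobRunId"]


def poll_job(client, application_id, job_run_id):
    """Check a job run once and report whether it has finished.

    Intended to be called repeatedly from a Wait/Choice loop in the state
    machine, so that no single Lambda invocation has to outlive the job.
    """
    response = client.get_job_run(applicationId=application_id, jobRunId=job_run_id)
    state = response["jobRun"]["state"]
    return {
        "jobRunId": job_run_id,
        "state": state,
        "done": state in TERMINAL_STATES,
        "succeeded": state == "SUCCESS",
    }
```

Each Lambda handler would then be a thin wrapper that reads the application ID and job run ID from the event and returns the dict, which a Choice state can branch on via the `done` and `succeeded` fields.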
Late add-on: Native integration of Step Functions with EMR Serverless was released in October 2023.
https://aws.amazon.com/blogs/big-data/orchestrate-amazon-emr-serverless-jobs-with-aws-step-functions/
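With the native integration, the Lambda submitter and poller can in principle be replaced by a single task state. The sketch below builds such a state as a Python dict and prints it as Amazon States Language JSON. It assumes the `arn:aws:states:::emr-serverless:startJobRun.sync` resource and the parameter names used by the integration; the function name `build_start_job_run_state` and the application ID, role ARN, and S3 path in the usage lines are hypothetical placeholders, so check them against the linked post before relying on this.

```python
import json


def build_start_job_run_state(application_id, execution_role_arn, entry_point,
                              entry_point_args=None, spark_submit_params=None):
    """Build a task state that starts an EMR Serverless job run and waits for it."""
    spark_submit = {"EntryPoint": entry_point}
    if entry_point_args:
        spark_submit["EntryPointArguments"] = list(entry_point_args)
    if spark_submit_params:
        spark_submit["SparkSubmitParameters"] = spark_submit_params

    return {
        "Type": "Task",
        # The .sync suffix makes Step Functions wait until the job run finishes
        "Resource": "arn:aws:states:::emr-serverless:startJobRun.sync",
        "Parameters": {
            # ApplicationId takes the place of ClusterId from the EMR-on-EC2 state
            "ApplicationId": application_id,
            "ExecutionRoleArn": execution_role_arn,
            "JobDriver": {"SparkSubmit": spark_submit},
        },
        "End": True,
    }


# Hypothetical values for illustration only
state = build_start_job_run_state(
    "00example1234567",
    "arn:aws:iam::123456789012:role/example-emr-serverless-role",
    "s3://example-bucket/jobs/example.jar",
    spark_submit_params="--class com.example.Main",
)
print(json.dumps({"StartAt": "RunSparkJob", "States": {"RunSparkJob": state}}, indent=2))
```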