How efficient is it to run Spark jobs on EMR Spot Instances?
I want to use EMR Spot Instances to cut down my Redshift and AWS Glue costs, but after reading about them I have a few questions. If I am running a 30-minute job, how likely is it to be interrupted? How often are Spot Instances taken away while a job is running? And if they are taken away, how can I manage my job so that it re-runs?
Mostly my focus is on Spark jobs.
Comments (1)
Opinion-based, but here goes.
Excellent read: https://aws.amazon.com/blogs/big-data/spark-enhancements-for-elasticity-and-resiliency-on-amazon-emr/
Basically, AWS allows you to use Spot Instances and recover gracefully, thanks to EMR's integration with YARN's decommissioning mechanism. You do not need to code anything in your Spark app.
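For reference, here is a minimal sketch of how the decommissioning-related Spark settings could be expressed as an EMR configuration entry. The property names are the EMR-specific ones I recall from the linked post and the EMR docs, so check them against your EMR release. The helper name `spark_decommissioning_config` and the values are my own illustration, and EMR already ships defaults for these, so you normally don't have to set them at all.

```python
def spark_decommissioning_config(blacklist_timeout="1h", threshold_seconds=20):
    """Build a spark-defaults configuration entry for an EMR cluster.

    Hypothetical helper: the values are illustrative, not recommendations.
    """
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {
                # Stop scheduling new tasks on nodes that YARN is decommissioning.
                "spark.blacklist.decommissioning.enabled": "true",
                # How long such a node stays excluded from scheduling.
                "spark.blacklist.decommissioning.timeout": blacklist_timeout,
                # Seconds before the node goes away at which its executors
                # are treated as lost (assumed semantics, verify in the docs).
                "spark.decommissioning.timeout.threshold": str(threshold_seconds),
                # Do not count fetch failures from decommissioned nodes
                # against the stage retry limit.
                "spark.stage.attempt.ignoreOnDecommissionFetchFailure": "true",
            },
        }
    ]


if __name__ == "__main__":
    import json

    print(json.dumps(spark_decommissioning_config(), indent=2))
```

The resulting list has the shape EMR expects for cluster configurations, for example the `Configurations` argument when creating a cluster through the API.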
That said, if you want to run on Spot Instances, you can wait for the output, but it may take a while.
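On the "how do I re-run" part of the question: if a whole job does fail, for example because too much Spot capacity disappeared at once, a plain retry wrapper around whatever submits the job is usually enough. A rough sketch, assuming a hypothetical `job` callable that raises when the run fails; none of this is an EMR or Spark API.

```python
import time


def run_with_retries(job, max_attempts=3, base_delay=30.0, sleep=time.sleep):
    """Call job() until it succeeds or max_attempts is reached.

    `job` is a hypothetical zero-argument callable that submits the Spark
    job and raises an exception if the run fails.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as err:
            last_error = err
            if attempt < max_attempts:
                # Exponential backoff: base_delay, 2 * base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error
```

This only makes sense if the job is safe to run twice, for example because it overwrites its output location rather than appending to it.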
AWS Glue is serverless and therefore has nothing to do with EMR. Redshift is also priced differently.