Accessing S3 directly from EMR Map/Reduce tasks
I am trying to figure out how to write directly from an EMR map task to an S3 bucket. I would like to run a Python streaming job that fetches some data from the internet and saves it to S3, without passing it on to a reduce job. Can anyone help me with that?
Comments (1)
Why don't you just set the output of your MR job to be an S3 directory and tell it that there is no reducer:
That should do what you want it to.
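The command that originally accompanied this answer is not preserved here. As a rough sketch of the idea, a map-only Hadoop Streaming step can be described by an argument list like the one built below. The bucket names, paths, and the helper name `build_streaming_args` are hypothetical. The sketch assumes the `hadoop-streaming` launcher available on recent EMR releases and the `mapreduce.job.reduces` property (older Hadoop versions used `mapred.reduce.tasks`). With zero reducers, each mapper's output is written straight to the `-output` location.

```python
import shlex


def build_streaming_args(input_uri, output_uri, mapper, mapper_file_uri):
    """Return the argument list for a map-only Hadoop Streaming step.

    All URIs passed in are placeholders chosen by the caller.
    """
    return [
        "hadoop-streaming",
        # Generic options (-D, -files) have to precede the streaming options.
        "-D", "mapreduce.job.reduces=0",  # no reduce phase at all
        "-files", mapper_file_uri,        # ship the mapper script to the nodes
        "-input", input_uri,
        "-output", output_uri,            # an s3:// URI; must not already exist
        "-mapper", mapper,
    ]


if __name__ == "__main__":
    # Hypothetical bucket and paths, for illustration only.
    args = build_streaming_args(
        "s3://example-bucket/urls/",
        "s3://example-bucket/fetched/",
        "mapper.py",
        "s3://example-bucket/code/mapper.py",
    )
    print(shlex.join(args))
```

The resulting list could be run on the master node or passed as the arguments of an EMR step. How the step is submitted depends on your setup and is not covered here.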
Then your script can do something like this (sorry, Ruby):
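The Ruby snippet from the answer is not preserved here. Since the question asks about a Python streaming job, here is a minimal Python sketch of what such a map-only script might look like. It assumes each input line is a URL, which is an assumption about the job and not something stated in the question. The names `fetch` and `run_mapper` are hypothetical. Whatever the mapper prints to standard output becomes the job output, so with no reducer it ends up in the S3 output directory.

```python
#!/usr/bin/env python3
import sys
import urllib.request


def fetch(url, timeout=30):
    """Download a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_mapper(lines, out, fetcher=fetch):
    """Read one URL per line and emit 'url<TAB>body' records to `out`."""
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        try:
            body = fetcher(url)
        except Exception as exc:
            # Report on stderr and carry on, so one bad URL does not
            # fail the whole map task.
            sys.stderr.write(f"failed to fetch {url}: {exc}\n")
            continue
        # Streaming records are line-oriented, so collapse all whitespace
        # (including tabs and newlines) in the body to single spaces.
        flat = " ".join(body.split())
        out.write(f"{url}\t{flat}\n")


if __name__ == "__main__":
    run_mapper(sys.stdin, sys.stdout)
```

Flattening the body keeps each record on one line, which is lossy. If the exact bytes matter, encoding the body (for example as base64) before printing it would be one alternative.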