Any advice on running Hadoop on EC2?
When running Hadoop in EC2, I seem to have two options:
- A: Manage the cluster myself, using the EC2-specific shell scripts that come with Hadoop.
- B: Use Elastic MapReduce, and pay a little extra for the convenience.
I'm leaning towards B, but I'd appreciate some advice from people with more experience. Here are my questions:
- Are there any tasks that can be done with one of these methods but not the other?
- Are there other options besides these two that I'm overlooking?
- If I choose B, how easy would it be to go back to A? That is, what's the danger of vendor lock-in?
Answers (3)
Third option:
You can use Apache Whirr to set up a Hadoop cluster on EC2 (Rackspace is also supported).
I have been told by people close to the Amazon Elastic MapReduce (EMR) development team that there are at least two other advantages to using EMR: a) Amazon is actively applying bug fixes and performance enhancements to the Hadoop code base used on EMR, and b) Amazon employs a high-performance network between EMR servers and S3 servers that may not be available between EC2 servers and S3 servers.
UPDATE: See @mat's comments that refute the rumored advantages of using EMR.
Disclaimer: I'm the founder of Axemblr.com.
There are also commercial alternatives you can use. Axemblr Tool for Cloudera CDH3 is a tool we are building that can deploy a cluster in just a few minutes with everything you need (including Cloudera Hue, Mahout and Pig).
We are also building an alternative to EMR that's fully compatible from an API perspective, targeted at private clouds.
If you are wondering why it makes sense to run CDH on EC2 rather than EMR see:
http://www.quora.com/What-are-the-advantages-disadvantages-running-Clouderas-distribution-for-Hadoop-on-EC2-instances-rather-than-using-Amazons-Elastic-Map-Reduce-Service