Hadoop fully distributed mode
I am a newbie to Hadoop. I have managed to develop a simple Map/Reduce application that works fine in 'pseudo distributed mode'. I want to test it in 'fully distributed mode'. I have a few questions regarding that:
- How many machines (nodes) do I need (minimum and recommended) to process a file of 1-10 GB?
- What are the hardware requirements (mainly the number of cores, memory, and disk space)?
I'd check out Cloudera's hardware recommendations: http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
A snippet from that page:

Various hardware configurations for different workloads, including our original “base” recommendation (1U/machine): two quad core CPUs, 8GB memory, and 4 disk drives (1TB or 2TB). Note that CPU-intensive work such as natural language processing involves loading large models into RAM before processing data and should be configured with 2GB RAM/core instead of 1GB RAM/core.
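To relate that to the 1-10 GB in your question, here is a rough back-of-the-envelope sketch in Python. The 64 MB block size is Hadoop's classic default, the 8 cores/node comes from the "base" box quoted above, and "one map task per block, one task per core at a time" is a simplifying assumption of mine, not a formula from the Cloudera post:

import math

# Assumptions (illustrative, not from the Cloudera post):
# - default 64 MB HDFS block size, one map task per block
# - 8 cores/node, per the "base" recommendation quoted above
# - each core runs one map task at a time
BLOCK_SIZE_MB = 64
CORES_PER_NODE = 8

def estimate(input_gb):
    """Rough map-task count and node count for a given input size."""
    map_tasks = math.ceil(input_gb * 1024 / BLOCK_SIZE_MB)
    # Nodes needed to run all maps in a single wave.
    nodes_one_wave = math.ceil(map_tasks / CORES_PER_NODE)
    return map_tasks, nodes_one_wave

for gb in (1, 10):
    tasks, nodes = estimate(gb)
    print(f"{gb:>2} GB -> ~{tasks} map tasks; "
          f"~{nodes} node(s) for a single wave of maps")

By that arithmetic, 1 GB is only ~16 map tasks and 10 GB ~160, so even two or three worker nodes of the "base" spec above can process it in a few waves; adding nodes mainly shortens the job rather than being a hard requirement. For testing fully distributed mode, a small cluster (one master plus two or three workers) should be enough.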