Idle hadoop master - how do I make it do some work?
I have launched a small cluster of two nodes and noticed that the master stays completely idle while the slave does all the work. I was wondering how to let the master run some of the tasks. I understand that a dedicated master may be necessary for a larger cluster, but on a 2-node cluster it seems like overkill.
Thanks for any tips,
Vaclav
Some more details:
The two boxes have 2 CPUs each. The cluster was set up on Amazon Elastic MapReduce, but I am running hadoop from the command line.
The cluster I just tried it on has:
Hadoop 0.18
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) Server VM (build 11.2-b01, mixed mode)
hadoop jar /home/hadoop/contrib/streaming/hadoop-0.18-streaming.jar \
-jobconf mapred.job.name=map_data \
-file /path/map.pl \
-mapper "map.pl x aaa" \
-reducer NONE \
-input /data/part-* \
-output /data/temp/mapped-data \
-jobconf mapred.output.compress=true
where the input consists of 18 files.
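(In case it matters: I have been confirming that tasks land only on the slave via the jobtracker's status pages and the job client; 50030 is the default jobtracker web UI port in 0.18, if I am not mistaken, so adjust if yours differs.)
$ bin/hadoop job -list        # running jobs, from the command line
# per-node running task counts are on the jobtracker web UI:
#   http://<master-hostname>:50030/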
3 Answers
Actually, the hadoop master is not the one doing the work (the tasks you run).
You can start a datanode and a tasktracker on the same machine the master runs on.
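A minimal sketch of that, assuming a stock 0.18 layout with the daemon script under bin/ of the hadoop install directory:
$ bin/hadoop-daemon.sh start datanode      # node now stores HDFS blocks
$ bin/hadoop-daemon.sh start tasktracker   # node now accepts map/reduce tasks
With both daemons up, the jobtracker should begin scheduling tasks on the master box as well.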
Steve Loughran on the hadoop-users list suggested that starting a tasktracker on the master would do the trick.
$ bin/hadoop-daemon.sh start tasktracker
Seems to work. You may want to adjust the number of slots for this tasktracker.
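If the master should take on less work than the slave (to leave headroom for the namenode and jobtracker daemons), the slot counts can be lowered in that node's configuration. A minimal sketch, assuming the 0.18 property names and the 2-CPU boxes from the question; the values are illustrative:
<!-- in conf/hadoop-site.xml on the master (defaults are 2 per type) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
Restart the tasktracker after changing these for the new limits to take effect.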
It may be different for Hadoop 0.18, but you can try adding the IP address of the master to the conf/slaves file, then restart the cluster.
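A minimal sketch of that, assuming the stock start/stop scripts and a made-up master address of 10.0.0.1 (substitute your own):
# on the master, from the hadoop install directory
$ echo "10.0.0.1" >> conf/slaves          # the master will now also host a datanode/tasktracker
$ bin/stop-all.sh
$ bin/start-all.sh                        # restart so the new slaves list takes effect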