Making full use of all cores in Hadoop pseudo-distributed mode
I am running a task in pseudo-distributed mode on my 4-core laptop. How can I ensure that all cores are used effectively?
Currently, my job tracker shows that only one job is executing at a time. Does that mean only one core is used?
The following are my configuration files.
conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
EDIT:
As per the answer, I need to add the following properties in mapred-site.xml:
<property>
<name>mapred.map.tasks</name>
<value>4</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>4</value>
</property>
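Merged into the existing file, mapred-site.xml would then read as follows (this simply combines the original job tracker setting with the two added snippets above):
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>4</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>4</value>
</property>
</configuration>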
Comments (2)
The mapreduce.tasktracker.map.tasks.maximum and mapreduce.tasktracker.reduce.tasks.maximum properties control the number of map and reduce slots per node. For a 4-core processor, start with 2/2 and change the values from there if required. A slot is either a map slot or a reduce slot; setting the values to 4/4 will make the Hadoop framework launch 4 map and 4 reduce tasks simultaneously, for a total of 8 map and reduce tasks running at a time on the node.
The mapred.map.tasks and mapred.reduce.tasks properties control the total number of map/reduce tasks for the job, not the number of tasks per node. Also, mapred.map.tasks is only a hint to the Hadoop framework; the total number of map tasks for the job equals the number of InputSplits.
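As a concrete starting point, here is a sketch of mapred-site.xml with the 2/2 slot values suggested above (property names as given in this answer; adjust the values to your workload):
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<!-- map slots this TaskTracker offers; 2 is the suggested start for 4 cores -->
<property>
<name>mapreduce.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<!-- reduce slots this TaskTracker offers -->
<property>
<name>mapreduce.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
</configuration>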
mapred.map.tasks and mapred.reduce.tasks will control this, and (I believe) would be set in mapred-site.xml. However, this establishes them as cluster-wide defaults; more usually you would configure these on a per-job basis. You can set the same parameters on the Java command line with -D.
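For example, a per-job invocation might look like the line below. The jar and driver class names are placeholders, and this assumes the job's driver runs through ToolRunner/GenericOptionsParser so that -D options are picked up:
# my-job.jar and MyJobDriver are hypothetical names; input/output are HDFS paths
hadoop jar my-job.jar MyJobDriver -D mapred.map.tasks=4 -D mapred.reduce.tasks=4 input output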