How do I get Hadoop to use all the cores on my system?

Posted 2024-12-09 08:31:07

I have a 32-core system. When I run a MapReduce job using Hadoop, I never see the java process use more than 150% CPU (according to top), and it usually stays around the 100% mark. It should be closer to 3200%.

Which property do I need to change (and in which file) to enable more workers?

Comments (2)

夏尔 2024-12-16 08:31:07

There could be two issues, which I outline below. I'd also like to point out that this is a very common question and you should look at the previously asked Hadoop questions.


Your mapred.tasktracker.map.tasks.maximum may be set too low in conf/mapred-site.xml. This is the problem if, when you check the JobTracker, you see several pending tasks but only a few running tasks. Each task is single-threaded, so you would hypothetically need a maximum of 32 map slots on that node to keep 32 cores busy.
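
As a rough sketch, raising that limit in conf/mapred-site.xml on an MRv1 cluster might look like the following; the value 32 simply mirrors the 32 cores in the question and is an assumption, not part of the original answer:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>32</value> <!-- assumed: one map slot per core on a 32-core node -->
</property>

Note that this is a per-tasktracker setting, so the tasktracker has to be restarted to pick it up.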


Otherwise, your data is likely not being split into enough chunks. Are you running over a small amount of data? It could be that your MapReduce job is running over only a few input splits and thus does not require more mappers. Try running your job over hundreds of MB of data instead and see if you still have the same issue.
Hadoop automatically splits your files. The number of blocks a file is split into is the total size of the file divided by the block size, and by default one map task is assigned to each block (not each file). For example, a 1 GB file with a 64 MB block size is split into 16 blocks and therefore gets 16 map tasks.

In your conf/hdfs-site.xml configuration file there is a dfs.block.size parameter. Most people set this to 64 MB or 128 MB. However, if you are trying to do something tiny, you can lower it to split the work into more blocks (and therefore more map tasks).
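
As an illustration only (the specific value is an assumption, not from the answer), lowering the block size in conf/hdfs-site.xml might look like the following; the value is given in bytes and only applies to files written after the change:

<property>
  <name>dfs.block.size</name>
  <value>16777216</value> <!-- 16 MB, an arbitrary small block size chosen for illustration -->
</property>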

You can also manually split your file into 32 chunks.

墨小墨 2024-12-16 08:31:07

I think you need to set "mapreduce.framework.name" to "yarn", because the default value is "local". The local runner executes the whole job inside a single JVM, which would explain why top never shows much more than one core's worth of CPU.

Put the following into your mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
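
If you do switch to YARN, the number of tasks that can run concurrently on a node is then bounded by the resources the NodeManager advertises rather than by tasktracker slots. A hedged sketch for yarn-site.xml, where the value 32 is an assumption matched to the 32-core machine and is not part of the original answer:

<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value> <!-- assumed: advertise all 32 cores of this node to YARN -->
</property>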