Why do all the reduce tasks end up on a single machine?

Posted 2024-12-02 18:31:28

I wrote a relatively simple map-reduce program on the Hadoop platform (Cloudera distribution). Each Map and Reduce task writes some diagnostic information to standard output in addition to its regular map-reduce work.

However, when I look at these log files, I find that the Map tasks are distributed relatively evenly among the nodes (I have 8 nodes). But the reduce tasks' standard output logs can only be found on one single machine.

I guess that means all the reduce tasks ended up executing on a single machine, which is problematic and confusing.

Does anybody have any idea what's happening here? Is it a configuration problem? How can I make the reduce tasks also distribute evenly?

Comments (2)

三人与歌 2024-12-09 18:31:28

If the outputs from your mappers all have the same key, they will all be put into a single reducer.
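For context, Hadoop's default HashPartitioner routes a record purely by its key's hash, so every record with the same key lands in the same partition no matter how many reducers the job has. A minimal sketch of that logic (the real class is org.apache.hadoop.mapreduce.lib.partition.HashPartitioner):

import org.apache.hadoop.mapreduce.Partitioner;

// Sketch of the default partitioning logic: the partition is a function
// of the key alone, so identical keys always go to the same reducer.
public class HashLikePartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numPartitions) {
        // Mask the sign bit so the result is non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}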

If your job has multiple reducers, but they all queue up on a single machine, then you have a configuration issue.

Use the web interface (http://MACHINE_NAME:50030) to monitor the job and see how many reducers it has, as well as which machines are running them. You can also drill into other details there that should help you figure out the issue.

A couple of questions about your configuration (a sketch of the relevant settings follows the list):

  • How many reducers are running for the job?
  • How many reducers are available on each node?
  • Does the node running the reducers have better hardware
    than the other nodes?
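
The first two questions map to concrete settings. A hedged sketch, assuming the classic MRv1 property names that match the port-50030 JobTracker UI (mapred.reduce.tasks for the per-job count; mapred.tasktracker.reduce.tasks.maximum for per-node reduce slots, which is a TaskTracker-side setting read from mapred-site.xml on each node, not something a job can change at runtime):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerConfigCheck {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();

        // Per-job reducer count; equivalent to mapred.reduce.tasks.
        Job job = new Job(conf, "My Job Name");
        job.setNumReduceTasks(8);

        // Per-node reduce slots. Printing this only shows what the
        // client-side configuration files contain; the value that matters
        // is the one each TaskTracker loaded at startup.
        System.out.println(conf.get("mapred.tasktracker.reduce.tasks.maximum", "2"));
    }
}
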
影子是时光的心 2024-12-09 18:31:28

Hadoop decides which reducer will process which output keys by using a Partitioner.
If you are only outputting a few keys and want an even distribution across your reducers, you may be better off implementing a custom Partitioner for your output data, e.g.

public class MyCustomPartitioner extends Partitioner<KEY, VALUE>
{
    @Override
    public int getPartition(KEY key, VALUE value, int numPartitions) {
        // Do something based on the key or value to determine which
        // partition (and therefore which reducer) the record goes to.
        // The return value must be in the range [0, numPartitions).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

You can then set this custom partitioner in the job configuration with

Job job = new Job(conf, "My Job Name");
job.setPartitionerClass(MyCustomPartitioner.class);

You can also implement the Configurable interface in your custom Partitioner if you want to do any further configuration based on job settings.
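
For instance, here is a minimal sketch of a partitioner that picks up a job setting through the Configurable interface; the property name "my.partition.buckets" and the Text/IntWritable types are made up for illustration:

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hadoop instantiates the partitioner via reflection and, because the
// class implements Configurable, calls setConf() with the job
// configuration before getPartition() is first used.
public class ConfigurableBucketPartitioner extends Partitioner<Text, IntWritable>
        implements Configurable {

    private Configuration conf;
    private int buckets = 1;

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
        // "my.partition.buckets" is a hypothetical property name.
        this.buckets = conf.getInt("my.partition.buckets", 1);
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Spread keys over a configured number of buckets, then fold the
        // bucket into the actual reducer count.
        int bucket = (key.hashCode() & Integer.MAX_VALUE) % buckets;
        return bucket % numPartitions;
    }
}
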
Also, check that you haven't set the number of reduce tasks to 1 anywhere in the configuration (look for "mapred.reduce.tasks") or in code, e.g.

job.setNumReduceTasks(1); 