How to invoke a Partitioner in Hadoop v0.21
In my application I want to create as many reduce tasks as possible based on the keys. Currently my implementation writes all the keys and values to a single (reducer) output file. To solve this I wrote a partitioner, but the class is never invoked. The partitioner should be called after the map task and before the reduce task, but it is not. The code of the partitioner is the following:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MultiWayJoinPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int nbPartitions) {
        // Mask off the sign bit so the hash is non-negative, then map the
        // key into one of the nbPartitions reducer buckets. (Note: Text has
        // no getFirst() method — that belongs to a composite key class such
        // as TextPair — so the hash is taken on the key itself here.)
        return (key.hashCode() & Integer.MAX_VALUE) % nbPartitions;
    }
}
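As a side note, the masking-and-modulo arithmetic in `getPartition` can be sanity-checked in plain Java. The sketch below uses `String` keys in place of `Text` purely for illustration; `partitionFor` is a hypothetical stand-in for the partitioner method:

```java
public class PartitionDemo {
    // Same arithmetic as getPartition: clear the sign bit, then take the modulo.
    static int partitionFor(String key, int nbPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % nbPartitions;
    }

    public static void main(String[] args) {
        int nbPartitions = 4;
        for (String key : new String[] {"a", "b", "join-key", "another-key"}) {
            int p = partitionFor(key, nbPartitions);
            // The result is always a valid reducer index: 0 <= p < nbPartitions,
            // even for keys whose raw hashCode() is negative.
            assert p >= 0 && p < nbPartitions;
            System.out.println(key + " -> partition " + p);
        }
    }
}
```

The `& Integer.MAX_VALUE` mask matters because `hashCode()` may be negative, and in Java the `%` operator would then return a negative (invalid) partition number.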
Is this code correct for partitioning the output based on the keys and values, and will the partitioned output be transferred to the reducers automatically?
You don't show all of your code, but there is usually a class (often called the "Job" or "MR" class) that configures the mapper, reducer, partitioner, etc. and then actually submits the job to Hadoop. In this class you will have a job configuration object with many properties, one of which is the number of reducers. Set this property to whatever number your Hadoop configuration can handle.
Once the job is configured with a given number of reducers, that number will be passed into your partitioner (which looks correct, by the way). Your partitioner will then return the appropriate reducer/partition for each key/value pair. That's how you get as many reducers as possible.
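A minimal driver sketch along these lines is shown below, using the new (`org.apache.hadoop.mapreduce`) API that the `Partitioner<Text, Text>` signature implies. The mapper and reducer class names are hypothetical placeholders, and the reducer count of 4 is arbitrary; this is a configuration sketch, not the asker's actual driver:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiWayJoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "multi-way join"); // Job.getInstance(conf, ...) in later versions

        job.setJarByClass(MultiWayJoinDriver.class);
        job.setMapperClass(MultiWayJoinMapper.class);   // hypothetical mapper class
        job.setReducerClass(MultiWayJoinReducer.class); // hypothetical reducer class

        // Register the custom partitioner explicitly; without this line,
        // Hadoop silently falls back to the default HashPartitioner, which is
        // the usual reason a custom getPartition() is never invoked.
        job.setPartitionerClass(MultiWayJoinPartitioner.class);

        // The partitioner is only consulted when there is more than one
        // reducer; the default of 1 sends everything to a single output file.
        job.setNumReduceTasks(4);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The two calls that directly address the question are `setPartitionerClass` (so the custom class is used at all) and `setNumReduceTasks` (so there is more than one partition to choose between).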