MapReduceBase and Mapper are deprecated
public static class Map extends MapReduceBase implements Mapper

MapReduceBase, Mapper and JobConf are deprecated in Hadoop 0.20.203. What should we use now?
Edit 1 - for Mapper and MapReduceBase, I found that we just need to extend the new Mapper class:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // The new API replaces OutputCollector and Reporter with a single Context
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
Edit 2 - for JobConf, we should use Configuration and Job like this:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf);
    job.setMapperClass(WordCount.Map.class);
}
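For completeness, a matching reducer under the new org.apache.hadoop.mapreduce API might look like the sketch below (assuming the same word-count key/value types as the mapper above; reduce() now receives an Iterable and a Context instead of an Iterator, OutputCollector, and Reporter):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // The new API hands the values in as an Iterable rather than an Iterator
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

It would be registered on the job with job.setReducerClass(WordCount.Reduce.class), alongside the mapper.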
Edit 3 - I found a good tutorial covering the new API: http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html
2 Answers
The Javadoc contains info on what to use instead of these deprecated classes:
e.g. http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/JobConf.html
Edit: when you use Maven and open a class declaration (F3), Maven can automatically download the source code, and you'll see the Javadoc comments with their explanations.
There is not much difference functionality-wise between the old and the new API, except that the old API only supports pushing records to the map/reduce functions, while the new API supports both push and pull. That said, the new API is much cleaner and easier to evolve.
Here is the JIRA for the introduction of the new API. Also, the old API has been un-deprecated in 0.21 and will be deprecated again in release 0.22 or 0.23.
You can find more information about the new API (sometimes called the 'context objects' API) here and here.
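The "pull" side of the new API comes from the public run() method on Mapper and Reducer: instead of the framework pushing each record into map(), you can override run() and pull records yourself through the Context. A sketch (the type parameters and class name are illustrative, not from the original post):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PullStyleMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        // Pull records from the framework instead of having them pushed in;
        // overriding run() lets you batch, skip, or multi-thread record handling.
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
    }
}
```

The body shown is essentially what the default Mapper.run() does; the old mapred API offers no equivalent hook, which is the push-vs-pull distinction the answer describes.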