Is it better to use the mapred or the mapreduce package to create a Hadoop job?

Posted on 2024-12-07 05:32:15

To create MapReduce jobs you can use either the old org.apache.hadoop.mapred package or the newer org.apache.hadoop.mapreduce package for Mappers, Reducers, Jobs ... The first one had been marked as deprecated, but that was reverted in the meantime. Now I wonder whether it is better to use the old mapred package or the new mapreduce package to create a job, and why. Or does it just depend on whether you need something like MultipleTextOutputFormat, which is only available in the old mapred package?

Comments (3)

时光暖心i 2024-12-14 05:32:15

Functionality-wise there is not much difference between the old (o.a.h.mapred) and the new (o.a.h.mapreduce) API. The only significant difference is that records are pushed to the mapper/reducer in the old API, while the new API supports both the push and the pull mechanism. You can get more information about the pull mechanism here.

Also, the old API has been un-deprecated since 0.21. You can find more information about the new API here.

As you mentioned, some classes (like MultipleTextOutputFormat) have not been migrated to the new API. Due to this and the reason mentioned above, it's better to stick to the old API (although a translation is usually quite simple).
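
For illustration, a rough old-API sketch along the lines of the two points above (the word-count logic, the class names and the key-per-directory rule are made up for this example, not taken from the answer): the framework pushes each key to the reducer together with an Iterator of its values, and MultipleTextOutputFormat, which lives only under org.apache.hadoop.mapred.lib, splits the output by key.

    // A rough old-API (org.apache.hadoop.mapred) word count, for illustration only.
    // It shows the two points above: the framework pushes each key to the reducer
    // with an Iterator of its values, and MultipleTextOutputFormat (old API only)
    // routes output records to per-key files.
    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    public class OldApiWordCount {

      // Old-API mapper: an interface; results are emitted through an OutputCollector.
      public static class TokenMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
          for (String token : value.toString().split("\\s+")) {
            word.set(token);
            output.collect(word, ONE);
          }
        }
      }

      // Old-API reducer: values for a key arrive as an Iterator, pushed by the framework.
      public static class SumReducer extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
        }
      }

      // Old-API-only feature: write each key's records under its own subdirectory.
      public static class KeyBasedOutput
          extends MultipleTextOutputFormat<Text, IntWritable> {
        @Override
        protected String generateFileNameForKeyValue(Text key, IntWritable value,
                                                     String name) {
          return key.toString() + "/" + name;
        }
      }

      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(OldApiWordCount.class);  // JobConf extends Configuration
        conf.setJobName("old-api-sketch");
        conf.setMapperClass(TokenMapper.class);
        conf.setReducerClass(SumReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setOutputFormat(KeyBasedOutput.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }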

尸血腥色 2024-12-14 05:32:15

Both the old and new APIs are good. The new API is cleaner, though. Use the new API wherever you can, and use the old one wherever you need specific classes that are not present in the new API (like MultipleTextOutputFormat).

But do take care not to mix the old and new APIs in the same MapReduce job; that leads to weird problems.
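
To make that concrete, here is a minimal sketch of a job written consistently against the new API (the class names are illustrative, and Job.getInstance assumes Hadoop 0.23/2.x or newer); keeping every import out of org.apache.hadoop.mapred is exactly what avoids the mix this answer warns about.

    // A minimal new-API (org.apache.hadoop.mapreduce) word count. Every MapReduce
    // type imported here comes from the mapreduce packages and their
    // lib.input/lib.output helpers; nothing is taken from mapred.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class NewApiWordCount {

      // New-API mapper: a class to extend, emitting through a Context.
      public static class TokenMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
          }
        }
      }

      // New-API reducer: values for a key arrive as an Iterable.
      public static class SumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "new-api-sketch");
        job.setJarByClass(NewApiWordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }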

阪姬 2024-12-14 05:32:15

Old API (mapred)

  1. Lives in the package org.apache.hadoop.mapred.

  2. Job configuration is done through JobConf, an extension of Configuration, and the job is submitted via JobClient.

  3. The reducer receives the values for a given key as an Iterator.

  4. Package summary: org.apache.hadoop.mapred.

New API (mapreduce)

  1. Lives in the package org.apache.hadoop.mapreduce.

  2. Job configuration and submission are handled by the Job class together with a plain Configuration (see the sketch after this list).

  3. The reducer receives the values for a given key as an Iterable.

  4. Package summary: org.apache.hadoop.mapreduce.
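
A small sketch of the configuration difference in point 2 (class and job names are made up; Job.getInstance assumes Hadoop 0.23/2.x or newer). The Iterator vs. Iterable difference in point 3 can be seen in the reducers of the two sketches further up.

    // Sketch of the two configuration styles side by side; the mapper/reducer
    // and path settings are elided (see the full examples above).
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapreduce.Job;

    public class ApiConfigurationContrast {

      // Old API: JobConf (a subclass of Configuration) holds the whole job
      // definition and is handed to JobClient for submission.
      static void runWithOldApi() throws IOException {
        JobConf conf = new JobConf(ApiConfigurationContrast.class);
        conf.setJobName("old-api-job");
        // conf.setMapperClass(...), conf.setReducerClass(...), input/output paths ...
        JobClient.runJob(conf);
      }

      // New API: a plain Configuration plus the Job class, which configures,
      // submits and monitors the job.
      static void runWithNewApi()
          throws IOException, InterruptedException, ClassNotFoundException {
        Job job = Job.getInstance(new Configuration(), "new-api-job");
        job.setJarByClass(ApiConfigurationContrast.class);
        // job.setMapperClass(...), job.setReducerClass(...), input/output paths ...
        job.waitForCompletion(true);
      }
    }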
