MapReduce sorting iterator

Posted on 2024-12-09 13:38:52


I am reading the MapReduce source code to better understand its internal mechanism. I am having trouble understanding how the data produced in the map phase is merged and sent to the reduce function for further processing. The source code looks too complicated, and I just want to understand the concepts.

What I want to know is how the values (the Iterator parameter) are sorted before being passed to the reduce() function. ReduceTask.runOldReducer() creates a ReduceValuesIterator from a RawKeyValueIterator, which is where Merger.merge() gets called and many steps are performed (e.g. collecting segments). After reading the code, it seems to me that it only sorts by key, and the values that accompany each key are aggregated/collected without any being removed. For instance, map() may produce

    Key                              Value
    http://www.abcfood.com/aLink     object A
    http://www.abcfood.com/bLink     object B
    http://www.abcfood.com/cLink     object C

Then in reduce(),

Key will be http://www.abcfood.com/ and Values will contain object A, object B, and object C.

So is it sorted by the key http://www.abcfood.com/? Is this correct? If not, what is it sorted by before being passed to the reduce function?

Many thanks.


椒妓 2024-12-16 13:38:52


Assuming this is your input:

    Key                              Value
    http://www.example.com/asd       object A
    http://www.abcfood.com/aLink     object A
    http://www.abcfood.com/bLink     object B
    http://www.abcfood.com/cLink     object C
    http://www.example.com/t1        object X

The reducer will get this, with the keys in sorted order (the order of the values within each key is not guaranteed):

    Key                              Values
    http://www.abcfood.com/          [ "object A", "object C", "object B" ]
    http://www.example.com/          [ "object X", "object A" ]
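The same idea as a rough Python sketch. This is not Hadoop's code, only a model of the concept: segments that are each sorted by key are merged into one key-ordered stream, and consecutive records with equal keys are handed to reduce as one group. It assumes the map output key is the site prefix; `site_key`, `shuffle` and the two-segment split are hypothetical names and choices made for illustration. If the full URL were the key, each URL would by default form its own group.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from urllib.parse import urlsplit


def site_key(url):
    """Hypothetical map-side key: scheme and host only."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def shuffle(segments):
    """Merge key-sorted segments, then yield (key, values) once per key."""
    # Only the key is compared; values are never looked at for ordering.
    merged = heapq.merge(*segments, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in group]


map_input = [
    ("http://www.example.com/asd", "object A"),
    ("http://www.abcfood.com/aLink", "object A"),
    ("http://www.abcfood.com/bLink", "object B"),
    ("http://www.abcfood.com/cLink", "object C"),
    ("http://www.example.com/t1", "object X"),
]

# Assumed map output: (site prefix, value) pairs.
map_output = [(site_key(url), value) for url, value in map_input]

# Two hypothetical segments, each sorted by key before the merge.
segments = [
    sorted(map_output[:3], key=itemgetter(0)),
    sorted(map_output[3:], key=itemgetter(0)),
]

result = list(shuffle(segments))
for key, values in result:
    print(key, values)
```

Each key appears once, in sorted key order, and every value emitted for that key is kept. The order of the values inside a group depends only on how the segments happened to be laid out, which is why it should not be relied on.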
