Implementing a third phase called Merge after the Reduce phase
I need to add a third phase, merge, which combines the outputs of separate, parallel Reduce tasks. This makes it possible to do things like joins and build Cartesian products. Can anyone help me with how to do this? I checked, and the Hadoop 0.21 API has nothing that supports this function.
Comments (1)
Hadoop is a MapReduce framework (not MapReduceMerge!), and this is not likely to change. That said, you could file a Jira issue or ask at http://getsatisfaction.com/cloudera/ to get the official stance on this.
If you need joins, you should try Pig. It is the only one I have hands-on experience with, but there are others too, such as Hive. Pig makes joins quite simple to do.
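For readers unsure what the "merge" step in the question would compute, here is a minimal sketch in plain Python. It does not use Hadoop or Pig. The function names (`merge_join`, `merge_cartesian`) and the sample data are made up for illustration, and the two lists simply stand in for the outputs of two separate reduce tasks.

```python
from collections import defaultdict
from itertools import product


def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs on their keys."""
    # Index the right-hand side by key so each left record is matched cheaply.
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)

    joined = []
    for key, left_value in left:
        # .get() avoids creating empty entries for keys with no match.
        for right_value in right_by_key.get(key, []):
            joined.append((key, left_value, right_value))
    return joined


def merge_cartesian(left, right):
    """Pair every record of `left` with every record of `right`."""
    return list(product(left, right))


# Hypothetical outputs of two independent reduce tasks.
employees = [("d1", "alice"), ("d2", "bob"), ("d3", "carol")]
departments = [("d1", "sales"), ("d2", "ops")]

print(merge_join(employees, departments))
print(len(merge_cartesian(employees, departments)))
```

On a real cluster the inputs would not fit in memory like this, which is why a tool such as Pig, or a second MapReduce job that reads the first job's output, is the usual route.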