Permutations with MapReduce
Is there a way to generate permutations with MapReduce?
input file:
1 title1
2 title2
3 title3
my goal:
1,2 title1,title2
1,3 title1,title3
2,3 title2,title3
Comments (1)
Since the file has n input lines, pairing every line with every line gives n^2 outputs, so it makes sense to have n tasks each perform n of those operations. I believe you could do this (assuming a single input file):

Put your input file into the DistributedCache so that it is accessible read-only to your Mapper/Reducers. Make an input split on each line of the file (as in WordCount), so that each mapper receives one line (e.g. the title1 line in your example). Then read the lines from the file in the DistributedCache and emit your key/value pairs, with the input line as the key and each line from the DistributedCache file as the value. To get only the pairs shown in your goal, which number n(n-1)/2, emit a pair only when the cached line's ID is greater than the input line's ID. In this model, you should only need a Map step.
Something like:
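As a minimal sketch of that map-only approach, here is a plain-Python simulation rather than actual Hadoop code. The names `parse`, `map_line`, and `run_map_only_job` are hypothetical, and the in-memory `cached` list only stands in for the file a real mapper would read from the DistributedCache. It assumes each line is an integer ID followed by a title.

```python
def parse(line):
    """Split an 'id title' line into (int id, title)."""
    key, title = line.split(None, 1)
    return int(key), title.strip()


def map_line(line, cached_lines):
    """Map step: pair one input line with every cached line that has a larger ID."""
    left_id, left_title = parse(line)
    for other in cached_lines:
        right_id, right_title = parse(other)
        # Assumption: only pairs with left_id < right_id are wanted, as in the
        # question's goal. Remove this test to emit all n^2 ordered pairs.
        if left_id < right_id:
            yield f"{left_id},{right_id}", f"{left_title},{right_title}"


def run_map_only_job(lines):
    """Simulate the framework: one map call per input line, no reduce step."""
    cached = list(lines)  # stand-in for the DistributedCache copy of the file
    output = []
    for line in lines:
        output.extend(map_line(line, cached))
    return output


if __name__ == "__main__":
    for key, value in run_map_only_job(["1 title1", "2 title2", "3 title3"]):
        print(key, value)
```

Each map call scans the whole cached file, so the total work is still n^2 comparisons spread over n map calls; the ID test only controls which of those pairs are emitted.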