Hadoop MapReduce with sorted files
I'm working with Hadoop MapReduce. I have data in HDFS, and the data in each file is already sorted. Is it possible to force MapReduce not to re-sort the data after the map phase? I've tried changing map.sort.class to a no-op, but it didn't work (i.e. the data wasn't ordered the way I expected). Has anyone tried something similar and managed to make it work?
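The following sketch does not use Hadoop. It shows why already-sorted input is valuable. When every run (for example, each file) is sorted on its own, producing one globally sorted stream only requires a k-way merge, which costs O(n log k). A full re-sort costs O(n log n). This merge is roughly what the reduce side does with sorted map output segments. This is plain Java, not Hadoop's API. `SortedRunMerge` and `merge` are hypothetical names, and integer keys stand in for real records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class SortedRunMerge {
    // Read position inside one already-sorted run (hypothetical helper).
    private static final class Cursor {
        final List<Integer> run;
        final int pos;

        Cursor(List<Integer> run, int pos) {
            this.run = run;
            this.pos = pos;
        }

        int head() {
            return run.get(pos);
        }
    }

    // Merge k individually sorted runs into one sorted list.
    // The heap never holds more than k cursors, so each element costs O(log k).
    public static List<Integer> merge(List<List<Integer>> runs) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.head(), b.head()));
        for (List<Integer> run : runs) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run, 0));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            out.add(c.head());
            // Advance this run; drop it once it is exhausted.
            if (c.pos + 1 < c.run.size()) {
                heap.add(new Cursor(c.run, c.pos + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
                Arrays.asList(1, 4, 9),
                Arrays.asList(2, 3, 10),
                Arrays.asList(5, 6, 7, 8));
        System.out.println(merge(runs)); // prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

Note that a no-op sorter alone does not give this result. Even if each input is sorted, output coming from several sources still has to be merged to be globally ordered.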
Answer
I think it depends on what kind of result you want: sorted or unsorted.
If you need the result to be sorted, I don't think Hadoop is suitable for this job. There are two reasons:
If you don't need the result to be sorted, I think this patch may be what you want:
Support no sort dataflow in map output and reduce merge phrase: https://issues.apache.org/jira/browse/MAPREDUCE-3397
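The following sketch illustrates what a "no sort" dataflow means. It routes keys to reduce partitions by hash, using the same formula as Hadoop's default HashPartitioner: (hashCode & Integer.MAX_VALUE) % numReduceTasks. It then skips sorting entirely, so each partition keeps records in the order they were emitted. This is illustrative plain Java, not the code of the MAPREDUCE-3397 patch. `NoSortPartitioner`, `partitionFor`, and `partition` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NoSortPartitioner {
    // Same formula as Hadoop's default HashPartitioner; masking the sign bit
    // keeps the result non-negative even for negative hash codes.
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Route keys to partitions WITHOUT sorting: each partition receives its keys
    // in emission order, so sorted input from one source stays sorted within it.
    public static List<List<String>> partition(List<String> keys, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : keys) {
            parts.get(partitionFor(key, numPartitions)).add(key);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Keys already sorted, as in the question's input files.
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "date", "elderberry");
        List<List<String>> parts = partition(keys, 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": " + parts.get(i));
        }
    }
}
```

When a partition is fed by a single sorted source, it stays sorted. When several map tasks feed the same partition, their records are only combined in arrival order, so a no-sort dataflow gives no global ordering guarantee.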