Why is the right number of reduces in Hadoop 0.95 or 1.75?
The Hadoop documentation states:

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.

Are these values pretty constant? What are the results when you choose a value between these numbers, or outside of them?
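To see how the formula plays out in practice, here is a minimal sketch (not from the question) that computes the 0.95 figure and applies it through the standard Job API; the cluster size and slot count are assumed values:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerCountExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed for illustration: a 10-node cluster where each
            // TaskTracker runs at most 2 reduce tasks concurrently.
            int nodes = 10;
            int slotsPerNode = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);
            // 0.95 rule: slightly fewer reducers than slots, so all of them
            // launch in a single wave with a little headroom left over.
            int reduces = (int) (0.95 * nodes * slotsPerNode); // 0.95 * 20 = 19
            Job job = Job.getInstance(conf, "reducer-count-example");
            job.setNumReduceTasks(reduces); // 19 reduce tasks
        }
    }

With 1.75 the same computation yields (int) (1.75 * 20) = 35 tasks, i.e. roughly two waves over the 20 slots.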
3 Answers
The values should be what your situation needs them to be. :)

Below is my understanding of the benefit of each value:

The 0.95 allows maximum utilization of the available reducers. If Hadoop defaults to a single reducer, there is no distribution of the reduce work, causing it to take longer than it should. In my (limited) cases there is a near-linear relationship between the number of reducers and the reduction in time: if a job takes 16 minutes with 1 reducer, it takes about 2 minutes with 8 reducers.

The 1.75 is a value that attempts to compensate for performance differences between the machines in the cluster. It creates more than a single wave of reducers, so that the faster machines take on additional reduce tasks while the slower machines do not.

This figure (1.75) will need to be tuned to your hardware much more than the 0.95 value. If you have 1 quick machine and 3 slower ones, maybe you'll only want 1.10. This number needs experimentation to find the value that fits your hardware configuration. If the number of reducers is too high, the slow machines become the bottleneck again.
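To make the two-wave arithmetic concrete (the cluster sizes here are assumed for illustration): with 4 nodes and mapred.tasktracker.reduce.tasks.maximum = 2 there are 8 reduce slots, and 1.75 * 8 = 14 reduce tasks. The first wave occupies all 8 slots; as machines free up they pull from the remaining 6 tasks, so the fast machines end up running three or more reducers while the slow ones run only one or two.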
To add to what Nija said above, and also a bit of personal experience:
0.95 makes a bit of sense because you are utilizing most of the capacity of your cluster while still leaving some empty task slots in case some of your reducers fail. If you use 1x the number of reduce task slots, a failed reduce has to wait until at least one reducer finishes. If you use 0.85 or 0.75 of the reduce task slots, you're not utilizing as much of your cluster as you could.
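For completeness, the reducer count can also be set from the command line without recompiling, via Hadoop's generic options parser (the jar, class, and paths below are placeholders, and this assumes the driver runs through ToolRunner):

    hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=19 /input /output

mapred.reduce.tasks is the classic MR1-era name of the property; on newer releases it maps to mapreduce.job.reduces.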
We can say that these numbers are no longer valid. According to the book "Hadoop: The Definitive Guide" and the Hadoop wiki, the target now is for each reducer to run for about five minutes.

Fragment from the book:
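As a back-of-the-envelope illustration of that five-minute rule (all figures here are assumptions, not from the book): if a job shuffles about 100 GB to the reduce side and one reducer gets through roughly 1 GB per minute on your hardware, each reducer should handle about 5 GB, which suggests 100 / 5 = 20 reducers, sized by data volume rather than by slot count.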