What is the meaning of dfs.replication.max?
Regarding HDFS:
What is the meaning of dfs.replication.max?
From the docs: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
It says only "Maximal block replication",
but I still don't understand what that means.
Comments (1)
Let's think this through. HDFS has a default replication factor (dfs.replication), and this is typically 3.
Why have a max? Maybe you do a lot of maintenance and regularly take nodes out of the cluster. By [taking nodes out] and [putting nodes back in], it's reasonable to think that 4 replicas of a block might exist at some point as nodes leave and return. With regular maintenance that can be a good situation: an extra copy hanging around means maintenance doesn't always require a lot of re-replication, so you might accept 4 replicas as a maximum. Taken to the extreme, though, this gets out of hand: 50 replicas of a file is too much duplication and starts to eat into HDFS space. Think of the max as the upper limit on replication, the point beyond which extra copies are not accepted.
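To make the idea of an upper limit concrete, here is a minimal Python sketch of how a replication request can be checked against a configured range. It is a simplified model and not Hadoop code: the function name `verify_replication` and the error messages are made up for illustration. The two defaults are the values listed in the r2.6.0 hdfs-default.xml for `dfs.replication.max` (512) and `dfs.namenode.replication.min` (1).

```python
# Simplified model of a replication bound check. This is NOT Hadoop code;
# the function name and messages are hypothetical, for illustration only.

DFS_REPLICATION_MAX = 512  # default of dfs.replication.max in hdfs-default.xml
DFS_REPLICATION_MIN = 1    # default of dfs.namenode.replication.min


def verify_replication(requested, min_repl=DFS_REPLICATION_MIN,
                       max_repl=DFS_REPLICATION_MAX):
    """Return the requested replication factor if it is within the
    configured range, otherwise raise ValueError."""
    if requested > max_repl:
        raise ValueError(
            f"requested replication {requested} exceeds maximum {max_repl}")
    if requested < min_repl:
        raise ValueError(
            f"requested replication {requested} is below minimum {min_repl}")
    return requested


if __name__ == "__main__":
    # With the maximum lowered to 4, as in the maintenance example above,
    # a request for 4 replicas is accepted and a request for 50 is refused.
    print(verify_replication(4, max_repl=4))
    try:
        verify_replication(50, max_repl=4)
    except ValueError as err:
        print("rejected:", err)
```

With the shipped default of 512, the limit rarely matters in practice. Lowering it, as in the 4-replica example, turns it into a guard against accidentally requesting far more copies than the cluster can afford.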