File blocks on HDFS
Does Hadoop guarantee that different blocks from the same file will be stored on different machines in the cluster? Obviously, replicated blocks will be on different machines.
Comments (4)
No. If you look at the HDFS Architecture Guide, you'll see (in the diagram) that the file part-1 has a replication factor of 3 and is made up of three blocks labelled 2, 4, and 5. Note how blocks 2 and 5 are on the same Datanode in one case.

Apparently not: http://hadoop.apache.org/common/docs/r0.20.2/hdfs_user_guide.html#Rebalancer
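If you want to verify this on your own cluster, block placement is easy to inspect. The sketch below is only an illustration; it assumes a made-up path /user/hadoop/part-1 and uses the standard FileSystem.getFileBlockLocations call to ask the NameNode which datanodes hold each block of a file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrintBlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; replace with the file you want to inspect.
        Path file = new Path("/user/hadoop/part-1");
        FileStatus status = fs.getFileStatus(file);

        // Ask the NameNode which datanodes hold each block of the file.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (int i = 0; i < blocks.length; i++) {
            System.out.printf("block %d (offset %d, length %d) on: %s%n",
                    i, blocks[i].getOffset(), blocks[i].getLength(),
                    String.join(", ", blocks[i].getHosts()));
        }
        fs.close();
    }
}

From the command line, hdfs fsck /user/hadoop/part-1 -files -blocks -locations prints similar information without writing any code.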
On the contrary, I think. Setting aside replication, each datanode stores each block of data as its own file in the local file system.
Well, Hadoop does not guarantee that. That is a real loss of reliability: if you request a file within a job, a downed datanode can cause the complete job to fail just because one block is not available. I can't imagine a use case for your question; maybe you can tell us a bit more so we can understand what your intention really is.
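One common mitigation, as the question itself notes, is replication rather than any guarantee about spreading a file's distinct blocks. As a rough sketch (again with a made-up path), the FileSystem API lets you raise the replication factor of a file whose blocks a job cannot afford to lose:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; replace with the file your job depends on.
        Path file = new Path("/user/hadoop/part-1");

        // Ask the NameNode to keep three copies of every block of this file,
        // so a single downed datanode does not make a block unavailable.
        boolean accepted = fs.setReplication(file, (short) 3);
        System.out.println("replication change accepted: " + accepted);
        fs.close();
    }
}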