Hadoop HDFS maximum file size
A colleague of mine thinks that HDFS has no maximum file size, i.e. that by partitioning into 128/256 MB chunks any file size can be stored (obviously the HDFS disks have a finite size and that will be a limit, but is that the only limit?). I can't find anything saying that there is a limit, so is she correct?
Thanks, Jim
Comments (3)
Well, there is obviously a practical limit. But physically, HDFS block IDs are Java longs, so they have a maximum of 2^63; if your block size is 64 MB, then the maximum size is 512 yottabytes.
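For what it's worth, here is a quick back-of-the-envelope check of that arithmetic (a minimal sketch; the class name is just for illustration, and it assumes all 2^63 block IDs are usable and every block is a full 64 MB):

    import java.math.BigInteger;

    public class HdfsMaxSizeEstimate {
        public static void main(String[] args) {
            BigInteger two = BigInteger.valueOf(2);
            BigInteger maxBlocks = two.pow(63);                    // block IDs are Java longs
            BigInteger blockSize = two.pow(26);                    // 64 MB per block
            BigInteger maxBytes  = maxBlocks.multiply(blockSize);  // 2^89 bytes in total
            System.out.println(maxBytes.divide(two.pow(80)));      // prints 512 (yottabytes)
        }
    }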
I think she's right in saying there's no maximum file size on HDFS. The only thing you can really set is the chunk size, which is 64 MB by default. I guess files of any length can be stored; the only constraint is that the bigger the file, the more hardware is needed to accommodate it.
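In case it's useful: the block size is normally configured cluster-wide (the dfs.blocksize property in hdfs-site.xml on Hadoop 2.x; older releases call it dfs.block.size), but it can also be chosen per file when the file is created. A minimal sketch, with a hypothetical path and made-up values:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PerFileBlockSize {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // create(path, overwrite, bufferSize, replication, blockSize)
            FSDataOutputStream out = fs.create(new Path("/tmp/example.dat"),
                    true, 4096, (short) 3, 128L * 1024 * 1024);  // 128 MB blocks for this file
            out.writeBytes("hello");
            out.close();
        }
    }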
I am not an expert in Hadoop, but AFAIK there is no explicit limitation on a single file's size, though there are implicit factors such as overall storage capacity and maximum namespace size. Also, there may be administrative quotas on the number of entities and on directory sizes. The HDFS capacity topic is very well described in this document. Quotas are described here and discussed here.
I'd recommend paying some extra attention to Michael G. Noll's blog, referred to by the last link; it covers many Hadoop-specific topics.
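On the quota point, if it helps: name and space quotas are set per directory from the command line (a sketch with a hypothetical directory; the numbers are made up):

    # limit /user/jim to 10,000 names (files + directories)
    $ hdfs dfsadmin -setQuota 10000 /user/jim

    # limit /user/jim to 1 TB of raw space (each replica counts against the quota)
    $ hdfs dfsadmin -setSpaceQuota 1t /user/jim

    # show the current quotas and usage
    $ hdfs dfs -count -q /user/jim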