What is the maximum number of files allowed in an HDFS directory?

Posted on 2024-11-15 19:40:29

What is the maximum number of files and directories allowed in a HDFS (hadoop) directory?

Comments (5)

煩躁 2024-11-22 19:40:30

In modern Apache Hadoop versions, the various HDFS limits are controlled by configuration properties with fs-limits in the name, all of which have reasonable default values. This question specifically asks about the number of children in a directory, which is defined by dfs.namenode.fs-limits.max-directory-items and defaults to 1048576.

Refer to the Apache Hadoop documentation for hdfs-default.xml for the full list of fs-limits configuration properties and their default values. They are copied here for convenience:

<property>
  <name>dfs.namenode.fs-limits.max-component-length</name>
  <value>255</value>
  <description>Defines the maximum number of bytes in UTF-8 encoding in each
      component of a path.  A value of 0 will disable the check.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <value>1048576</value>
  <description>Defines the maximum number of items that a directory may
      contain. Cannot set the property to a value less than 1 or more than
      6400000.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>1048576</value>
  <description>Minimum block size in bytes, enforced by the Namenode at create
      time. This prevents the accidental creation of files with tiny block
      sizes (and thus many blocks), which can degrade
      performance.</description>
</property>

<property>
    <name>dfs.namenode.fs-limits.max-blocks-per-file</name>
    <value>1048576</value>
    <description>Maximum number of blocks per file, enforced by the Namenode on
        write. This prevents the creation of extremely large files which can
        degrade performance.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-xattrs-per-inode</name>
  <value>32</value>
  <description>
    Maximum number of extended attributes per inode.
  </description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-xattr-size</name>
  <value>16384</value>
  <description>
    The maximum combined size of the name and value of an extended attribute
    in bytes. It should be larger than 0, and less than or equal to maximum
    size hard limit which is 32768.
  </description>
</property>

All of these settings use reasonable default values as decided upon by the Apache Hadoop community. It is generally recommended that users do not tune these values except in very unusual circumstances.
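If you want to see which values are actually in effect on a given cluster, a minimal sketch like the following can help, assuming the Hadoop client configuration (hdfs-site.xml) is on the classpath; the class name FsLimitsCheck and the printed labels are just illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class FsLimitsCheck {
  public static void main(String[] args) {
    // HdfsConfiguration loads hdfs-default.xml plus any hdfs-site.xml overrides
    // found on the classpath.
    Configuration conf = new HdfsConfiguration();

    // Maximum number of children (files + subdirectories) in one directory.
    int maxDirItems = conf.getInt("dfs.namenode.fs-limits.max-directory-items", 1048576);

    // Maximum length in bytes of a single path component (a file or directory name).
    int maxComponentLength = conf.getInt("dfs.namenode.fs-limits.max-component-length", 255);

    // Maximum number of blocks a single file may have.
    int maxBlocksPerFile = conf.getInt("dfs.namenode.fs-limits.max-blocks-per-file", 1048576);

    System.out.println("max-directory-items  = " + maxDirItems);
    System.out.println("max-component-length = " + maxComponentLength);
    System.out.println("max-blocks-per-file  = " + maxBlocksPerFile);
  }
}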

月牙弯弯 2024-11-22 19:40:30

From http://blog.cloudera.com/blog/2009/02/the-small-files-problem/:

Every file, directory and block in HDFS is represented as an object in the namenode’s memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible.
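To make the quoted figure concrete, here is a quick back-of-envelope calculation; the 150-byte cost is the rule of thumb from the quote, not an exact measurement:

public class NamenodeMemoryEstimate {
  public static void main(String[] args) {
    long files = 10_000_000L;    // 10 million files
    long blocksPerFile = 1L;     // each file small enough for a single block
    long bytesPerObject = 150L;  // rule-of-thumb cost of one namenode object

    // one inode object plus one block object per file
    long objects = files + files * blocksPerFile;
    long bytes = objects * bytesPerObject;

    // 20,000,000 objects * 150 B = 3,000,000,000 B, i.e. roughly 3 GB of heap
    System.out.printf("~%.1f GB of namenode memory%n", bytes / 1e9);
  }
}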

胡大本事 2024-11-22 19:40:30

The blocks and files are stored in a HashMap, so you are bound by Integer.MAX_VALUE.
So a single directory does not have its own limit, but the whole FileSystem does.
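For context, that bound is roughly 2.1 billion entries; a trivial illustration of the ceiling this answer refers to:

public class FileSystemCeiling {
  public static void main(String[] args) {
    // Int-indexed structures such as the one this answer refers to cannot
    // address more than Integer.MAX_VALUE entries.
    System.out.println(Integer.MAX_VALUE); // 2147483647, i.e. ~2.1 billion
  }
}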

傲娇萝莉攻 2024-11-22 19:40:30

This question specifically mentions HDFS, but a related question is how many files you can store on a Hadoop cluster.

That has a different answer if you use MapR's file system. In that case, billions of files can be stored on the cluster without a problem.

笑看君怀她人 2024-11-22 19:40:30

In HDFS, the maximum file name length is 255 bytes, so the claim that one file object occupies only 150 bytes is not correct or exact. When calculating the memory required, we should take the maximum occupation of one object.
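As a sketch of the kind of worst-case adjustment this answer suggests, the earlier 10-million-file estimate can be redone with a maximum-length (255-byte) name added to each file object; the per-object figures are still only the rule-of-thumb numbers from the answers above:

public class ConservativeEstimate {
  public static void main(String[] args) {
    long files = 10_000_000L;       // same 10 million files as the earlier estimate
    long inodeBytes = 150L + 255L;  // object overhead plus a worst-case 255-byte name
    long blockBytes = 150L;         // block objects carry no file name

    long bytes = files * inodeBytes + files * blockBytes;
    System.out.printf("~%.2f GB of namenode memory%n", bytes / 1e9); // ~5.55 GB
  }
}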
