How many files in a directory is too many (on Windows and Linux)?
Possible Duplicate:
How many files in a directory is too many?
I was told that putting too many files in a directory can cause performance problems in Linux, and Windows. Is this true? And if so, what's the best way to avoid this?
The current Windows file system is NTFS. The maximum number of files on a volume is 4,294,967,295. File cataloging on the drive uses a B+ tree, which gives you O(log N) lookups.

On the old FAT32 there was a limit of 64K files in a folder. Indexing was also done with a flat list per folder, so performance dropped off drastically after a couple of thousand entries. You probably do not need to worry about FAT32 unless your audience uses DOS, Windows 95, 98, or Millennium (yuck).

On Linux it really depends on the file system you are using (it could even be NTFS if you decide to mount one). ext3 has a hard limit of roughly 32,000 subdirectories per directory; with the dir_index feature enabled, lookups use a hashed B-tree (HTree) index, which also gives you O(log N) lookups.

Having looked at this further, your question is really about the limitations of file systems.
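Whatever the file system's hard limit, the practical first step is knowing how many entries a directory actually holds. A minimal sketch (the path and the 3000-entry threshold are illustrative, not part of any standard):

```python
import os

def count_entries(path):
    """Count directory entries without building the whole listing in memory."""
    n = 0
    with os.scandir(path) as it:
        for _ in it:
            n += 1
    return n

# Illustrative threshold: roughly where flat-list (FAT32-era) lookups degraded.
THRESHOLD = 3000

if __name__ == "__main__":
    n = count_entries(".")
    if n > THRESHOLD:
        print(f"{n} entries - consider sharding into subdirectories")
```

`os.scandir` iterates lazily, so this stays cheap even on directories with hundreds of thousands of entries, unlike `os.listdir`, which materializes the full name list first.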
According to this Microsoft article, the lookup time of a directory increases in proportion to the square of the number of entries. (Although that was a bug against NT 3.5.)
A similar question was asked on the old Joel on Software forum. One answer was that performance seems to drop between 1000 and 3000 files, and one poster hit a hard limit at 18000 files. Still another post claims that 300,000 files are possible, but search times grow rapidly as all the 8.3 filenames are used up.
To avoid large directories, create one, two, or more levels of subdirectories and hash the files into them. The simplest kind of hash uses the letters of the filename. So a file starting abc0001.txt would be placed as a\b\c\abc0001.txt, assuming you chose 3 levels of nesting. 3 is probably overkill; using two characters per directory reduces the number of nesting levels, e.g. ab\abc0001.txt. You will only need a second level of nesting if you anticipate that any single directory will hold vastly more than ca. 3000 files.
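The sharding scheme above can be sketched in a few lines. This is a minimal illustration of the answer's filename-prefix hash, not a standard library facility; the function names are my own:

```python
import os
import shutil

def shard_path(root, filename, chars_per_level=2, levels=1):
    """Build a sharded path from the leading characters of the filename,
    e.g. abc0001.txt -> ab/abc0001.txt with the defaults,
    or a/b/c/abc0001.txt with chars_per_level=1, levels=3."""
    name = filename.lower()
    parts = [name[i * chars_per_level:(i + 1) * chars_per_level]
             for i in range(levels)]
    return os.path.join(root, *parts, filename)

def store(root, src):
    """Move a file into its sharded location, creating directories as needed."""
    dest = shard_path(root, os.path.basename(src))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(src, dest)
    return dest
```

Because the shard is derived from the filename alone, any reader can recompute the path without a lookup table; the trade-off is that real-world filename prefixes are rarely uniform, so a proper hash (e.g. the first characters of an MD5 of the name) spreads files more evenly.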