Many files in one directory?
I develop PHP projects on the Linux platform. Are there any disadvantages to putting several thousand images (files) in one directory? This is a closed set which won't grow. The alternative would be to separate these files using a directory structure based on some ID (that way there would be, say, only 100 files in one directory).
I ask this question because I often see such separation when I look at image URLs on different sites. You can see that the directory separation is done in such a way that no more than a few hundred images are in one directory.
What would I gain by not putting several thousand files (of a non-growing set) in one directory, but separating them into groups of, e.g., 100? Is it worth complicating things?
UPDATE:
- There won't be any programmatic iteration over files in a directory (just direct access to an image by its filename)
- I want to emphasize that the image set is closed. It's fewer than 5000 images, and that is it.
- There is no logical categorization of these images
- Human access/browse is not required
- Images have unique filenames
- OS: Debian/Linux 2.6.26-2-686, Filesystem: ext3
VALUABLE INFORMATION FROM THE ANSWERS:
Why separate many files to different directories:
- "32k files limit per directory when using ext3 over nfs"
- performance reasons (access speed) [but for several thousand files it is difficult to say whether it's worth it without measuring]
7 Answers
In addition to faster file access from separating images into subdirectories, you also dramatically extend the number of files you can track before hitting the natural limits of the filesystem.
A simple approach is to md5() the file name, then use the first n characters as the directory name (e.g., substr(md5($filename), 0, 2)). This ensures a reasonably even distribution (versus taking the first n characters of the straight filename).
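A minimal sketch of that scheme in PHP (the root directory, the two-character prefix length, and the function name are illustrative assumptions, not part of the answer):

```php
<?php
// Map a filename to a sharded path using the first two hex characters
// of its md5. Two characters give 256 buckets, so ~5000 images works
// out to roughly 20 files per directory.
function shardedPath(string $root, string $filename): string
{
    $prefix = substr(md5($filename), 0, 2);   // e.g. "9a"
    return $root . '/' . $prefix . '/' . $filename;
}

// Usage: the path is deterministic, so store and fetch use the same call.
$path = shardedPath('/var/www/images', 'photo_1234.jpg');
// mkdir(dirname($path), 0755, true);   // create the bucket on first write
```

Because md5 is deterministic, no lookup table is needed: the same filename always maps to the same bucket.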
Usually the reason for such splitting is filesystem performance. For a closed set of 5000 files, I am not sure it's worth the hassle.
I suggest you try the simple approach of putting all the files in one directory, but keep an eye on the actual time it takes to access the files. If you see that it's not fast enough for your needs, you can split them as you suggested.
I had to split files myself for performance reasons. In addition, I ran into a 32k-files-per-directory limit when using ext3 over NFS (not sure if it's a limit of NFS or of ext3), so that's another reason to split into multiple directories.
In any case, try a single directory first, and only split if you see it's not fast enough.
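Following that "measure first" advice, a rough micro-benchmark could look like this (the directory path, sample count, and function name are illustrative, not from the answer):

```php
<?php
// Rough micro-benchmark: average time per random read from a directory.
function timeRandomReads(string $dir, int $samples = 200): float
{
    $files = glob($dir . '/*');
    if ($files === false || $files === []) {
        return 0.0;
    }
    $start = microtime(true);
    for ($i = 0; $i < $samples; $i++) {
        $file = $files[array_rand($files)];
        file_get_contents($file);              // simulate serving the image
    }
    return (microtime(true) - $start) / $samples;  // avg seconds per read
}

// echo timeRandomReads('/var/www/images'), " seconds per read\n";
```

Run it against the flat layout first; only if the per-read time is too slow for your use case is the extra directory structure worth adding.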
There is no reason to split those files into multiple directories if you don't expect any filename conflicts and you don't need to iterate over those images at any point.
Still, if you can think of a suggestive categorization, it's not a bad idea to sort the images a bit, even if only for maintenance reasons.
I think there are two aspects to this question:
Does the Linux filesystem you're using efficiently support directories with thousands of files? I'm not an expert, but I think the newer filesystems won't have problems.
Are there performance issues with specific PHP functions? I think direct access to files should be okay, but if you're doing directory listings then you might eventually run into time or memory problems.
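To illustrate the second point: scandir() builds the entire listing in memory at once, while SPL's DirectoryIterator streams entries one at a time. A small sketch (the helper name is an assumption for illustration):

```php
<?php
// scandir() materializes every entry name in one array, so memory grows
// with the size of the directory. DirectoryIterator streams entries one
// at a time instead, which keeps memory flat even for huge listings.
function listFiles(string $dir): array
{
    $names = [];
    foreach (new DirectoryIterator($dir) as $entry) {
        if ($entry->isFile()) {
            $names[] = $entry->getFilename();
        }
    }
    return $names;
}
```

For 5000 files either approach is cheap; the distinction only starts to matter with much larger directories.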
The only case I can imagine where it would be detrimental is when iterating over the directory: more files means more iterations. But that's basically all I can think of from a programming perspective.
Several thousand images are still okay. When you access a directory, the operating system reads its file listing in 4K blocks. With a flat directory structure, it may take some time to read the whole file listing if the directory contains many (e.g., hundreds of thousands of) files.
If changing the filesystem is an option, I'd recommend moving wherever you store all the images to a ReiserFS filesystem; it is excellent at fast storage and access of lots of small files.
If not, MightyE's suggestion of breaking them into folders is the most logical approach and should improve access times considerably.