Is it faster to have more files in fewer folders, or fewer files in more folders?
Hey all. I'm creating an application that is going to be generating and storing millions of images. Before I start on this, I'm wondering if anyone knows if it's better to generate more folders and only keep a few files in each, or should I use a few folders and fill them up with lots of files?
The generator will be written in C++ and the files will be accessed directly via GET requests.
Thanks,
Steve
Comments (4)
In terms of speed, manageability, etc.: go with more folders. If you examine a few big applications, they generally split their files across many folders. Most applications and/or file systems don't like too many files in one folder. From a programmer's point of view, it doesn't matter.
As ever, you need to run some tests with various scenarios on your particular deployment platform. Note that you haven't mentioned which OS, filesystem, etc. you're running on.
I would generally implement some balance between a deeply nested hierarchy (fast but possibly difficult to manage) and a flat hierarchy with everything stored in one directory. The latter has caused me performance problems on most platforms in the past. How much data you need to store and how performant you need your solution to be will dictate how you structure your directories, and some experimentation will give you pointers here.
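For what it's worth, a rough sketch of the kind of experiment meant here, in C++17 since that's what the generator will be written in. All the names (path_for, populate, time_lookups) are made up for illustration, and it only times stat-style lookups on small placeholder files, so treat it as a starting point rather than a real benchmark. Results will depend heavily on the filesystem and on what is already in the OS cache.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: maps file index i to a path under root.
// files_per_dir == 0 means "flat": every file sits directly in root.
// Otherwise files are grouped into numbered subdirectories.
fs::path path_for(const fs::path& root, std::size_t i, std::size_t files_per_dir) {
    const std::string name = std::to_string(i) + ".jpg";
    if (files_per_dir == 0) return root / name;
    return root / std::to_string(i / files_per_dir) / name;
}

// Creates n one-byte placeholder files using the chosen layout.
void populate(const fs::path& root, std::size_t n, std::size_t files_per_dir) {
    for (std::size_t i = 0; i < n; ++i) {
        const fs::path p = path_for(root, i, files_per_dir);
        fs::create_directories(p.parent_path());
        std::ofstream(p).put('x');
    }
}

// Looks up every file once and returns the elapsed time in microseconds.
long long time_lookups(const fs::path& root, std::size_t n, std::size_t files_per_dir) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t found = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (fs::exists(path_for(root, i, files_per_dir))) ++found;
    }
    const auto stop = std::chrono::steady_clock::now();
    assert(found == n);  // every populated file should be found
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

You would call populate once with files_per_dir set to 0 (flat) and once with, say, 100, each under its own root, then compare the time_lookups numbers on the actual deployment filesystem with realistic file counts.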
Things that come to mind:
Pro "fewer folders"
Pro "more folders":
Optimizing against these competing pressures will depend on knowing what a typical use pattern looks like, which means you may have to guess initially.
But just for convenient display on the screen I'd suggest more than a handful and fewer than a hundred entries per directory. Then you can collect statistics and adjust from there.
@dmckee
No clicks, as the images all load automatically. Think mapping software.
@Brian Agnew
It will run/be served on some sort of Linux cloud thing. I'm not an IT guy by any stretch of the imagination, just the programmer. But it will definitely be scaled out to a bunch of machines.
@Onkelborg
I concur. My inclination has been to go with more folders and fewer files, as well. I'm thinking the layout would be something like...
set/zoom-level/column/row.jpg
I wanted to use the filename/directory structure to pull files without querying a server. If we're zoomed in by a factor of five and the top-left coordinate within this larger image is 25,600 x 15,360, then given a 256-pixel square tile, some basic math would give me this URL:
2389/5/20/12.jpg
Where "2389" is a tile-set ID. So you can see images would only be stored in directories three levels deep. The directories with images would hold maybe 4 to ~100 images, depending on zoom level. Or maybe a dozen to a few hundred (with slightly fewer folders), if I went this way...
set/zoom-level/row/column.jpg
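To make the "basic math" concrete, here is a minimal sketch for the first layout (set/zoom-level/column/row.jpg). The function name tile_path and its parameters are hypothetical. It assumes that one tile at a given zoom factor covers tile size times zoom factor pixels per side of the larger image, which is one reading that reproduces the 20/12 in the example URL; the real tiling scheme may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of any library.
// Builds a relative URL of the form set/zoom-level/column/row.jpg.
std::string tile_path(std::uint32_t set_id, std::uint32_t zoom,
                      std::uint64_t x_px, std::uint64_t y_px,
                      std::uint32_t tile_px = 256) {
    assert(zoom > 0 && tile_px > 0);
    // Assumption: each tile spans tile_px * zoom pixels per side of the
    // larger image, so integer division yields the column and row indices.
    const std::uint64_t span = static_cast<std::uint64_t>(tile_px) * zoom;
    const std::uint64_t column = x_px / span;
    const std::uint64_t row = y_px / span;
    return std::to_string(set_id) + "/" + std::to_string(zoom) + "/" +
           std::to_string(column) + "/" + std::to_string(row) + ".jpg";
}
```

With the numbers above, tile_path(2389, 5, 25600, 15360) gives 2389/5/20/12.jpg. Swapping the last two path components would produce the row/column variant.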
I came across a system that used a similar quad-tree scheme and noticed that they had to break out into new folders at odd, non-systematic spots, which made me think they did it because of performance issues or other limitations.
As I've written this, I think I've realized that the first layout is probably the way to go. It's fewer items to iterate through to find the requested file. I'm just thinking of fragmentation, but I guess that will be IT's job. ;)