Is it faster to have more files in fewer folders, or fewer files in more folders?

Posted 2024-09-27 16:13:11

Hey all. I'm creating an application that is going to be generating and storing millions of images. Before I start on this, I'm wondering if anyone knows whether it's better to generate more folders and only keep a few files in each, or should I use a few folders and fill them up with lots of files?

The generator will be written in C++ and the files will be accessed directly via GET requests.

Thanks,
Steve

Comments (4)

梦醒灬来后我 2024-10-04 16:13:11

In terms of speed, manageability, etc.: go with more folders. If you examine a few big applications, they generally split their files across many folders. Most applications and/or file systems don't like too many files in one folder. From a programmer's point of view, it doesn't matter.

思慕 2024-10-04 16:13:11

As ever, you need to run some tests with various scenarios on your particular deployment platform. Note that you've not mentioned which OS/filesystem etc. you're running on.

I would generally implement some balance between a deeply nested hierarchy (possibly fast, but difficult to manage) and a flat hierarchy with everything stored in one directory. The latter has caused me performance problems on most platforms in the past. How much data you need to store and how performant your solution needs to be will dictate how you structure your directories, and some experimentation will give you pointers here.
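One common way to strike that balance, sketched minimally here under the assumption that file locations may be computed rather than human-meaningful: hash each filename and use the leading hex digits of the hash as nested directory names. This hash-prefix sharding is a swapped-in technique, not something proposed in this thread; the function name `sharded_path` and the two-level, 256-way fan-out are illustrative choices. MD5 only spreads names evenly here and plays no security role.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(filename: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Return a nested path such as 'ab/cd/<filename>' (hypothetical helper).

    Directory names are the leading hex digits of the filename's MD5 digest,
    `width` digits per level, so each directory has at most 16**width
    children (256 for width=2).
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()  # 32 hex chars
    if levels < 1 or width < 1 or levels * width > len(digest):
        raise ValueError("need levels >= 1, width >= 1 and levels * width <= 32")
    # Slice consecutive groups of `width` hex digits, one group per level.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*parts, filename)
```

The same name always maps to the same path, so a reader can locate a file without any index. With two levels of 256 there are 65,536 leaf directories, so ten million files would average roughly 150 per directory.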

挖鼻大婶 2024-10-04 16:13:11

Things that come to mind:

Pro "fewer folders":

  • Every folder to be navigated means another click for the user, and another lag while the page loads.
  • If the user is going to navigate all of the tree (or a large portion of it), all those extra files are just that many more bytes to be sent. This is trivial compared to the total unless you take the "many folders" strategy to an extreme, but it suggests there is a bound somewhere.

Pro "more folders":

  • Long lists of directory contents will force the user to scroll, or type-ahead, or otherwise interact to find a particular file, instead of just selecting it because they can take in the page at a glance.
  • A user clicking into folder Foo has to wait for all the items in that directory to be loaded before the page will finish rendering. This can be a noticeable lag and a lot of bytes for a user who wants only one image.
  • Every access of an item in a directory takes some time. On old-fashioned file systems this was often an O(n) operation; newer file systems support O(log n) access. How this affects the optimal operation of your system depends on the performance of the file system you plan to use. Also take note of the usual use case (which I presume is looking at a small number of directories rather than spanning the whole tree, no?).

Optimizing against these competing pressures will depend on knowing what a typical use pattern looks like, which means you may have to guess initially.

But just for convenient display on the screen I'd suggest more than a handful and fewer than a hundred entries per directory. Then you can collect statistics and adjust from there.
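To put the "fewer than a hundred entries per directory" guideline into numbers for millions of images, here is a minimal sketch. It assumes images get sequential integer IDs, which the question does not say; `directory_levels` and `id_to_path` are hypothetical helper names. The ID is written in base 100, one digit per path component, so no directory ever exceeds 100 entries.

```python
def directory_levels(n_files: int, max_entries: int = 100) -> int:
    """Directory levels needed so no directory holds more than `max_entries`
    entries. Zero levels means one flat directory; each extra level
    multiplies the capacity by `max_entries`."""
    if n_files < 1 or max_entries < 2:
        raise ValueError("need n_files >= 1 and max_entries >= 2")
    levels, capacity = 0, max_entries
    while capacity < n_files:
        levels += 1
        capacity *= max_entries
    return levels


def id_to_path(image_id: int, levels: int, max_entries: int = 100) -> str:
    """Map a (hypothetical) sequential image ID to a path like '12/34/56.jpg',
    writing the ID in base `max_entries`, most significant digit first."""
    if not 0 <= image_id < max_entries ** (levels + 1):
        raise ValueError("image_id does not fit in this many levels")
    width = len(str(max_entries - 1))  # zero-pad so names sort naturally
    digits = []
    for _ in range(levels + 1):
        image_id, digit = divmod(image_id, max_entries)
        digits.append(f"{digit:0{width}d}")
    *dirs, name = reversed(digits)  # least significant digit becomes the filename
    return "/".join([*dirs, f"{name}.jpg"])
```

For one million images, `directory_levels(1_000_000)` returns 2: 100 top-level directories, each with 100 subdirectories, each holding up to 100 images. Image 123456 would then live at `12/34/56.jpg`.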

°如果伤别离去 2024-10-04 16:13:11

@dmckee
No clicks, as the images all load automatically. Think mapping software.

@Brian Agnew
It will run on (and be served from) some sort of Linux cloud thing. I'm not an IT guy by any stretch of the imagination, just the programmer. But it will definitely be scaled out to a bunch of machines.

@Onkelborg
I concur. My inclination has been to go with more folders and fewer files, as well. I'm thinking the layout would be something like...

set/zoom-level/column/row.jpg

I wanted to use the filename/directory structure to pull files without querying a server. If we're zoomed in by a factor of five and the top-left coordinate of this larger image is 25,600 x 15,360, then, given a 256-pixel square tile, some basic math gives me this URL:

2389/5/20/12.jpg

Where "2389" is a tile-set ID. So you can see images would only be stored in directories three levels deep. The directories with images would hold maybe 4 to ~100 images, depending on zoom level. Or maybe a dozen to a few hundred (with slightly fewer folders), if I went this way...

set/zoom-level/row/column.jpg
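A minimal sketch of the coordinate-to-URL arithmetic described above, written to reproduce the worked example: at zoom factor 5 with 256-pixel tiles, each tile is treated as spanning 256 × 5 = 1,280 coordinate units, so 25,600 / 1,280 = 20 and 15,360 / 1,280 = 12. The name `tile_url` and the `column_first` flag (which switches between the two layouts) are hypothetical; if your coordinates are already in the zoomed image's pixel space, the divisor would be just the tile size.

```python
def tile_url(tileset_id: int, zoom: int, x: int, y: int,
             tile_size: int = 256, column_first: bool = True) -> str:
    """Build a tile path following the arithmetic in the post (hypothetical helper).

    Each tile spans tile_size * zoom units of the (x, y) coordinate space.
    column_first=True  -> set/zoom-level/column/row.jpg
    column_first=False -> set/zoom-level/row/column.jpg
    """
    span = tile_size * zoom
    column, row = x // span, y // span  # integer tile indices
    first, second = (column, row) if column_first else (row, column)
    return f"{tileset_id}/{zoom}/{first}/{second}.jpg"
```

Because the path depends only on the tile-set ID, zoom level and coordinates, the client can request the file directly with a GET, without first asking the server where it lives.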

I came across a system that used a similar quadtree scheme and noticed that they had to break out into new folders at odd, non-systematic spots, which made me think they did it because of performance issues or other limitations.

As I've written this, I think I realize that the first layout is probably the way to go. There are fewer items to iterate through to find the requested file. I'm also thinking about fragmentation, but I guess that will be IT's job. ;)
