What is the best folder layout for users' private folders on Linux?

Posted on 2024-07-11 07:43:20


On our site, users can have many private files. We are thinking about what the best distribution could be so as to avoid destroying the server's performance. These files are served through Apache and should be listed each time the user needs to manage them.

Our first approach right now is:

var first_level = (int) ($user_id / 100);
var files_folder = "/uf/$first_level/$user_id";

This gives us a first level of 100 folders and many second-level folders.
Since not all users have files, and right now we are at about ~80k users, this means roughly 800 second-level folders inside each first-level folder.
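As a concrete reading of the snippet above (a sketch in Python, chosen as a stand-in because the question's language is unspecified):

```python
def files_folder(user_id: int) -> str:
    """Sketch of the layout described in the question: integer-divide
    the user id by 100 for the first level, then use the id itself
    as the leaf folder."""
    first_level = user_id // 100  # floor division, as the (int) cast suggests
    return f"/uf/{first_level}/{user_id}"

# e.g. files_folder(81234) -> "/uf/812/81234"
```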

What do you think about this approach?


Comments (3)

有深☉意 2024-07-18 07:43:20


If your user ID values are fairly uniformly distributed and the number will continue to increase, then you should probably balance up the tree a bit more. What's best depends in part on where you think you'll end up in terms of numbers. Big directories are slower to search than small ones. While 800 files is not awful, it isn't great either. If you want to stick with 2 tiers and you have N users (as your target population), then you should aim for sqrt(N) folders in the first tier, with sqrt(N) folders in each second tier directory. For N = 80,000, that means about 300 folders per level. If you want to consider a 3 tier arrangement, replace the square root with the cube root. You might also find that using modulo arithmetic gives you a smoother distribution. That is, the first level might be better calculated as:

var first_level = (int) ($user_id % 300);

Assuming your unidentified language uses % for its modulo operator.

CPAN uses a system based on 3 tiers: first tier is the first letter of the user's login ID; the second tier is the first two letters, and the third tier is the full login ID.

I read somewhere that some site (university-based, IIRC) found that first and last letter of name gave a good system.
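A minimal sketch of the modulo-based two-tier layout suggested here (Python as a stand-in; deriving the fanout from sqrt(N) is this answer's rule of thumb, and N = 80,000 is the figure from the question):

```python
import math

def balanced_folder(user_id: int, n_users: int = 80_000) -> str:
    """Two-tier layout: roughly sqrt(N) first-level folders, each
    holding roughly sqrt(N) user folders. Modulo keeps the spread
    smooth even when the id space has gaps or clusters."""
    fanout = math.isqrt(n_users)  # 282 for 80k users; round to taste
    first_level = user_id % fanout
    return f"/uf/{first_level}/{user_id}"
```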

┼── 2024-07-18 07:43:20


A popular scalable folder naming scheme, if you don't care about readability, is something like the one Squid uses: <4-bit>/<8-bit>/<remaining-116-bit-of-md5-of-whatever-lookup-key> or <whatever-unique-key-you-have>, so for user-id 1 the folder path can be /c4/ca42/1.

In this case, the first level has up to 16 directories and the second level up to 256 directories.

The big advantage of this approach is that the distribution of the folders is statistically uniform, regardless of whether you have holes or clusters in your user ids/usernames (smaller user ids tend to go unused due to attrition).
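A sketch matching the /c4/ca42/1 example above (md5 of "1" begins c4ca42…): take the first two hex digits of the hash for the first level and the next four for the second, keeping the plain user id as the leaf so paths remain computable without a reverse lookup.

```python
import hashlib

def hashed_folder(user_id) -> str:
    """Squid-style hashed layout, following the example path above.
    Hashing makes the spread uniform regardless of gaps or clusters
    in the id space."""
    digest = hashlib.md5(str(user_id).encode("ascii")).hexdigest()
    return f"/{digest[:2]}/{digest[2:6]}/{user_id}"

# hashed_folder(1) -> "/c4/ca42/1", matching the example above
```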

关于从前 2024-07-18 07:43:20


You don't say what filesystem is used to store the files. It should be easy for you to create a random directory tree with the characteristics you expect of your real load. Then you can run experiments that will tell you the performance of the various strategies you are considering.

I couldn't easily find information about which filesystems use efficient data structures like B-trees for large directories. I did find a claim that the MacOS HFS does. I would look into XFS or another high-performance, journaling filesystem.
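One rough way to run such an experiment (a sketch with made-up sizes; `n_users` and the layout function are placeholders to be scaled to your expected load):

```python
import os
import random
import tempfile
import time

def benchmark_layout(path_fn, n_users: int = 2_000, samples: int = 200) -> float:
    """Create one directory per random user id under a temp root
    using path_fn(user_id), then time listing random leaves.
    The temp tree is left in place for inspection."""
    root = tempfile.mkdtemp(prefix="uf-bench-")
    ids = random.sample(range(1_000_000), n_users)
    for uid in ids:
        os.makedirs(os.path.join(root, path_fn(uid)), exist_ok=True)
    start = time.perf_counter()
    for uid in random.sample(ids, samples):
        os.listdir(os.path.join(root, path_fn(uid)))
    return time.perf_counter() - start

# compare e.g. a flat layout against a two-tier one:
# benchmark_layout(lambda uid: str(uid))
# benchmark_layout(lambda uid: f"{uid % 300}/{uid}")
```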
