What is the best folder layout for users' private folders on Linux?

Posted on 2024-07-11 07:43:20


On our site, users can have many private files. We are thinking about what the best distribution could be so as to avoid destroying the server's performance. These files are served through Apache and should be listed each time the user needs to manage them.

Our first approach right now is:

var first_level = (int) ($user_id / 100);
var files_folder = "/uf/$first_level/$user_id";

This gives us a first level of 100 folders and many second-level folders.
Since not all users have files, and right now we are at about ~80k users, this means roughly 800 second-level folders inside each first-level folder.
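As a concrete reading of the snippet above (a sketch in Python, chosen as a stand-in because the question's language is unspecified):

```python
def files_folder(user_id: int) -> str:
    """Sketch of the layout described in the question: integer-divide
    the user id by 100 for the first level, then use the id itself
    as the leaf folder."""
    first_level = user_id // 100  # floor division, as the (int) cast suggests
    return f"/uf/{first_level}/{user_id}"

# e.g. files_folder(81234) -> "/uf/812/81234"
```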

What do you think about this approach?


Comments (3)

有深☉意 2024-07-18 07:43:20


If your user ID values are fairly uniformly distributed and the number will continue to increase, then you should probably balance up the tree a bit more. What's best depends in part on where you think you'll end up in terms of numbers. Big directories are slower to search than small ones. While 800 files is not awful, it isn't great either. If you want to stick with 2 tiers and you have N users (as your target population), then you should aim for sqrt(N) folders in the first tier, with sqrt(N) folders in each second tier directory. For N = 80,000, that means about 300 folders per level. If you want to consider a 3 tier arrangement, replace the square root with the cube root. You might also find that using modulo arithmetic gives you a smoother distribution. That is, the first level might be better calculated as:

var first_level = (int) ($user_id % 300);

Assuming your unidentified language uses % for its modulo operator.

CPAN uses a system based on 3 tiers: first tier is the first letter of the user's login ID; the second tier is the first two letters, and the third tier is the full login ID.

I read somewhere that some site (university-based, IIRC) found that first and last letter of name gave a good system.
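A minimal sketch of the modulo-based two-tier layout suggested here (Python as a stand-in; deriving the fanout from sqrt(N) is this answer's rule of thumb, and N = 80,000 is the figure from the question):

```python
import math

def balanced_folder(user_id: int, n_users: int = 80_000) -> str:
    """Two-tier layout: roughly sqrt(N) first-level folders, each
    holding roughly sqrt(N) user folders. Modulo keeps the spread
    smooth even when the id space has gaps or clusters."""
    fanout = math.isqrt(n_users)  # 282 for 80k users; round to taste
    first_level = user_id % fanout
    return f"/uf/{first_level}/{user_id}"
```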

┼── 2024-07-18 07:43:20


A popular scalable folder naming scheme, if you don't care about readability, is something like the one Squid uses: <4-bit>/<8-bit>/<remaining-116-bit-of-md5-of-whatever-lookup-key> or <whatever-unique-key-you-have>, so for user-id 1 the folder path can be /c4/ca42/1.

In this case, the first level has up to 16 directories and the second level up to 256 directories.

The big advantage of this approach is that the distribution of the folders is statistically uniform, regardless of whether you have holes or clusters in your user ids/usernames (smaller user ids tend to go unused due to attrition).
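A sketch matching the /c4/ca42/1 example above (md5 of "1" begins c4ca42…): take the first two hex digits of the hash for the first level and the next four for the second, keeping the plain user id as the leaf so paths remain computable without a reverse lookup.

```python
import hashlib

def hashed_folder(user_id) -> str:
    """Squid-style hashed layout, following the example path above.
    Hashing makes the spread uniform regardless of gaps or clusters
    in the id space."""
    digest = hashlib.md5(str(user_id).encode("ascii")).hexdigest()
    return f"/{digest[:2]}/{digest[2:6]}/{user_id}"

# hashed_folder(1) -> "/c4/ca42/1", matching the example above
```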

关于从前 2024-07-18 07:43:20


You don't say what filesystem is used to store the files. It should be easy for you to create a random directory tree with the characteristics you expect of your real load. Then you can run experiments that will tell you the performance of the various strategies you are considering.

I couldn't easily find information about which filesystems use efficient data structures like B-trees for large directories. I did find a claim that the MacOS HFS does. I would look into XFS or another high-performance, journaling filesystem.
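One rough way to run such an experiment (a sketch with made-up sizes; `n_users` and the layout function are placeholders to be scaled to your expected load):

```python
import os
import random
import tempfile
import time

def benchmark_layout(path_fn, n_users: int = 2_000, samples: int = 200) -> float:
    """Create one directory per random user id under a temp root
    using path_fn(user_id), then time listing random leaves.
    The temp tree is left in place for inspection."""
    root = tempfile.mkdtemp(prefix="uf-bench-")
    ids = random.sample(range(1_000_000), n_users)
    for uid in ids:
        os.makedirs(os.path.join(root, path_fn(uid)), exist_ok=True)
    start = time.perf_counter()
    for uid in random.sample(ids, samples):
        os.listdir(os.path.join(root, path_fn(uid)))
    return time.perf_counter() - start

# compare e.g. a flat layout against a two-tier one:
# benchmark_layout(lambda uid: str(uid))
# benchmark_layout(lambda uid: f"{uid % 300}/{uid}")
```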
