What is the fastest and most efficient way to store and fetch images with millions of users on a LAMP server?

Posted 2024-11-27 07:07:28

Here is the best method I have come up with so far and I would like to know if there is an even better method (I'm sure there is!) for storing and fetching millions of user images:

In order to keep the directory sizes down and avoid having to make any additional calls to the DB, I am using nested directories that are calculated based on the User's unique ID as follows:

$firstDir  = './images';
$secondDir = floor($userID / 100000);
$thirdDir  = floor(substr($userID, -5, 5) / 100);
$fourthDir = $userID;
$imgLocation = "$firstDir/$secondDir/$thirdDir/$fourthDir/1.jpg";

User IDs ($userID) range from 1 to the millions.

So if I have User ID 7654321, for example, that user's first pic will be stored in:

./images/76/543/7654321/1.jpg

For User ID 654321:

./images/6/543/654321/1.jpg

For User ID 54321 it would be:

./images/0/543/54321/1.jpg

For User ID 4321 it would be:

./images/0/43/4321/1.jpg

For User ID 321 it would be:

./images/0/3/321/1.jpg

For User ID 21 it would be:

./images/0/0/21/1.jpg

For User ID 1 it would be:

./images/0/0/1/1.jpg

This ensures that with up to 100,000,000 users, I will never have a directory with more than 1,000 sub-directories, so it seems to keep things clean and efficient.
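Wrapped in a helper, the scheme above can be sketched like this (the function names `imagePath`/`saveUpload` and the `mkdir` permissions are my own assumptions, not part of the original):

```php
<?php
// Sketch of the nested-directory scheme described above.
function imagePath(int $userID, int $num = 1): string {
    $firstDir  = './images';
    $secondDir = (int) floor($userID / 100000);
    $thirdDir  = (int) floor((int) substr((string) $userID, -5, 5) / 100);
    return "$firstDir/$secondDir/$thirdDir/$userID/$num.jpg";
}

// mkdir() with the recursive flag creates the whole chain in one call,
// so no DB lookup is needed before saving an upload (web context assumed).
function saveUpload(int $userID, int $num, string $tmpFile): bool {
    $path = imagePath($userID, $num);
    $dir  = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }
    return move_uploaded_file($tmpFile, $path);
}
```

For example, `imagePath(7654321)` yields `./images/76/543/7654321/1.jpg`, matching the table of examples above.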

I benchmarked this method against the following "hash" method, which uses the fastest hash function available in PHP (crc32). This "hash" method takes the first 3 characters of the hash of the User ID as the Second Directory and the next 3 characters as the Third Directory, in order to distribute the files randomly but evenly, as follows:

$hash = crc32($userID); // returns an integer; substr() coerces it to its decimal string
$firstDir  = './images';
$secondDir = substr($hash, 0, 3);
$thirdDir  = substr($hash, 3, 3);
$fourthDir = $userID;
$imgLocation = "$firstDir/$secondDir/$thirdDir/$fourthDir/1.jpg";

However, this "hash" method is slower than the method I described above, so it's no good.
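For reference, the kind of micro-benchmark behind that comparison can be sketched as follows (the iteration count and timing approach are my own; absolute numbers will vary by machine):

```php
<?php
// Rough micro-benchmark of the two path schemes.
$iterations = 1000000;

$start = microtime(true);
for ($i = 1; $i <= $iterations; $i++) {
    $path = './images/' . floor($i / 100000) . '/'
          . floor(substr($i, -5, 5) / 100) . "/$i/1.jpg";
}
$divTime = microtime(true) - $start;

$start = microtime(true);
for ($i = 1; $i <= $iterations; $i++) {
    $hash = crc32($i);
    $path = './images/' . substr($hash, 0, 3) . '/'
          . substr($hash, 3, 3) . "/$i/1.jpg";
}
$hashTime = microtime(true) - $start;

printf("division: %.3fs, crc32: %.3fs\n", $divTime, $hashTime);
```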

I then went one step further and found an even faster way of calculating the Third Directory from my original example (floor(substr($userID, -5, 5) / 100)), as follows:

$thirdDir = floor(substr($userID, -5, 3));

Now, this changes how/where the first 10,000 User IDs are stored, so some third directories hold 1 or 111 user sub-directories instead of 100, but it has the advantage of being faster since we do not have to divide by 100, so I think it is worth it in the long run.
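To make that trade-off concrete, here is a small comparison of the two Third Directory formulas (the helper names are mine): for IDs of five or more digits the two agree, while shorter IDs land in different buckets.

```php
<?php
// Original formula: take the last 5 digits, then drop the last 2 by dividing.
function thirdDirOld(int $id): int {
    return (int) floor((int) substr((string) $id, -5, 5) / 100);
}

// Shortcut: take up to 3 characters starting 5 from the end; for IDs shorter
// than 5 digits, substr() clamps the offset to the start of the string.
function thirdDirNew(int $id): int {
    return (int) floor((float) substr((string) $id, -5, 3));
}

// thirdDirOld(54321) and thirdDirNew(54321) both give 543.
// thirdDirOld(4321) gives 43, but thirdDirNew(4321) gives 432.
```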

Once the directory structure is defined, here is how I plan on storing the actual individual images: if a user uploads a 2nd pic, for example, it would go in the same directory as their first pic, but it would be named 2.jpg. The default pic of the user would always just be 1.jpg, so if they decide to make their 2nd pic the default pic, 2.jpg would be renamed to 1.jpg and 1.jpg would be renamed to 2.jpg.
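The swap described above needs a third rename through a temporary name so the two files don't clobber each other; a minimal sketch (the function name and tmp filename are my own):

```php
<?php
// Make pic number $num the default (1.jpg) by swapping filenames.
// A three-step rename via a temp file avoids overwriting 1.jpg.
function makeDefault(string $dir, int $num): bool {
    if ($num === 1) {
        return true; // already the default
    }
    $tmp = "$dir/tmp.jpg";
    return rename("$dir/1.jpg", $tmp)
        && rename("$dir/$num.jpg", "$dir/1.jpg")
        && rename($tmp, "$dir/$num.jpg");
}
```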

Last but not least, if I needed to store multiple sizes of the same image, I would store them as follows for User ID 1 (for example):

1024px:

./images/0/0/1/1024/1.jpg
./images/0/0/1/1024/2.jpg

640px:

./images/0/0/1/640/1.jpg
./images/0/0/1/640/2.jpg
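Under this layout the size simply becomes one more path segment before the image number; a hypothetical helper (names are my own):

```php
<?php
// Same nesting as before, with a size directory (e.g. 1024, 640) inserted
// between the user directory and the image number.
function sizedImagePath(int $userID, int $size, int $num = 1): string {
    $secondDir = (int) floor($userID / 100000);
    $thirdDir  = (int) floor((int) substr((string) $userID, -5, 5) / 100);
    return "./images/$secondDir/$thirdDir/$userID/$size/$num.jpg";
}
```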

That's about it.

So, are there any flaws with this method? If so, could you please point them out?

Is there a better method? If so, could you please describe it?

Before I embark on implementing this, I want to make sure I have the best, fastest, and most efficient method for storing and retrieving images so that I don't have to change it again.

Thanks!

Answer by 山人契, 2024-12-04 07:07:28:

Don't worry about the small speed differences in calculating the path; they don't matter. What matters is how well and uniformly the images are distributed across the directories, how short the generated path is, and how hard it is to deduce the naming convention (let's replace 1.jpg with 2.jpg... wow, it works...).

For example, in your hash solution the path is entirely based on the user ID, which puts all pictures belonging to one user in the same directory.

Use the whole alphabet (lower and uppercase, if your FS supports it), not just numbers. Check what other software does; good places to look at hashed directory names are Google Chrome, Mozilla, ... It's better to have short directory names: they are faster to look up and occupy less space in your HTML documents.
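The suggestion to use the whole alphabet could look something like this (my own illustration, not code from the answer; the bucket split and base are arbitrary choices): rendering hash bits in base 36 keeps each directory name at most 3 characters while giving 4,096 buckets per level.

```php
<?php
// Illustration of alphanumeric bucket names built from a crc32 hash.
function shortBuckets(int $userID): string {
    $h = crc32((string) $userID);
    // Two 12-bit slices of the hash, each rendered in base 36:
    // 4096 buckets per level, names no longer than 3 characters.
    $level1 = base_convert((string) ($h & 0xFFF), 10, 36);
    $level2 = base_convert((string) (($h >> 12) & 0xFFF), 10, 36);
    return "$level1/$level2";
}
```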
