“不要再向我显示我之前看到的内容”的数据库模式在mysql中

发布于 2024-12-06 12:27:09 字数 391 浏览 0 评论 0原文

我正在制作一个向您显示图像的网站。该网站的特色是“不要再向我展示我之前看到的内容”。这意味着,如果您看到图像,它将进入您的“存档”类别。将会有很多图像和类别。我需要非常平滑的数据库架构来实现性能。

当您单击图像时,它会出现在灯箱上,并且在灯箱代码中,它会使用 ajax 发送请求,以便为您存档该图像。

上述数据库模式对于大约 5,000 个图像和 20,000 个用户来说性能是否良好?

users
  user_id
  user_email

pictures
  picture_id
  picture_url
  tags

archived
  user_id
  picture_id

图像将出现在您的面前,但此架构上的所有图像中为您存档的图像除外...

I'm making a website that shows you images. And special feature of site is "don't show me again what I saw before". It means, if you see a image, it goes to your "archive" category. There will be so many images and categories. And I need to very smooth schema of database to perfomance.

When you click a image, it appears on lightbox and in the lightbox code it sends request with ajax to make this image archived just for you.

Is that database schema above performanceful for about 5.000 images and 20.000 users?

users
  user_id
  user_email

pictures
  picture_id
  picture_url
  tags

archived
  user_id
  picture_id

images will appear on front of you with excepting archived images for you from all images on this schema...

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

旧时光的容颜 2024-12-13 12:27:09

在不了解所有细节的情况下,这是一个很难回答的问题。您提到会有多少用户和图像。每个用户(平均)在其存档列表中拥有多少张图像?如果该数字很小,则归档表不会接近 100M 行。

100M 行本身应该不是问题,因为数据库可以处理这个问题。关心的问题可能(或将)与您想要查询数据的方式有关。类似于:

SELECT 
  * 
FROM 
  picture
WHERE 
  picture_id NOT IN 
  (
     SELECT picture_id FROM archived WHERE user_id = [userIdParameter]
  )

对于 100M 行,这可能不会表现得很好。

另一种选择是交叉连接用户和图片,以便存档表始终包含笛卡尔积。所以表将是:

archived
  user_id
  picture_id
  visited 

然后您可以像这样查询:

SELECT
  p.*
FROM
  picture p
  INNER JOIN archived a ON p.picture_id = a.picture_id
WHERE 
  a.user_id = [userIdParameter]
  AND a.visited = [false]

通过适当的索引,这应该可以接受,但是会出现必须确保每次将用户或图片添加到系统时都在存档表中创建行的问题。这也意味着您总是拥有等于图片*用户数的行数(在您的示例中为100M)。对于您的情况来说,这可能是不可取的。

最重要的是,您将必须创建一些接近预期量的测试数据,并进行一些接近您的负载的性能测试。如果您认为这是系统的关键潜在性能瓶颈,那么值得投入时间。

This is a diificult question to answer without knowing all the details. You mention how many users and images there will be. How many images will each user (on average) have in their archived list? If that number is small, the archived table won't approach 100M rows.

100M rows should not be a problem by itself, as the database can handle this. The concern may (or will) be with the way you are going to want to query the data. Something like:

SELECT 
  * 
FROM 
  picture
WHERE 
  picture_id NOT IN 
  (
     SELECT picture_id FROM archived WHERE user_id = [userIdParameter]
  )

That will likely not perform very well with 100M rows.

Another option would be to cross join users and pictures so that the archived table always contains a Cartesian product. So the table would be:

archived
  user_id
  picture_id
  visited 

Then you could query like so:

SELECT
  p.*
FROM
  picture p
  INNER JOIN archived a ON p.picture_id = a.picture_id
WHERE 
  a.user_id = [userIdParameter]
  AND a.visited = [false]

This should perform acceptably with proper indexing, but would present the problem of having to make sure rows are created in the archived table any time a user or picture is added to the system. It also means you would always have a number of rows equal to pictures * users (100M in your example). That may not be desirable in your case.

Bottom line, you are going to have to create some test data that approximates your expected volume and do some performance testing that approximates your load. If you think this is the critical potential performance bottleneck for your system, it will be worth the time investment.

苏佲洛 2024-12-13 12:27:09

我使用“NOT IN”解决方案一段时间,并且开始出现性能问题。因为我没有强大的服务器来执行包含大量数据的查询。

所以,我找到了最有效的答案:“Collection Shuffle”

我正在使用用户 ID 种子对集合进行洗牌,并仅保存用户最后的图像索引 ID。用户回来后,​​查找该用户的索引 id 最后离开的位置,显示他集合中的下一个 id。

这是非常轻且精确的解决方案。谢谢大家:)

I used "NOT IN" solution for a while and there is performance problems started. Because I don't have a strong server to execute that query with lot of datas.

So, I found the most performanceful answer : "Collection Shuffle"

I'm shuffleing the collection with a userid seed and saving just users last image index id. After user comes back, looking to where this user's index id left lastly, showing next id from his collection.

This is really light and exactly solution. Thanks for everyone :)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文