Gapless auxiliary IDs for fetching random rows in MySQL
In several projects I've been working on, I have encountered the need to fetch random rows from large (>1M rows) tables. With tables this large, ORDER BY rand() LIMIT 1 is not an option, as it will quickly bring the database to its knees.
The usual solution has been to generate a random number between MIN(id) and MAX(id) and select that row directly. However, if there are big gaps in the id sequence, this will require either lots of re-rolls or using WHERE id >= :myrandomnumber, which will lead to rows that follow large gaps getting significantly more hits than average.
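The bias is easy to demonstrate. This Python simulation (illustrative, not from the post) rolls a uniform number between MIN(id) and MAX(id) over a toy id set with one large gap, then counts which row a WHERE id >= :r lookup would return:

```python
import random
from collections import Counter

random.seed(42)
ids = [1, 2, 3, 100]          # id 100 sits behind a gap of 96 missing ids
hits = Counter()
for _ in range(10_000):
    r = random.randint(min(ids), max(ids))     # mimic rolling in MIN(id)..MAX(id)
    chosen = min(i for i in ids if i >= r)     # mimic WHERE id >= r ... LIMIT 1
    hits[chosen] += 1

# id 100 "absorbs" every roll that lands in the gap, so it dominates
assert hits[100] > hits[1] + hits[2] + hits[3]
```

With this id set, roughly 97% of rolls land on id 100, since every value in the gap (4..100) resolves to it.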
I've been thinking of solving this problem by creating a new indexed column solely for randomization purposes, say id2. This column would always be a gapless sequence between 1 and the number of rows in the table.
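With a truly gapless 1..N column, a single uniform roll maps to exactly one existing row, so selection is unbiased and needs no re-rolls. A quick illustrative check in Python (a stand-in for SELECT ... WHERE id2 = :roll):

```python
import random
from collections import Counter

random.seed(0)
n = 4                                   # pretend the table has 4 rows, id2 = 1..4
hits = Counter(random.randint(1, n) for _ in range(40_000))

# every id2 is hit, and none absorbs a disproportionate share
assert set(hits) == {1, 2, 3, 4}
assert max(hits.values()) - min(hits.values()) < 1500
```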
Question: What would be the best way to keep this sequence gapless?
The first solution that comes to mind is creating a helper table recycled_ids with columns tablename and id2. Whenever a row is deleted from tablename, that row's id2 is inserted into recycled_ids. When new rows are inserted, the id2 is either taken from recycled_ids or, if none is available, a new one is created. Is there a simpler way?
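As an in-memory sketch of that scheme (Python stand-ins for the helper table; the class and names are illustrative, not an actual implementation): an allocator that hands out recycled id2 values first and mints fresh ones otherwise.

```python
class GaplessIdAllocator:
    """Toy model of the recycled_ids idea for a single table."""

    def __init__(self):
        self.next_id2 = 1       # next fresh id2 if nothing is recycled
        self.recycled = []      # stand-in for the recycled_ids table

    def allocate(self):
        """Called on INSERT: reuse a recycled id2, or mint a new one."""
        if self.recycled:
            return self.recycled.pop()
        id2 = self.next_id2
        self.next_id2 += 1
        return id2

    def release(self, id2):
        """Called on DELETE: remember the freed id2 for reuse."""
        self.recycled.append(id2)

alloc = GaplessIdAllocator()
a, b, c = alloc.allocate(), alloc.allocate(), alloc.allocate()  # 1, 2, 3
alloc.release(b)                 # row with id2=2 deleted
assert alloc.allocate() == 2     # the hole is refilled before minting 4
assert alloc.allocate() == 4
```

Note that between the delete and the next insert the sequence briefly has a hole, so a random pick can still miss and need one re-roll during that window.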
Bonus questions: Are there ORMs or frameworks that already do this or have otherwise efficient random row selection? Is this an existing pattern, does it have a name?
Update: I wrote a quick benchmark for this and ran it against a table with 125,000 rows and 30,000 gaps between them. The results are pretty promising:
Fetch a random row 100 times using id2: 0.0234689712524 seconds
Fetch a random row 100 times using ORDER BY rand() LIMIT 1: 54.992347002 seconds
When inserting the test data, I removed one random row for every five rows inserted. The sequence stays gapless the whole time.
for($i=1; $i<=$amount; $i++) {
    insert_row();
    if($i % 5 == 0)
        delete_random_row();
}
Running that loop again with $amount = 10000 takes 9 seconds on my low-end vserver. That's 0.0009 seconds per row, and it includes deleting a random row every five iterations. It does get slower as the table grows, but fetching a random row does not.
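One way to keep id2 gapless across deletes (an assumption — the post doesn't show its delete_random_row) is to move the row holding the highest id2 into the freed slot, the same trick used for array-backed sets. A Python simulation of the benchmark's insert/delete pattern, checking the invariant:

```python
import random

random.seed(1)
rows = {}                                  # id2 -> payload; stand-in for the table

def insert_row(payload):
    rows[len(rows) + 1] = payload          # next id2 is always row count + 1

def delete_random_row():
    victim = random.randint(1, len(rows))  # uniform pick, no re-rolls needed
    last = len(rows)
    rows[victim] = rows[last]              # fill the hole with the last row
    del rows[last]

for i in range(1, 1001):                   # same pattern as the PHP benchmark loop
    insert_row(f"row {i}")
    if i % 5 == 0:
        delete_random_row()

# id2 values form an unbroken 1..N sequence after the whole run
assert sorted(rows) == list(range(1, len(rows) + 1))
```

In SQL this would mean one UPDATE plus one DELETE per deletion, at the cost of one row's id2 changing, which is fine for a column used only for randomization.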
My original questions still apply.
2 Answers
Here's how I'd do it -
SELECT * FROM table WHERE id >= $random ORDER BY id ASC LIMIT 1
SELECT * FROM table WHERE id < $random ORDER BY id DESC LIMIT 1
This avoids running any queries that would be brutally slow. It also avoids the extra column; keeping that column gapless would be a nasty job indeed!
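In Python terms, that two-query fallback looks like this (an in-memory simulation, purely illustrative; note it still inherits the gap bias described in the question):

```python
import random

random.seed(7)
ids = sorted(random.sample(range(1, 1_000_000), 1000))   # sparse id column

def fetch_random(ids):
    r = random.randint(ids[0], ids[-1])
    above = [i for i in ids if i >= r]
    if above:
        return above[0]                  # WHERE id >= r ORDER BY id ASC LIMIT 1
    return max(i for i in ids if i < r)  # WHERE id < r ORDER BY id DESC LIMIT 1

picked = {fetch_random(ids) for _ in range(500)}
assert picked <= set(ids)                # every pick lands on an existing row
```

The second query only fires when the roll exceeds every remaining id; with $random capped at MAX(id), the first query always succeeds.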
Ranking to the rescue, I'd say.