Select N random records in SQL Server without repetition

Posted 2024-11-03 06:42:44 · 610 words · 1 view · 0 comments

How do I select N random records from a table at a time without repetition of records previously returned by the same operation?

An obvious solution is:

SELECT TOP 5 *
FROM MyTable
WHERE Id NOT IN (SELECT Id FROM PreviouslyReturned)
ORDER BY NEWID()

But wouldn't that be really inefficient as MyTable starts to grow?

I have a long list of records, and for a turn-based game I need five records at a time without repeating any record already pulled for the given game. Since I know approximately how many turns will take place, I could select a sufficiently large random sample before the game starts, but I would rather it be "dynamic". I found this question, which uses a random seed with MySQL.

Eventually there will be so many records that repetition won't be an issue (records >> N), but until then, I need the records to be unique. On a side note, I use Fluent NHibernate for my persistence layer; perhaps NHibernate has some feature that allows this.

Comments (3)

一世旳自豪 2024-11-10 06:42:44

"without repeating any of the records"

Among programmers, that isn't called selecting random records. Randomly selected values repeat more often than you might think; in fact, counting the number and length of repeats is one way statisticians detect cheating.

What you're looking for is called a shuffle. Shuffling randomizes the order of a finite set of things, like cards or keys. (Randomizing the order of rows doesn't mean the same thing as selecting random rows.)

In your case, plan to store the set of keys already used for each user. Select a random set of rows that aren't already in that set. There are several ways to store each random set of rows (keys); make sure you can tell which one is the last or current set.
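
To make the difference between independent random picks and a shuffle concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not SQL Server code; the key range 1..20, the batch size, and the helper name `deal_batches` are assumptions made up for the demo.

```python
import random


def deal_batches(keys, batch_size, rng):
    """Shuffle the keys once, then hand them out in consecutive batches."""
    deck = list(keys)
    rng.shuffle(deck)  # in-place shuffle of the copy
    for start in range(0, len(deck), batch_size):
        yield deck[start:start + batch_size]


rng = random.Random(42)       # fixed seed only so the demo is repeatable
keys = list(range(1, 21))     # hypothetical primary keys 1..20

# Independent random picks: nothing stops a key from coming up twice.
independent = [rng.choice(keys) for _ in range(20)]

# Dealing from a shuffled deck: every key appears exactly once.
batches = list(deal_batches(keys, 5, rng))
dealt = [key for batch in batches for key in batch]

print("distinct keys in 20 independent picks:", len(set(independent)))
print("distinct keys in 20 dealt keys:", len(set(dealt)))
```

Dealing from the shuffled list can never repeat a key, whereas the independent picks usually do. The catch is that the shuffled order (or the set of keys already dealt) has to be stored somewhere per game.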

自由如风 2024-11-10 06:42:44

You could just store all primary keys of your table in a second table, select randomly from this one (retrieving the associated rows from the original table) and delete them after selection (from the auxiliary table, of course).

I would expect this to be more efficient than storing the keys already used and constructing a WHERE NOT IN (or EXCEPT) clause: deleting should perform about as well as inserting, and selecting should be significantly faster without the additional clause. But of course this would have to be proved (by profiling ...).

Selecting five random ids should work fine using the approach in the question you've cited.
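
A rough sketch of this auxiliary-table idea follows. It runs against an in-memory SQLite database through Python's `sqlite3` module purely so it can be executed anywhere; SQLite's `ORDER BY RANDOM() ... LIMIT n` stands in for SQL Server's `TOP n ... ORDER BY NEWID()`. The table name `RemainingKeys`, the `Name` column, and the helper names are assumptions for illustration, and a real game would presumably also need a per-game column.

```python
import sqlite3


def draw_keys(conn, n):
    """Pick up to n random keys from the pool and delete them from it."""
    with conn:  # commit the deletes together, or roll back on error
        picked = [row[0] for row in conn.execute(
            "SELECT Id FROM RemainingKeys ORDER BY RANDOM() LIMIT ?", (n,))]
        conn.executemany("DELETE FROM RemainingKeys WHERE Id = ?",
                         [(key,) for key in picked])
    return picked


def fetch_rows(conn, keys):
    """Retrieve the associated rows from the original table."""
    marks = ",".join("?" * len(keys))
    return conn.execute(
        f"SELECT Id, Name FROM MyTable WHERE Id IN ({marks}) ORDER BY Id",
        keys).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)",
                 [(i, f"record {i}") for i in range(1, 13)])

# Auxiliary table holding one copy of every primary key (hypothetical name).
conn.execute("CREATE TABLE RemainingKeys (Id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO RemainingKeys SELECT Id FROM MyTable")
conn.commit()

first = draw_keys(conn, 5)
second = draw_keys(conn, 5)
third = draw_keys(conn, 5)   # only two keys are left by now

print(fetch_rows(conn, first))
```

Each draw shrinks the pool, so the random sort works on fewer rows over time and no exclusion clause is needed. Whether that is actually faster on SQL Server would still have to be measured.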

恋竹姑娘 2024-11-10 06:42:44

Try

SELECT TOP 5 *
FROM YOUR_TABLE
ORDER BY CHECKSUM(NEWID())

There's a related question that goes into detail on the semantics of random shuffle selection: "Random select is not always returning a single row".

SQL Server evaluates RAND() once per query, so every row gets the same value and the MySQL trick will not work anyway.

Edit: this is also adequate

SELECT TOP 5 *
FROM YOUR_TABLE
ORDER BY NEWID()

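I read your updated question and have another suggestion:
Create a view on the query below. Note that SQL Server does not allow outer joins in indexed views, so despite the placeholder name YOUR_INDEXED_VIEW used further down, this would have to be a regular view.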

SELECT mt.*
FROM MyTable mt
LEFT JOIN PreviouslyReturned pr ON mt.Id = pr.Id
WHERE pr.Id IS NULL

or something similar

Then

SELECT TOP 5 *
FROM YOUR_INDEXED_VIEW
ORDER BY NEWID()
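
To see the anti-join behave end to end, here is a small sketch that again uses an in-memory SQLite database via Python's `sqlite3` as a stand-in for SQL Server, so `ORDER BY RANDOM()` and `LIMIT 5` replace `ORDER BY NEWID()` and `TOP 5`. The helper name `next_five`, the twelve sample rows, and the decision to record each batch in `PreviouslyReturned` in the same transaction are assumptions for the demo.

```python
import sqlite3

# LEFT JOIN ... IS NULL keeps only rows with no match in PreviouslyReturned.
ANTI_JOIN = """
SELECT mt.Id
FROM MyTable mt
LEFT JOIN PreviouslyReturned pr ON mt.Id = pr.Id
WHERE pr.Id IS NULL
ORDER BY RANDOM()   -- SQLite stand-in for ORDER BY NEWID()
LIMIT 5             -- SQLite stand-in for TOP 5
"""


def next_five(conn):
    """Return up to five unseen ids and record them as returned."""
    with conn:  # commit the inserts together, or roll back on error
        ids = [row[0] for row in conn.execute(ANTI_JOIN)]
        conn.executemany("INSERT INTO PreviouslyReturned (Id) VALUES (?)",
                         [(i,) for i in ids])
    return ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE PreviouslyReturned (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)",
                 [(i, f"record {i}") for i in range(1, 13)])
conn.commit()

turn1 = next_five(conn)
turn2 = next_five(conn)
turn3 = next_five(conn)   # only two unseen rows remain
turn4 = next_five(conn)   # nothing left to return

print(turn1, turn2, turn3, turn4)
```

With a primary key on `PreviouslyReturned.Id`, the exclusion itself is an index lookup per row; the random sort over all remaining rows is likely to be the more expensive part as the table grows.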