“按 newid() 排序” - 它是如何运作的?

发布于 2024-10-17 01:23:07 字数 245 浏览 3 评论 0原文

我知道如果我运行此查询,

select top 100 * from mytable order by newid()

它将从我的表中获取 100 条随机记录。

但是,我对它的工作原理有点困惑,因为我在 select 列表中没有看到 newid() 。有人可以解释一下吗?这里的 newid() 有什么特别之处吗?

I know that If I run this query

select top 100 * from mytable order by newid()

it will get 100 random records from my table.

However, I'm a bit confused as to how it works, since I don't see newid() in the select list. Can someone explain? Is there something special about newid() here?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

意中人 2024-10-24 01:23:07

我知道 NewID() 的作用,我只是
试图了解它有何帮助
在随机选择中。难道是那个
(1) select语句会选择
mytable 中的所有内容,每个 (2)
选定行,添加
NewID() 生成的唯一标识符,
(3) 按此对行进行排序
uniqueidentifier 和 (4) 选择
排序列表中的前 100 名?

是的。这几乎是完全正确的(除了它不一定需要对所有行进行排序)。您可以通过查看实际的执行计划来验证这一点。

SELECT TOP 100 * 
FROM master..spt_values 
ORDER BY NEWID()

计算标量运算符为每行添加 NEWID() 列(在我的示例查询中,表中为 2506),然后表中的行按此列排序,并选择前 100 行。

SQL Server 实际上并不需要从位置 100 开始对整个集合进行排序,因此它使用 TOP N 排序运算符尝试在内存中执行整个排序操作 (对于 N 的小值)

计划

I know what NewID() does, I'm just
trying to understand how it would help
in the random selection. Is it that
(1) the select statement will select
EVERYTHING from mytable, (2) for each
row selected, tack on a
uniqueidentifier generated by NewID(),
(3) sort the rows by this
uniqueidentifier and (4) pick off the
top 100 from the sorted list?

Yes. this is pretty much exactly correct (except it doesn't necessarily need to sort all the rows). You can verify this by looking at the actual execution plan.

SELECT TOP 100 * 
FROM master..spt_values 
ORDER BY NEWID()

The compute scalar operator adds the NEWID() column on for each row (2506 in the table in my example query) then the rows in the table are sorted by this column with the top 100 selected.

SQL Server doesn't actually need to sort the entire set from positions 100 down so it uses a TOP N sort operator which attempts to perform the entire sort operation in memory (for small values of N)

Plan

薯片软お妹 2024-10-24 01:23:07

一般来说,它的工作原理如下:

  • mytable中的所有行都是“循环”的,
  • 对每一行执行NEWID()行
  • 根据NEWID()中的随机数进行排序
  • 选择100第一行

In general it works like this:

  • All rows from mytable is "looped"
  • NEWID() is executed for each row
  • The rows are sorted according to random number from NEWID()
  • 100 first row are selected
红衣飘飘貌似仙 2024-10-24 01:23:07

正如 MSDN 所说:

NewID() 创建类型的唯一值
唯一标识符。

您的表格将按此随机值排序。

as MSDN says:

NewID() Creates a unique value of type
uniqueidentifier.

and your table will be sorted by this random values.

ヅ她的身影、若隐若现 2024-10-24 01:23:07

我有一个不重要的查询,它使用 newId() 并连接许多表。它在大约 3 秒内返回大约 10k 行。因此,在性能不太差的情况下, newId() 可能没问题。并没有产生很大的影响。但是,newId() 对于大型表来说是不好的。

以下是 Brent Ozar 博客的解释 - https: //www.brentozar.com/archive/2018/03/get-random-row-large-table/

从上面的链接中,我总结了可用于生成随机 ID 的方法。您可以阅读博客了解更多详细信息。

从大表中获取随机行的 4 种方法:

  1. 方法 1,不好:ORDER BY NEWID() >表现不好!
  2. 方法2,更好但奇怪:TABLESAMPLE >许多陷阱&不是真的
    随机的!
  3. 方法3,最好但需要代码:随机主键>
    最快,但不适用于负数。
  4. 方法4、OFFSET-FETCH(2012+)>仅适用于集群
    指数。

有关方法 3 的更多信息:
获取表中顶部的 ID 字段,生成一个随机数,然后查找该 ID。对于前 N 行,调用下面的代码 N 次或生成 N 个随机数并在 IN 子句中使用。

/* Get a random number smaller than the table's top ID */
DECLARE @rand BIGINT;
DECLARE @maxid INT = (SELECT MAX(Id) FROM dbo.Users);
SELECT @rand = ABS((CHECKSUM(NEWID()))) % @maxid;

/* Get the first row around that ID */
SELECT TOP 1 *
FROM dbo.Users AS u
WHERE u.Id >= @rand;

I have an unimportant query which uses newId() and joins many tables. It returns about 10k rows in about 3 seconds. So, newId() might be ok in such cases where performance is not too bad & does not have a huge impact. But, newId() is bad for large tables.

Here is the explanation from Brent Ozar's blog - https://www.brentozar.com/archive/2018/03/get-random-row-large-table/.

From the above link, I have summarized the methods which you can use to generate a random id. You can read the blog for more details.

4 ways to get a random row from a large table:

  1. Method 1, Bad: ORDER BY NEWID() > Bad performance!
  2. Method 2, Better but Strange: TABLESAMPLE > Many gotchas & is not really
    random!
  3. Method 3, Best but Requires Code: Random Primary Key >
    Fastest, but won't work for negative numbers.
  4. Method 4, OFFSET-FETCH (2012+) > Only performs properly with a clustered
    index.

More on method 3:
Get the top ID field in the table, generate a random number, and look for that ID. For top N rows, call the code below N times or generate N random numbers and use in an IN clause.

/* Get a random number smaller than the table's top ID */
DECLARE @rand BIGINT;
DECLARE @maxid INT = (SELECT MAX(Id) FROM dbo.Users);
SELECT @rand = ABS((CHECKSUM(NEWID()))) % @maxid;

/* Get the first row around that ID */
SELECT TOP 1 *
FROM dbo.Users AS u
WHERE u.Id >= @rand;
红颜悴 2024-10-24 01:23:07

使用 select top 100 randid = newid(), * from mytable order by randid
那么你就会被澄清..

use select top 100 randid = newid(), * from mytable order by randid
you will be clarified then..

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文