INNER JOIN involving RAND(...NEWID()) returns an unexpected number of rows
I want to generate some sample data with random values.
I have a mini table with 5 rows, with ids from 1 to 5 and some text for every row.
Then I want to generate 65536 rows: the first column has the value 1 for every row, and the second column is a random number between 1 and 5, with no NULL values.
Then I want to join these two tables. With the ROW_NUMBER() % 5 approach, both INNER JOIN and OUTER JOIN return 65536 rows.
Instead of this pseudo-random column, I want to use RAND seeded by NEWID.
LEFT JOIN returns 65536 rows, as I suspected, but INNER JOIN returns a different row count on every call.
When the LFINAL table is materialized into a temp table, INNER JOIN works and returns 65536 rows too.
Can somebody explain to me why INNER JOIN does not return 65536 rows as I expected?
WITH Names AS
(
    SELECT id, row_name
    FROM
        (SELECT 1, 'Row 1' UNION ALL
         SELECT 2, 'Row 2' UNION ALL
         SELECT 3, 'Row 3' UNION ALL
         SELECT 4, 'Row 4' UNION ALL
         SELECT 5, 'Row 5') AS D (id, row_name)
),
L0 AS
(
    SELECT c
    FROM
        (SELECT 1 UNION ALL SELECT 1) AS D(c)
), --2^1
L1 AS
(
    SELECT 1 AS c
    FROM L0 AS A
    CROSS JOIN L0 AS B
), --2^2
L2 AS
(
    SELECT 1 AS c
    FROM L1 AS A
    CROSS JOIN L1 AS B
), --2^4
L3 AS
(
    SELECT 1 AS c
    FROM L2 AS A
    CROSS JOIN L2 AS B
), --2^8
L4 AS
(
    SELECT 1 AS c
    FROM L3 AS A
    CROSS JOIN L3 AS B
), --2^16 = 65536
LFINAL AS
(
    SELECT
        c,
        --ROW_NUMBER() OVER (ORDER BY c) % 5 + 1 AS rnd FROM L4)
        FLOOR(RAND(CONVERT(VARBINARY, NEWID()))*5) + 1 AS rnd
    FROM
        L4
)
SELECT *
FROM LFINAL lf
-- Change LEFT JOIN to INNER JOIN to see the varying row count.
LEFT JOIN Names n ON n.id = lf.rnd
Comments (1)
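I was able to reproduce the problem. This should be the expected behavior, for two reasons.
First, newid() is not generated once per row; it is generated once per invocation.
Second, when tested, SQL Server seems to evaluate the random expression as part of the join filter, once for every combination of an LFINAL row and a Names row (65536 × 5 = 327680 combinations), rather than once per LFINAL row.
Notice that each row from the set of 327680 rows has a 20% probability of matching the filter. You will get around 65536 rows in total (327680 × 0.2), but not exactly that many.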
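The behavior described above can be imitated by hand. The query below is only a sketch of that interpretation, not a dump of the actual execution plan: it cross joins the 65536 generated rows with the 5 Names rows and applies the random expression as a filter, on the assumption that it is then evaluated once per candidate pair. The VALUES constructor is swapped in for the UNION ALL lists to keep it short, and the matched_rows alias is a made-up name.

```sql
-- Sketch only: a hand-written imitation of the join-time evaluation,
-- under the assumption that the random expression runs once per (L4 row, Names row) pair.
WITH Names AS
(
    SELECT id, row_name
    FROM (VALUES (1, 'Row 1'), (2, 'Row 2'), (3, 'Row 3'),
                 (4, 'Row 4'), (5, 'Row 5')) AS D (id, row_name)
),
L0 AS (SELECT c FROM (VALUES (1), (1)) AS D (c)),        -- 2^1 rows
L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),   -- 2^2
L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),   -- 2^4
L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),   -- 2^8
L4 AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B)    -- 2^16 = 65536
SELECT COUNT(*) AS matched_rows                          -- hypothetical alias
FROM L4
CROSS JOIN Names AS n                                    -- 65536 * 5 = 327680 candidate pairs
WHERE n.id = FLOOR(RAND(CONVERT(VARBINARY, NEWID())) * 5) + 1;
```

If the assumption holds, repeated runs should give counts that scatter around 65536 rather than hitting it exactly, since each pair is kept or dropped independently.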
I would suggest inserting the random numbers into a temporary table (or a table variable) so that the numbers are materialized, and then joining against that table.
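A minimal sketch of that suggestion follows, assuming a temp table named #LFinal (a made-up name) and the same row generator as in the question. The random values are written to the table first, so each row has one fixed rnd value by the time the join runs. The question reports that materializing LFINAL this way makes INNER JOIN return 65536 rows.

```sql
-- Hypothetical temp table name; drop any leftover copy from an earlier run.
IF OBJECT_ID('tempdb..#LFinal') IS NOT NULL
    DROP TABLE #LFinal;

-- Step 1: materialize one random value per generated row.
WITH
L0 AS (SELECT c FROM (VALUES (1), (1)) AS D (c)),        -- 2^1 rows
L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),   -- 2^2
L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),   -- 2^4
L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),   -- 2^8
L4 AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B)    -- 2^16 = 65536
SELECT
    c,
    FLOOR(RAND(CONVERT(VARBINARY, NEWID())) * 5) + 1 AS rnd
INTO #LFinal
FROM L4;

-- Step 2: join against the stored values; rnd is no longer re-evaluated here.
WITH Names AS
(
    SELECT id, row_name
    FROM (VALUES (1, 'Row 1'), (2, 'Row 2'), (3, 'Row 3'),
                 (4, 'Row 4'), (5, 'Row 5')) AS D (id, row_name)
)
SELECT COUNT(*) AS joined_rows                           -- hypothetical alias
FROM #LFinal AS lf
INNER JOIN Names AS n ON n.id = lf.rnd;

DROP TABLE #LFinal;
```

A table variable could be used in place of the temp table; the point is only that the random values are stored before the join reads them.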