INNER JOIN involving RAND(...NEWID()) returns an unexpected number of rows

Posted on 2025-01-12 06:08:20

I want to generate some sample data with random values.

I have a mini table with 5 rows, ids from 1 to 5 with some text for every row.

Then I want to generate 65536 rows: the first column has the value 1 in every row, and the second column is a random number between 1 and 5 (inclusive), with no NULL values.

Then I want to join these two tables. With the ROW_NUMBER() % 5 approach, both INNER JOIN and LEFT OUTER JOIN return 65536 rows.

Instead of this pseudo-random column, I want to use RAND seeded by NEWID().

As I expected, LEFT JOIN returns 65536 rows, but INNER JOIN returns a different row count on every execution.

When LFINAL is materialized into a temp table, INNER JOIN works and also returns 65536 rows.

Can somebody explain why INNER JOIN does not return 65536 rows as I expected?

WITH Names AS 
(
    SELECT id, row_name 
    FROM 
        (SELECT 1, 'Row 1' UNION ALL 
         SELECT 2, 'Row 2' UNION ALL
         SELECT 3, 'Row 3' UNION ALL
         SELECT 4, 'Row 4' UNION ALL
         SELECT 5, 'Row 5') AS D (id, row_name)
),
L0 AS 
(
    SELECT c 
    FROM 
        (SELECT 1 UNION ALL SELECT 1) AS D(c)
),  --2^1
L1 AS 
(
    SELECT 1 AS c 
    FROM L0 AS A 
    CROSS JOIN L0 AS B
),          --2^2
L2 AS 
(
    SELECT 1 AS c 
    FROM L1 AS A 
    CROSS JOIN L1 AS B
),          --2^4
L3 AS 
(
    SELECT 1 AS c 
    FROM L2 AS A 
    CROSS JOIN L2 AS B
),          --2^8
L4 AS 
(
    SELECT 1 AS c 
    FROM L3 AS A 
    CROSS JOIN L3 AS B
),          --2^16 = 65536
LFINAL AS 
(
    SELECT 
        c, 
        --ROW_NUMBER() OVER (ORDER BY c) % 5 + 1 AS rnd   -- deterministic alternative
        FLOOR(RAND(CONVERT(VARBINARY, NEWID())) * 5) + 1 AS rnd   -- random 1..5, seeded per row
    FROM 
        L4
)
SELECT * 
FROM LFINAL lf
LEFT JOIN Names n ON n.id = lf.rnd

Answer by 美人骨, 2025-01-19 06:08:20

I was able to reproduce the problem. This is expected behavior, because:

  • NEWID() is not generated once per logical row of LFINAL; it is generated each time the expression containing it is invoked.
  • The optimizer is free to evaluate that expression more than once for the same row.

In my tests, SQL Server seems to do the following behind the scenes:

for (row 1 ... 5 in Names)
    for (row 1 ... 65536 in LFINAL)
        add the pair {Names.id, FLOOR(RAND(CONVERT(VARBINARY, NEWID())) * 5) + 1 AS rnd} to selection   -- rnd is recomputed for every pair
filter (where Names.id = rnd)

Notice that each of the 5 × 65536 = 327680 candidate pairs has a 20% (1 in 5) probability of matching the filter. You will get around 65536 rows in total, but rarely exactly that many.
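The arithmetic behind this, as a sketch that assumes the per-pair evaluation described above and treats each seeded RAND value as an independent, uniform draw over 1..5:

```latex
% N = rows produced by L4, k = rows in Names, p = chance that one pair matches
N = 2^{16} = 65536, \qquad k = 5, \qquad p = \tfrac{1}{5}

% Per-pair evaluation: kN independent Bernoulli(p) trials
X \sim \operatorname{Binomial}(kN,\ p), \qquad kN = 327680

\mathbb{E}[X] = kNp = 65536

\sigma_X = \sqrt{kNp(1-p)} = \sqrt{327680 \cdot 0.2 \cdot 0.8} = \sqrt{52428.8} \approx 229

% Materialized rnd: every row matches exactly one id, so X = N = 65536 on every run
```

Under this model, counts a few hundred above or below 65536 are normal, while the materialized version is fixed at N.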

I would suggest inserting the random numbers into a temporary table (or a table variable) so that the values are materialized, and then doing the join.
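A minimal T-SQL sketch of that suggestion, assuming SQL Server 2016 or later (for DROP TABLE IF EXISTS); the temp table name #LFinal is hypothetical. It reuses the question's number generator, stores rnd once per row with SELECT ... INTO, and only then performs the INNER JOIN:

```sql
-- Sketch only: #LFinal is a hypothetical name; DROP TABLE IF EXISTS needs SQL Server 2016+.
DROP TABLE IF EXISTS #LFinal;

WITH L0 AS (SELECT c FROM (SELECT 1 UNION ALL SELECT 1) AS D(c)),  -- 2^1
     L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),         -- 2^2
     L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),         -- 2^4
     L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),         -- 2^8
     L4 AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B)          -- 2^16 = 65536
SELECT c,
       -- Evaluated once per inserted row; the stored value is what the join sees later.
       CAST(FLOOR(RAND(CONVERT(VARBINARY, NEWID())) * 5) + 1 AS int) AS rnd
INTO #LFinal
FROM L4;

-- rnd is now read from storage, not recomputed per join candidate,
-- so each row of #LFinal matches exactly one row of Names.
WITH Names AS
(
    SELECT id, row_name
    FROM (VALUES (1, 'Row 1'), (2, 'Row 2'), (3, 'Row 3'),
                 (4, 'Row 4'), (5, 'Row 5')) AS D (id, row_name)
)
SELECT COUNT(*) AS joined_rows
FROM #LFinal AS lf
INNER JOIN Names AS n ON n.id = lf.rnd;
```

Because RAND returns a value from 0 up to but not including 1, rnd always lands in 1..5, so the INNER JOIN count should equal the 65536 rows stored in #LFinal on every run. A table variable would behave the same way, at the cost of declaring its columns up front.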
