SqlServer随机数据生成观察

发布于 2024-10-08 11:02:41 字数 1819 浏览 0 评论 0原文

我有一个问题,为什么这两个查询的输出不同。我本来希望他们能以同样的方式工作。

查询 1:

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');

while @i1 <= ASCII('9')
begin
    insert into @table1 (c)
    select (CHAR(@i1))    

    set @i1 = @i1 +1;
end


insert into @cache (originalValue, obfuscateValue)
select [firstname], 
        (select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID()) 
from Customer
where [firstname] is not null

select * from @cache;

查询 2:

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');

while @i1 <= ASCII('9')
begin
    insert into @table1 (c)
    select (CHAR(@i1))    

    set @i1 = @i1 +1;
end


insert into @cache (originalValue)
select [firstname]
from Customer
where [firstname] is not null

update c
set c.obfuscateValue = t.Value
from @cache c
join 
(
    select originalValue,
    (       
        (select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID()) 
    ) as Value
    from @cache
) t on t.originalValue = c.originalValue

select * from @cache;

他们应该执行相同的操作,但第一个查询返回以下结果:

Jonathon    73
Everett 73
Janet   73
Andy    73
Shauna  73

第二个:

Jonathon    82
Everett 40
Janet   68
Andy    79
Shauna  29

正如您所注意到的,第二个结果中的第二列具有不同的值,而第一个 - 相同的值。

看起来在第一个查询中

(select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID())

只被调用一次。

有人能解释这个谜团吗?

I have a question on why the output of these two queries differ. I would have expected them to work the same.

Query 1:

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');

while @i1 <= ASCII('9')
begin
    insert into @table1 (c)
    select (CHAR(@i1))    

    set @i1 = @i1 +1;
end


insert into @cache (originalValue, obfuscateValue)
select [firstname], 
        (select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID()) 
from Customer
where [firstname] is not null

select * from @cache;

Query 2:

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');

while @i1 <= ASCII('9')
begin
    insert into @table1 (c)
    select (CHAR(@i1))    

    set @i1 = @i1 +1;
end


insert into @cache (originalValue)
select [firstname]
from Customer
where [firstname] is not null

update c
set c.obfuscateValue = t.Value
from @cache c
join 
(
    select originalValue,
    (       
        (select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID()) 
    ) as Value
    from @cache
) t on t.originalValue = c.originalValue

select * from @cache;

They should do the same, but first query returns following results:

Jonathon    73
Everett 73
Janet   73
Andy    73
Shauna  73

And second:

Jonathon    82
Everett 40
Janet   68
Andy    79
Shauna  29

As you noticed, the second column in second result has different values, while first - same values.

It looks like in first query the

(select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID())

is called only once.

Can someone explain this mystery?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

梨涡少年 2024-10-15 11:02:41

我认为可以通过另一种方式生成随机值。

这是生成 [a-zA-Z]{3,6} 的方法

declare @min int, @max int;
declare @alpha varchar(max)

set @min = 3;
set @max = 6;
set @alpha = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

insert into @cache (originalValue, obfuscateValue)
select [firstname], LEFT(t.Value, case when t.maxLen < @min then @min else t.maxLen end)
from Customer 
join
(
    select ABS(CHECKSUM(NEWID()))%@max + 1 as maxLen,
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) as Value
)t on t.Value is not null
where [firstname] is not null

select * from @cache;

I think random values can be generated in another way.

This is how to generated [a-zA-Z]{3,6}

declare @min int, @max int;
declare @alpha varchar(max)

set @min = 3;
set @max = 6;
set @alpha = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

insert into @cache (originalValue, obfuscateValue)
select [firstname], LEFT(t.Value, case when t.maxLen < @min then @min else t.maxLen end)
from Customer 
join
(
    select ABS(CHECKSUM(NEWID()))%@max + 1 as maxLen,
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) as Value
)t on t.Value is not null
where [firstname] is not null

select * from @cache;
凉栀 2024-10-15 11:02:41

一根线?

SELECT
     RIGHT( --number of zeros to match expected max length. Or use REPLICATE.
        '000000' + CAST(
          --The 2 newid() expression means we'll get a larger number
          --less chance of using leading static zeroes
          CAST(CHECKSUM(NEWD_ID()) as bigint) * CAST(CHECKSUM(NEWD_ID()) as bigint)
            as varchar(30))
        --The 3 gives us the desired mask. Currently 3 digits.
        , 3)

One line?

SELECT
     RIGHT( --number of zeros to match expected max length. Or use REPLICATE.
        '000000' + CAST(
          --The 2 newid() expression means we'll get a larger number
          --less chance of using leading static zeroes
          CAST(CHECKSUM(NEWD_ID()) as bigint) * CAST(CHECKSUM(NEWD_ID()) as bigint)
            as varchar(30))
        --The 3 gives us the desired mask. Currently 3 digits.
        , 3)
对你而言 2024-10-15 11:02:41

您的假设是正确的,第一个查询仅运行“select top”一次。这种行为的发生是由于优化器选择优化查询的方式所致。它决定因为子查询(选择顶部查询)是独立的,并且与外部选择查询不相关,所以它在执行计划中使用 Tablespool(Lazy Spool)运算符。这会导致将选择的顶部值放置在 tempdb 中以供重用。

由于优化器选择使用嵌套循环运算符将所有数据汇集在一起​​,无需重新绑定,因此使用假脱机值,而不是对每个输入外部行重新应用查询。

在第二个查询期间,优化器选择不使用 Tablespool 运算符(我相信这样做是为了证明输入表来自 tempdb)。因此,您可以为临时表中的每个输入行重新应用选择顶级子查询。

如果需要,如果您想强制执行计划按需要执行,您可以使用表/查询提示。

You are correct in your assumption that the first query is only running the ‘select top’ once. The behavior is happening because of how the optimizer chose to optimize the query. It decided because the subqueries (the select top queries) are self-contained and are not correlated with the outside select query it uses a Tablespool (Lazy Spool) operator in the execution plan. This causes the select top value to be placed in the tempdb for reuse.

Because the optimizer chooses to use a Nested Loops operator to bring all the data together no rebinding is needed, the spooled value is used instead of reapplying the queries for each input outer row.

During the second query the optimizer chose not to use a Tablespool operator (I believe do to fact the input table being from tempdb). So you have the select top subqueries being reapplied for each input row from the temporary table.

If needed, you may be able to use a table/query hints if you want to force the execution plan to perform as desired.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文