SqlServer随机数据生成观察

发布于 2024-10-08 11:02:41 字数 1819 浏览 0 评论 0原文

我有一个问题，为什么这两个查询的输出不同。我本来希望他们能以同样的方式工作。

查询 1：

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');

while @i1 <= ASCII('9')
begin
    insert into @table1 (c)
    select (CHAR(@i1))    

    set @i1 = @i1 +1;
end


insert into @cache (originalValue, obfuscateValue)
select [firstname], 
        (select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID()) 
from Customer
where [firstname] is not null

select * from @cache;

查询 2：

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');

while @i1 <= ASCII('9')
begin
    insert into @table1 (c)
    select (CHAR(@i1))    

    set @i1 = @i1 +1;
end


insert into @cache (originalValue)
select [firstname]
from Customer
where [firstname] is not null

update c
set c.obfuscateValue = t.Value
from @cache c
join 
(
    select originalValue,
    (       
        (select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID()) 
    ) as Value
    from @cache
) t on t.originalValue = c.originalValue

select * from @cache;

他们应该执行相同的操作，但第一个查询返回以下结果：

Jonathon    73
Everett 73
Janet   73
Andy    73
Shauna  73

第二个：

Jonathon    82
Everett 40
Janet   68
Andy    79
Shauna  29

正如您所注意到的，第二个结果中的第二列具有不同的值，而第一个 - 相同的值。

看起来在第一个查询中

(select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID())

只被调用一次。

有人能解释这个谜团吗？

原文

I have a question on why the output of these two queries differ. I would have expected them to work the same.

Query 1:

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');

while @i1 <= ASCII('9')
begin
    insert into @table1 (c)
    select (CHAR(@i1))    

    set @i1 = @i1 +1;
end


insert into @cache (originalValue, obfuscateValue)
select [firstname], 
        (select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID()) 
from Customer
where [firstname] is not null

select * from @cache;

Query 2:

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');

while @i1 <= ASCII('9')
begin
    insert into @table1 (c)
    select (CHAR(@i1))    

    set @i1 = @i1 +1;
end


insert into @cache (originalValue)
select [firstname]
from Customer
where [firstname] is not null

update c
set c.obfuscateValue = t.Value
from @cache c
join 
(
    select originalValue,
    (       
        (select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID()) 
    ) as Value
    from @cache
) t on t.originalValue = c.originalValue

select * from @cache;

They should do the same, but first query returns following results:

Jonathon    73
Everett 73
Janet   73
Andy    73
Shauna  73

And second:

Jonathon    82
Everett 40
Janet   68
Andy    79
Shauna  29

As you noticed, the second column in second result has different values, while first - same values.

It looks like in first query the

(select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID())

is called only once.

Can someone explain this mystery?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

梨涡少年 2024-10-15 11:02:41

我认为可以通过另一种方式生成随机值。

这是生成 [a-zA-Z]{3,6} 的方法

declare @min int, @max int;
declare @alpha varchar(max)

set @min = 3;
set @max = 6;
set @alpha = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

insert into @cache (originalValue, obfuscateValue)
select [firstname], LEFT(t.Value, case when t.maxLen < @min then @min else t.maxLen end)
from Customer 
join
(
    select ABS(CHECKSUM(NEWID()))%@max + 1 as maxLen,
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) as Value
)t on t.Value is not null
where [firstname] is not null

select * from @cache;

I think random values can be generated in another way.

This is how to generated [a-zA-Z]{3,6}

declare @min int, @max int;
declare @alpha varchar(max)

set @min = 3;
set @max = 6;
set @alpha = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

insert into @cache (originalValue, obfuscateValue)
select [firstname], LEFT(t.Value, case when t.maxLen < @min then @min else t.maxLen end)
from Customer 
join
(
    select ABS(CHECKSUM(NEWID()))%@max + 1 as maxLen,
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) as Value
)t on t.Value is not null
where [firstname] is not null

select * from @cache;

回复收藏 0 原文

凉栀 2024-10-15 11:02:41

一根线？

SELECT
     RIGHT( --number of zeros to match expected max length. Or use REPLICATE.
        '000000' + CAST(
          --The 2 newid() expression means we'll get a larger number
          --less chance of using leading static zeroes
          CAST(CHECKSUM(NEWD_ID()) as bigint) * CAST(CHECKSUM(NEWD_ID()) as bigint)
            as varchar(30))
        --The 3 gives us the desired mask. Currently 3 digits.
        , 3)

One line?

SELECT
     RIGHT( --number of zeros to match expected max length. Or use REPLICATE.
        '000000' + CAST(
          --The 2 newid() expression means we'll get a larger number
          --less chance of using leading static zeroes
          CAST(CHECKSUM(NEWD_ID()) as bigint) * CAST(CHECKSUM(NEWD_ID()) as bigint)
            as varchar(30))
        --The 3 gives us the desired mask. Currently 3 digits.
        , 3)

回复收藏 0 原文

对你而言 2024-10-15 11:02:41

您的假设是正确的，第一个查询仅运行“select top”一次。这种行为的发生是由于优化器选择优化查询的方式所致。它决定因为子查询（选择顶部查询）是独立的，并且与外部选择查询不相关，所以它在执行计划中使用 Tablespool（Lazy Spool）运算符。这会导致将选择的顶部值放置在 tempdb 中以供重用。

由于优化器选择使用嵌套循环运算符将所有数据汇集在一起，无需重新绑定，因此使用假脱机值，而不是对每个输入外部行重新应用查询。

在第二个查询期间，优化器选择不使用 Tablespool 运算符（我相信这样做是为了证明输入表来自 tempdb）。因此，您可以为临时表中的每个输入行重新应用选择顶级子查询。

如果需要，如果您想强制执行计划按需要执行，您可以使用表/查询提示。

回复收藏 0 原文

~没有更多了~