SqlServer随机数据生成观察
我有一个问题,为什么这两个查询的输出不同。我本来希望他们能以同样的方式工作。
查询 1:
declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));
declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');
while @i1 <= ASCII('9')
begin
insert into @table1 (c)
select (CHAR(@i1))
set @i1 = @i1 +1;
end
insert into @cache (originalValue, obfuscateValue)
select [firstname],
(select top 1 c from @table1 order by NEWID()) +
(select top 1 c from @table1 order by NEWID())
from Customer
where [firstname] is not null
select * from @cache;
查询 2:
declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));
declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');
while @i1 <= ASCII('9')
begin
insert into @table1 (c)
select (CHAR(@i1))
set @i1 = @i1 +1;
end
insert into @cache (originalValue)
select [firstname]
from Customer
where [firstname] is not null
update c
set c.obfuscateValue = t.Value
from @cache c
join
(
select originalValue,
(
(select top 1 c from @table1 order by NEWID()) +
(select top 1 c from @table1 order by NEWID())
) as Value
from @cache
) t on t.originalValue = c.originalValue
select * from @cache;
他们应该执行相同的操作,但第一个查询返回以下结果:
Jonathon 73
Everett 73
Janet 73
Andy 73
Shauna 73
第二个:
Jonathon 82
Everett 40
Janet 68
Andy 79
Shauna 29
正如您所注意到的,第二个结果中的第二列具有不同的值,而第一个 - 相同的值。
看起来在第一个查询中
(select top 1 c from @table1 order by NEWID()) +
(select top 1 c from @table1 order by NEWID())
只被调用一次。
有人能解释这个谜团吗?
I have a question on why the output of these two queries differ. I would have expected them to work the same.
Query 1:
declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));
declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');
while @i1 <= ASCII('9')
begin
insert into @table1 (c)
select (CHAR(@i1))
set @i1 = @i1 +1;
end
insert into @cache (originalValue, obfuscateValue)
select [firstname],
(select top 1 c from @table1 order by NEWID()) +
(select top 1 c from @table1 order by NEWID())
from Customer
where [firstname] is not null
select * from @cache;
Query 2:
declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));
declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');
while @i1 <= ASCII('9')
begin
insert into @table1 (c)
select (CHAR(@i1))
set @i1 = @i1 +1;
end
insert into @cache (originalValue)
select [firstname]
from Customer
where [firstname] is not null
update c
set c.obfuscateValue = t.Value
from @cache c
join
(
select originalValue,
(
(select top 1 c from @table1 order by NEWID()) +
(select top 1 c from @table1 order by NEWID())
) as Value
from @cache
) t on t.originalValue = c.originalValue
select * from @cache;
They should do the same, but first query returns following results:
Jonathon 73
Everett 73
Janet 73
Andy 73
Shauna 73
And second:
Jonathon 82
Everett 40
Janet 68
Andy 79
Shauna 29
As you noticed, the second column in second result has different values, while first - same values.
It looks like in first query the
(select top 1 c from @table1 order by NEWID()) +
(select top 1 c from @table1 order by NEWID())
is called only once.
Can someone explain this mystery?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我认为可以通过另一种方式生成随机值。
这是生成 [a-zA-Z]{3,6} 的方法
I think random values can be generated in another way.
This is how to generated [a-zA-Z]{3,6}
一根线?
One line?
您的假设是正确的,第一个查询仅运行“select top”一次。这种行为的发生是由于优化器选择优化查询的方式所致。它决定因为子查询(选择顶部查询)是独立的,并且与外部选择查询不相关,所以它在执行计划中使用 Tablespool(Lazy Spool)运算符。这会导致将选择的顶部值放置在 tempdb 中以供重用。
由于优化器选择使用嵌套循环运算符将所有数据汇集在一起,无需重新绑定,因此使用假脱机值,而不是对每个输入外部行重新应用查询。
在第二个查询期间,优化器选择不使用 Tablespool 运算符(我相信这样做是为了证明输入表来自 tempdb)。因此,您可以为临时表中的每个输入行重新应用选择顶级子查询。
如果需要,如果您想强制执行计划按需要执行,您可以使用表/查询提示。
You are correct in your assumption that the first query is only running the ‘select top’ once. The behavior is happening because of how the optimizer chose to optimize the query. It decided because the subqueries (the select top queries) are self-contained and are not correlated with the outside select query it uses a Tablespool (Lazy Spool) operator in the execution plan. This causes the select top value to be placed in the tempdb for reuse.
Because the optimizer chooses to use a Nested Loops operator to bring all the data together no rebinding is needed, the spooled value is used instead of reapplying the queries for each input outer row.
During the second query the optimizer chose not to use a Tablespool operator (I believe do to fact the input table being from tempdb). So you have the select top subqueries being reapplied for each input row from the temporary table.
If needed, you may be able to use a table/query hints if you want to force the execution plan to perform as desired.