有没有办法延迟存储过程执行计划的编译？

发布于 2024-08-29 02:33:45 字数 2395 浏览 6 评论 0原文

（乍一看，这可能看起来像直接执行语句和从存储过程执行语句时的不同执行计划或为什么 SqlServer 优化器与参数如此混淆？，但我的实际问题有点不同）

好吧，这个问题让我难住了几个小时。我这里的示例抽象得可笑，所以我怀疑是否可以在本地重新创建，但它为我的问题提供了上下文（另外，我正在运行 SQL Server 2005）。

我有一个存储过程，基本上有两个步骤：构造一个临时表，用很少的行填充它，然后查询一个与该临时表连接的非常大的表。它有多个参数，但最相关的是日期时间“@MinDate”。本质上：

create table #smallTable (ID int)

insert into #smallTable
select (a very small number of rows from some other table)

select * from aGiantTable
inner join #smallTable on #smallTable.ID = aGiantTable.ID
inner join anotherTable on anotherTable.GiantID = aGiantTable.ID
where aGiantTable.SomeDateField > @MinDate

如果我只是将其作为普通查询执行，通过将 @MinDate 声明为局部变量并运行它，它会生成一个执行速度非常快的最佳执行计划（首先加入 #smallTable，然后仅加入在执行其他操作时考虑来自 aGiantTable 的行的非常小的子集）。似乎意识到 #smallTable 很小，因此从它开始会很有效。这很好。

但是，如果我将其设为一个以 @MinDate 作为参数的存储过程，则会产生完全低效的执行计划。（我每次都会重新编译它，所以这不是一个糟糕的缓存计划......至少，我确实希望不是）

但这就是它变得奇怪的地方。如果我将过程更改为以下内容：

declare @LocalMinDate datetime
set @LocalMinDate = @MinDate --where @MinDate is still a parameter

create table #smallTable (ID int)

insert into #smallTable
select (a very small number of rows from some other table)

select * from aGiantTable
inner join #smallTable on #smallTable.ID = aGiantTable.ID
inner join anotherTable on anotherTable.GiantID = aGiantTable.ID
where aGiantTable.SomeDateField > @LocalMinDate

那么它会给我有效的计划！

所以我的理论是这样的：当作为普通查询（而不是存储过程）执行时，它会等到最后一刻才构建昂贵查询的执行计划，因此查询优化器知道#smallTable很小并使用该信息给出高效的计划。

但是当作为存储过程执行时，它会立即创建整个执行计划，因此它无法使用这一点信息来优化计划。

但为什么使用本地声明的变量会改变这一点呢？为什么会延迟执行计划的创建？事实真的是这样吗？如果是这样，即使不以这种方式使用局部变量，是否有办法强制延迟编译（如果这确实是这里发生的情况）？

更一般地说，是否有人有关于何时为存储过程的每个步骤创建执行计划的来源？谷歌搜索没有提供任何有用的信息，但我不认为我正在寻找正确的东西。或者我的理论完全没有根据？

编辑：自从发布以来，我已经了解了参数嗅探，我认为这就是导致执行计划过早编译的原因（除非存储过程确实一次编译），所以我的问题仍然是 -你能强制延迟吗？或者完全禁用嗅探？

这个问题是学术性的，因为我可以通过将 select * from aGiantTable 替换为

select * from (select * from aGiantTable where ID in (select ID from #smallTable)) as aGiantTable

Or 只是吸收它并屏蔽参数来强制制定更有效的计划，但这种不一致仍然让我很好奇。

tl;dnr

这是一个非常长的问题，简而言之：

完整的执行计划是在第一次调用存储过程时创建的，还是在执行时创建的？也就是说，如果存储过程由多个步骤组成，则每个步骤的执行计划是在首次调用该过程时创建的，还是仅在过去的步骤完成执行后创建（同样是第一次调用时）？

原文

(At first glance this may look like a duplicate of Different execution plan when executing statement directly and from stored procedure or Why does the SqlServer optimizer get so confused with parameters?, but my actual question is a bit different)

Alright, this one's had me stumped for a few hours. My example here is ridiculously abstracted, so I doubt it will be possible to recreate locally, but it provides context for my question (Also, I'm running SQL Server 2005).

I have a stored procedure with basically two steps, constructing a temp table, populating it with very few rows, and then querying a very large table joining against that temp table. It has multiple parameters, but the most relevant is a datetime "@MinDate." Essentially:

create table #smallTable (ID int)

insert into #smallTable
select (a very small number of rows from some other table)

select * from aGiantTable
inner join #smallTable on #smallTable.ID = aGiantTable.ID
inner join anotherTable on anotherTable.GiantID = aGiantTable.ID
where aGiantTable.SomeDateField > @MinDate

If I just execute this as a normal query, by declaring @MinDate as a local variable and running that, it produces an optimal execution plan that executes very quickly (first joins on #smallTable and then only considers a very small subset of rows from aGiantTable while doing other operations). It seems to realize that #smallTable is tiny, so it would be efficient to start with it. This is good.

However, if I make that a stored procedure with @MinDate as a parameter, it produces a completely inefficient execution plan. (I am recompiling it each time, so it's not a bad cached plan...at least, I sure hope it's not)

But here's where it gets weird. If I change the proc to the following:

declare @LocalMinDate datetime
set @LocalMinDate = @MinDate --where @MinDate is still a parameter

create table #smallTable (ID int)

insert into #smallTable
select (a very small number of rows from some other table)

select * from aGiantTable
inner join #smallTable on #smallTable.ID = aGiantTable.ID
inner join anotherTable on anotherTable.GiantID = aGiantTable.ID
where aGiantTable.SomeDateField > @LocalMinDate

Then it gives me the efficient plan!

So my theory is this: when executing as a plain query (not as a stored procedure), it waits to construct the execution plan for the expensive query until the last minute, so the query optimizer knows that #smallTable is small and uses that information to give the efficient plan.

But when executing as a stored procedure, it creates the entire execution plan at once, thus it can't use this bit of information to optimize the plan.

But why does using the locally declared variables change this? Why does that delay the creation of the execution plan? Is that actually what's happening? If so, is there a way to force delayed compilation (if that indeed is what's going on here) even when not using local variables in this way?

More generally, does anyone have sources on when the execution plan is created for each step of a stored procedure? Googling hasn't provided any helpful information, but I don't think I'm looking for the right thing. Or is my theory just completely unfounded?

Edit: Since posting, I've learned of parameter sniffing, and I assume this is what's causing the execution plan to compile prematurely (unless stored procedures indeed compile all at once), so my question remains -- can you force the delay? Or disable the sniffing entirely?

The question is academic, since I can force a more efficient plan by replacing the select * from aGiantTable with

select * from (select * from aGiantTable where ID in (select ID from #smallTable)) as aGiantTable

Or just sucking it up and masking the parameters, but still, this inconsistency has me pretty curious.

tl;dnr

This is an egregiously long question, so in brief:

Is the full execution plan created when the stored procedure is first called, or as it executes? That is, if a stored procedure consists of multiple steps, is the execution plan for each step created when the procedure is first called, or is it only created after past steps have finished executing (again, the first time it's called)?

分享到QQ

分享到微博