SQL Server 查询计划差异
当从参数化查询更改为非参数化查询时,我无法理解 SQL Server 中语句的估计查询计划的行为。
我有以下查询:
DECLARE @p0 UniqueIdentifier = '1fc66e37-6eaf-4032-b374-e7b60fbd25ea'
SELECT [t5].[value2] AS [Date], [t5].[value] AS [New]
FROM (
SELECT COUNT(*) AS [value], [t4].[value] AS [value2]
FROM (
SELECT CONVERT(DATE, [t3].[ServerTime]) AS [value]
FROM (
SELECT [t0].[CookieID]
FROM [dbo].[Usage] AS [t0]
WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
GROUP BY [t0].[CookieID]
) AS [t1]
OUTER APPLY (
SELECT TOP (1) [t2].[ServerTime]
FROM [dbo].[Usage] AS [t2]
WHERE ((([t1].[CookieID] IS NULL) AND ([t2].[CookieID] IS NULL))
OR (([t1].[CookieID] IS NOT NULL) AND ([t2].[CookieID] IS NOT NULL)
AND ([t1].[CookieID] = [t2].[CookieID])))
AND ([t2].[CookieID] IS NOT NULL)
AND ([t2].[ProductID] = @p0)
ORDER BY [t2].[ServerTime]
) AS [t3]
) AS [t4]
GROUP BY [t4].[value]
) AS [t5]
ORDER BY [t5].[value2]
该查询由 Linq2SQL 表达式生成并从 LINQPad 中提取。这会产生一个很好的查询计划(据我所知)并在数据库上执行大约 10 秒。但是,如果我将参数的两次使用替换为精确值,即将两个 '= @p0' 部分替换为 '= '1fc66e37-6eaf-4032-b374-e7b60fbd25ea' ' 我会得到不同的估计查询计划,并且查询现在运行时间要长得多(超过60秒,还没看透)。
为什么执行看似无害的替换会产生效率低得多的查询计划和执行?我已使用“DBCC FreeProcCache”清除了过程缓存,以确保我没有缓存错误的计划,但该行为仍然存在。
我真正的问题是我可以忍受 10 秒的执行时间(至少在很长一段时间内),但我无法忍受 60 秒以上的执行时间。我的查询将(如上所述)由 Linq2SQL 生成,因此它在数据库上执行,这
exec sp_executesql N'
...
WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
...
AND ([t2].[ProductID] = @p0)
...
',N'@p0 uniqueidentifier',@p0='1FC66E37-6EAF-4032-B374-E7B60FBD25EA'
会产生同样糟糕的执行时间(我认为这很奇怪,因为这似乎使用参数化查询。
我不是在寻求建议关于要创建哪些索引等,我只是想了解为什么三个看似相似的查询的查询计划和执行如此不同
编辑:我已经上传了非参数化的执行计划 。以及参数化查询以及参数化查询的执行计划(如 建议) Heinz),此处
希望它能帮助你帮助我:)
I'm having trouble understanding the behavior of the estimated query plans for my statement in SQL Server when a change from a parameterized query to a non-parameterized query.
I have the following query:
DECLARE @p0 UniqueIdentifier = '1fc66e37-6eaf-4032-b374-e7b60fbd25ea'
SELECT [t5].[value2] AS [Date], [t5].[value] AS [New]
FROM (
SELECT COUNT(*) AS [value], [t4].[value] AS [value2]
FROM (
SELECT CONVERT(DATE, [t3].[ServerTime]) AS [value]
FROM (
SELECT [t0].[CookieID]
FROM [dbo].[Usage] AS [t0]
WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
GROUP BY [t0].[CookieID]
) AS [t1]
OUTER APPLY (
SELECT TOP (1) [t2].[ServerTime]
FROM [dbo].[Usage] AS [t2]
WHERE ((([t1].[CookieID] IS NULL) AND ([t2].[CookieID] IS NULL))
OR (([t1].[CookieID] IS NOT NULL) AND ([t2].[CookieID] IS NOT NULL)
AND ([t1].[CookieID] = [t2].[CookieID])))
AND ([t2].[CookieID] IS NOT NULL)
AND ([t2].[ProductID] = @p0)
ORDER BY [t2].[ServerTime]
) AS [t3]
) AS [t4]
GROUP BY [t4].[value]
) AS [t5]
ORDER BY [t5].[value2]
This query is generated by a Linq2SQL expression and extracted from LINQPad. This produces a nice query plan (as far as I can tell) and executes in about 10 seconds on the database. However, if I replace the two uses of parameters with the exact value, that is replace the two '= @p0' parts with '= '1fc66e37-6eaf-4032-b374-e7b60fbd25ea' ' I get a different estimated query plan and the query now runs much longer (more than 60 seconds, haven't seen it through).
Why is it that performing the seemingly innocent replacement produces a much less efficient query plan and execution? I have cleared the procedure cache with 'DBCC FreeProcCache' to ensure that I was not caching a bad plan, but the behavior remains.
My real problem is that I can live with the 10 seconds execution time (at least for a good while) but I can't live with the 60+ sec execution time. My query will (as hinted above) by produced by Linq2SQL so it is executed on the database as
exec sp_executesql N'
...
WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
...
AND ([t2].[ProductID] = @p0)
...
',N'@p0 uniqueidentifier',@p0='1FC66E37-6EAF-4032-B374-E7B60FBD25EA'
which produces the same poor execution time (which I think is doubly strange since this seems to be using parameterized queries.
I'm not looking for advise on which indexes to create or the like, I'm just trying to understand why the query plan and execution are so dissimilar on three seemingly similar queries.
EDIT: I have uploaded execution plans for the non-parameterized and the parameterized query as well as an execution plan for a parameterized query (as suggested by Heinz) with a different GUID here
Hope it helps you help me :)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
如果您提供显式值,SQL Server 可以使用该字段的统计信息来做出“更好”的查询计划决策。不幸的是(正如我最近亲身经历的那样),如果统计信息中包含的信息具有误导性,有时 SQL Server 就会做出错误的选择。
如果您想更深入地研究这个问题,我建议您检查一下使用其他 GUID 时会发生什么:如果它对不同的具体 GUID 使用不同的查询计划,则表明使用了统计数据。在这种情况下,您可能需要查看
sp_updatestats
和相关命令。编辑:查看DBCC SHOW_STATISTICS:“慢”和“快”GUID 可能位于直方图中的不同桶中。我遇到了类似的问题< /a>,我通过添加
INDEX
解决了这个问题SQL 的表提示,它“指导”SQL Server 找到“正确”的查询计划。基本上,我研究了“快速”查询期间使用的索引,并将它们硬编码到 SQL 中。这远非最佳或优雅的解决方案,但我还没有找到更好的解决方案......If you provide an explicit value, SQL Server can use statistics of this field to make a "better" query plan decision. Unfortunately (as I've experienced myself recently), if the information contained in the statistics is misleading, sometimes SQL Server just makes the wrong choices.
If you want to dig deeper into this issue, I recommend you to check what happens if you use other GUIDs: If it uses a different query plan for different concrete GUIDs, that's an indication that statistics data is used. In that case, you might want to look at
sp_updatestats
and related commands.EDIT: Have a look at DBCC SHOW_STATISTICS: The "slow" and the "fast" GUID are probably in different buckets in the histogram. I've had a similar problem, which I solved by adding an
INDEX
table hint to the SQL, which "guides" SQL Server towards finding the "right" query plan. Basically, I've looked at what indices are used during a "fast" query and hard-coded those into the SQL. This is far from an optimal or elegant solution, but I haven't found a better one yet...您似乎有两个索引:
第一个索引不覆盖
CookieID
但按ServerTime
排序,因此对于 less 选择性 < code>ProductID 的(即您有很多的)第二个索引确实涵盖了所有列,但未排序,因此对于更多选择性
ProductID(那些你很少的)。
平均而言,
ProductID
基数使得SQL Server
期望第二种方法高效,这就是当您使用参数化查询或显式提供选择性时使用的方法GUID 的。
但是,您的原始 GUID 被认为选择性较低,这就是使用第一种方法的原因。
不幸的是,第一种方法需要对 CookieID 进行额外的过滤,这就是它实际上效率较低的原因。
You seem to have two indexes:
The first index does not cover
CookieID
but is ordered onServerTime
and hence is more efficient for the less selectiveProductID
's (i. e. those that you have many)The second index does cover all columns but is not ordered, and hence is more efficient for more selective
ProductID
's (those that you have few).In average, you
ProductID
cardinality is so thatSQL Server
expects the second method to be efficient, which is what it uses when you use parametrized queries or explicitly provide selectiveGUID
's.However, your original
GUID
is considered less selective, that's why the first method is used.Unfortunately, the first method requires additional filtering on
CookieID
which is why it's less efficient in fact.我的猜测是,当您采用非参数化路线时,您的 guid 必须从 varchar 转换为 UniqueIdentifier,这可能会导致索引不被使用,而它将在采用参数化路线时使用。
我已经看到这种情况发生在使用在 where 子句中针对使用日期时间的列具有小日期时间的查询时。
My guess is that when you take the non paramaterized route, your guid has to be converted from a varchar to a UniqueIdentifier which may cause an index not to be used, while it will be used taking the paramatarized route.
I've seen this happen with using queries that have a smalldatetime in the where clause against a column that uses a datetime.
如果不查看执行计划就很难判断,但是如果我要猜测一个原因,我会说它是参数嗅探和糟糕的统计数据的结合 - 如果你很难 -将 GUID 编码到查询中,查询优化器尝试针对该参数值优化查询。我相信参数化/准备好的查询也会发生同样的事情(这称为参数嗅探 - 执行计划针对第一次执行准备好的语句时使用的参数进行了优化),但是当您声明时这绝对不会发生参数并在查询中使用它。
正如我所说,SQL Server 尝试针对该值优化执行计划,因此通常您应该会看到更好的结果。在这里,它所依据的决策信息似乎是不正确/具有误导性的,并且当它优化通用参数值的查询时,您会更好(出于某种原因)。
然而,这主要是猜测 - 如果不执行就不可能真正判断 - 如果您可以将执行计划上传到某个地方,那么我相信有人能够帮助您了解真正的原因。
Its difficult to tell without looking at the execution plans, however if I was going to guess at a reason I'd say that its a combinaton of parameter sniffing and poor statistics - In the case where you hard-code the GUID into the query, the query optimiser attempts to optimise the query for that value of the parameter. I believe that the same thing happens with the parameterised / prepared query (this is called parameter sniffing - the execution plan is optimised for the parameters used the first time that the prepared statement is executed), however this definitely doesn't happen when you declare the parameter and use it in the query.
Like I said, SQL server attempt to optimise the execution plan for that value, and so usually you should see better results. It seems here that that information it is basing its decisions on is incorrect / misleading, and you are better off (for some reason) when it optimises the query for a generic parameter value.
This is mostly guesswork however - its impossible to tell really without the execution - if you can upload the executuion plan somewhere then I'm sure someone will be able to help you with the real reason.