在查询中使用过滤器和 CONTAINSTABLE 时执行计划不佳
我们有一个有趣的问题,我希望有人能帮助阐明这一问题。从高层次来看,问题如下:
以下查询执行速度很快(1 秒):
SELECT SA.*
FROM cg.SEARCHSERVER_ACTYS AS SA
JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, 'reports') AS T1 ON T1.[Key]=SA.UNIQUE_ID
但是如果我们向查询添加过滤器,则大约需要 2 分钟才能返回:
SELECT SA.*
FROM cg.SEARCHSERVER_ACTYS AS SA
JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, 'reports') AS T1 ON T1.[Key]=SA.UNIQUE_ID
WHERE SA.CHG_DATE>'19 Feb 2010'
查看两个查询的执行计划,我发现可以看到,在第二种情况下,有两个地方实际行数和估计行数之间存在巨大差异,这些地方是:
1) 对于 FulltextMatch 表值函数,估计约为 22,000 行,实际为 2900 万行(然后在连接之前过滤到 1670 行)并且 2) 对于全文索引上的索引查找,其中估计为 1 行,实际为 13,000 行
作为估计的结果,优化器选择使用嵌套循环连接(因为它假设行数较少) )因此该计划效率低下。
我们可以通过(a)参数化查询并向查询添加选项(OPTIMIZE FOR UNKNOWN)或(b)强制使用 HASH JOIN 来解决该问题。在这两种情况下,查询都会在不到 1 秒的时间内返回,并且估计值看起来很合理。
我的问题实际上是“为什么在表现不佳的情况下使用的估计如此不准确以及可以采取哪些措施来改进它们”?
此处使用的索引视图上的索引统计信息是最新的。
非常感谢任何帮助。
We have an interesting problem that I was hoping someone could help to shed some light on. At a high level the problem is as below:
The following query executes quickly (1 second):
SELECT SA.*
FROM cg.SEARCHSERVER_ACTYS AS SA
JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, 'reports') AS T1 ON T1.[Key]=SA.UNIQUE_ID
but if we add a filter to the query, then it takes approximately 2 minutes to return:
SELECT SA.*
FROM cg.SEARCHSERVER_ACTYS AS SA
JOIN CONTAINSTABLE(CG.SEARCHSERVER_ACTYS, NOTE, 'reports') AS T1 ON T1.[Key]=SA.UNIQUE_ID
WHERE SA.CHG_DATE>'19 Feb 2010'
Looking at the execution plan for the two queries, I can see that in the second case there are two places where there are huge differences between the actual and estimated number of rows, these being:
1) For the FulltextMatch table valued function where the estimate is approx 22,000 rows and the actual is 29 million rows (which are then filtered down to 1670 rows before the join) and
2) For the index seek on the full text index, where the estimate is 1 row and the actual is 13,000 rows
As a result of the estimates, the optimiser is choosing to use a nested loops join (since it assumes a small number of rows) hence the plan is inefficient.
We can work around the problem by either (a) parameterising the query and adding an OPTION (OPTIMIZE FOR UNKNOWN) to the query or (b) by forcing a HASH JOIN to be used. In both of these cases the query returns in sub 1 second and the estimates appear reasonable.
My question really is 'why are the estimates being used in the poorly performing case so wildly inaccurate and what can be done to improve them'?
Statistics are up to date on the indexes on the indexed view being used here.
Any help greatly appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
结果发现问题出在SQL Server的版本上。该问题在 SQL Server 2008(无服务包)中出现,并通过升级到 SQL Server 2008 SP1(并添加 CU5)得到解决。由于我们没有在未安装 CU5 的情况下进行测试,因此我无法确定修复程序是 SP1 还是 CU5 附带的。没关系,问题解决了。士气?让您的服务器保持最新状态。
The problem here turned out to be with the version of SQL Server. The problem manifested itself with SQL Server 2008 (no service pack) and was resolved by upgrading to SQL Server 2008 SP1 (and adding CU5). Since we did not test without CU5 installed I cannot determine if the fix came with SP1 or CU5. No matter, the issue is resolved. Morale? Keep your server up to date.
也许您可以在相关列上添加一些统计信息 - 这将有助于 SQL Server 更好地估计行数及其内容。
目前涉及哪些统计或指标?
Perhaps you could add some statistics on the column in question - that will help SQL Server make better estimates about both the number of rows and their contents.
What statistics or indexes are currently involved?