SQL Server 如何计算出估计的行数?
我正在尝试调试一个相当复杂的存储过程,该存储过程连接多个表(10-11)。我发现对于树的一部分,估计的行数与实际的行数有很大不同 - 在最坏的情况下,SQL Server 估计将返回 1 行,而实际上返回了 55,000 行!
我正在尝试找出原因 - 我的所有统计数据都是最新的,并且我已经使用几个表上的 FULLSCAN 更新了统计数据。我没有使用任何用户定义的函数或表变量。据我所知,SQL Server 应该能够准确估计将返回多少行,但它继续选择一个计划,在这种情况下它会执行数万次 RDI 查找(当它预计只执行 1 次时)或 2)。
我可以做什么来尝试理解为什么估计行数超出这么多?
更新:因此,查看该计划,我发现了一个特别可疑的节点 - 它使用以下谓词对表进行表扫描:
status <> 5
AND [type] = 1
OR [type] = 2
此谓词返回整个表(630 行 - 该表扫描本身并不是性能不佳的根源),但是 SQL Server 的估计行数仅为 37。然后 SQL Server 继续执行几个嵌套循环,将其用于 RDI 查找、索引扫描和索引查找。难道这就是我严重误判的根源吗?我如何让它估计更合理的行数?
I'm trying to debug a fairly complex stored procedure that joins across many tabls (10-11). I'm seeing that for a part of the tree the estimated number of rows drasticly differs from the actual number of rows - at its worst SQL server estimates that 1 row will be returned, when in actuality 55,000 rows are returned!
I'm trying to work out why this is - all of my statistics are up-to-date, and I've updated statistics with a FULLSCAN on several tables. I'm not using any user defined functions or table variables. As far as I can see SQL server should be able to exactly estimate how many rows are going to be returned, but it continues to choose a plan which cases it to perform tens of thousands of RDI lookups (when it is expecting to only perform 1 or 2).
What can I do to try and understand why the estimated number of rows is out by so much?
UPDATE: So looking at the plan I've found one node in particular which seems suspicous - its a table scan on a table using the following predecate:
status <> 5
AND [type] = 1
OR [type] = 2
This predicate returns the entire table (630 rows - the table scan itself it NOT the source of the poor performance) however SQL server has the estimated number of rows at just 37. SQL server then goes on to do several nested loops with this onto RDI lookups, index scans and index seeks. Could this be the source of my massive miscalculation? How do I get it to estimate a more sensible number of rows?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
重建索引可能会解决估计行值不正确的问题
rebuilding your indexes might resolve the incorrect estimated rows value issue
SQL Server
将每个索引拆分为最多200
个范围,其中包含以下数据(来自 此处):通常,大多数填充的值都会进入
RANGE_HI_KEY
。然而,它们可能会进入该范围,这可能会导致分布倾斜。
想象一下这些数据(以及其他数据):
键值行数
SQL Server
通常构建两个范围:1
到3
和4
到下一个填充值,这使得这些统计信息:,这意味着当搜索
2
时,只有1
行,最好使用索引访问。但如果
3
进入该范围,则统计信息如下:优化器认为键
2
有3334
行,并且索引访问过多昂贵的。SQL Server
splits each index into up to200
ranges with the following data (from here):Usually, most populated values go into
RANGE_HI_KEY
.However, they can get into the range and this can lead to the skew in distribution.
Imagine these data (among the others):
Key value Count of rows
SQL Server
usually builds two ranges:1
to3
and4
to the next populated value, which makes these statistics:, which means the when searching for, say,
2
, there is but1
row and it's better to use the index access.But if
3
goes inside the range, the statistics are these:The optimizer thinks there are
3334
rows for the key2
and index access is too expensive.它使用为每个索引保留的统计数据。
(您还可以在非索引列上创建统计信息)
更新数据库中每个表的所有统计信息(警告:在非常大的数据库上将花费一些时间。不要在生产服务器上执行此操作没有与您的 DBA 检查...):
如果您没有定期计划的工作来重建最活跃的索引(即大量插入或删除),您应该考虑重建索引(与上述相同的警告适用):
It uses statistics, which it keeps for each index.
(You can also create statistics on non-indexed columns)
To update all your statistics on every table in a Database (WARNING: will take some time on very large databases. Don't do this on Production servers without checking with your DBA...):
If you don't have a regular scheduled job to rebuild your most active indexes (i.e. lots of INSERTS or DELETES), you should consider rebuilding your indexes (same caveat as above applies):
由于您已经更新了统计信息,我会尝试消除任何参数嗅探:
Since you already updated the statistics, I'd try to eliminate any parameter sniffing: