SQL Server 如何计算出估计的行数？

发布于 2024-08-06 00:02:34 字数 606 浏览 6 评论 0原文

我正在尝试调试一个相当复杂的存储过程，该存储过程连接多个表（10-11）。我发现对于树的一部分，估计的行数与实际的行数有很大不同 - 在最坏的情况下，SQL Server 估计将返回 1 行，而实际上返回了 55,000 行！

我正在尝试找出原因 - 我的所有统计数据都是最新的，并且我已经使用几个表上的 FULLSCAN 更新了统计数据。我没有使用任何用户定义的函数或表变量。据我所知，SQL Server 应该能够准确估计将返回多少行，但它继续选择一个计划，在这种情况下它会执行数万次 RDI 查找（当它预计只执行 1 次时）或 2)。

我可以做什么来尝试理解为什么估计行数超出这么多？

更新：因此，查看该计划，我发现了一个特别可疑的节点 - 它使用以下谓词对表进行表扫描：

status <> 5
AND [type] = 1
OR [type] = 2

此谓词返回整个表（630 行 - 该表扫描本身并不是性能不佳的根源），但是 SQL Server 的估计行数仅为 37。然后 SQL Server 继续执行几个嵌套循环，将其用于 RDI 查找、索引扫描和索引查找。难道这就是我严重误判的根源吗？我如何让它估计更合理的行数？

原文

I'm trying to debug a fairly complex stored procedure that joins across many tabls (10-11). I'm seeing that for a part of the tree the estimated number of rows drasticly differs from the actual number of rows - at its worst SQL server estimates that 1 row will be returned, when in actuality 55,000 rows are returned!

I'm trying to work out why this is - all of my statistics are up-to-date, and I've updated statistics with a FULLSCAN on several tables. I'm not using any user defined functions or table variables. As far as I can see SQL server should be able to exactly estimate how many rows are going to be returned, but it continues to choose a plan which cases it to perform tens of thousands of RDI lookups (when it is expecting to only perform 1 or 2).

What can I do to try and understand why the estimated number of rows is out by so much?

UPDATE: So looking at the plan I've found one node in particular which seems suspicous - its a table scan on a table using the following predecate:

status <> 5
AND [type] = 1
OR [type] = 2

This predicate returns the entire table (630 rows - the table scan itself it NOT the source of the poor performance) however SQL server has the estimated number of rows at just 37. SQL server then goes on to do several nested loops with this onto RDI lookups, index scans and index seeks. Could this be the source of my massive miscalculation? How do I get it to estimate a more sensible number of rows?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

仙女 2024-08-13 00:02:35

重建索引可能会解决估计行值不正确的问题

回复收藏 0 原文

吹梦到西洲 2024-08-13 00:02:34

SQL Server 将每个索引拆分为最多 200 个范围，其中包含以下数据（来自此处）：

RANGE_HI_KEY
显示直方图步长上限的键值。
RANGE_ROWS
指定范围内有多少行（它们小于此RANGE_HI_KEY，但大于之前较小的RANGE_HI_KEY）。
EQ_ROWS
指定有多少行完全等于 RANGE_HI_KEY。
AVG_RANGE_ROWS
范围内每个不同值的平均行数。
DISTINCT_RANGE_ROWS
指定此范围内有多少个不同的键值（不包括 RANGE_HI_KEY 之前的前一个键和 RANGE_HI_KEY 本身）；

通常，大多数填充的值都会进入 RANGE_HI_KEY。

然而，它们可能会进入该范围，这可能会导致分布倾斜。

想象一下这些数据（以及其他数据）：

键值行数

1          1
2          1
3          10000
4          1

SQL Server 通常构建两个范围：1 到 3 和 4 到下一个填充值，这使得这些统计信息：

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
3             2           10000    1               2

，这意味着当搜索 2 时，只有 1 行，最好使用索引访问。

但如果 3 进入该范围，则统计信息如下：

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
4             10002       1        3334            3

优化器认为键 2 有 3334 行，并且索引访问过多昂贵的。

SQL Server splits each index into up to 200 ranges with the following data (from here):

RANGE_HI_KEY
A key value showing the upper boundary of a histogram step.
RANGE_ROWS
Specifies how many rows are inside the range (they are smaller than this RANGE_HI_KEY, but bigger than the previous smaller RANGE_HI_KEY).
EQ_ROWS
Specifies how many rows are exactly equal to RANGE_HI_KEY.
AVG_RANGE_ROWS
Average number of rows per distinct value inside the range.
DISTINCT_RANGE_ROWS
Specifies how many distinct key values are inside this range (not including the previous key before RANGE_HI_KEY and RANGE_HI_KEY itself);

Usually, most populated values go into RANGE_HI_KEY.

However, they can get into the range and this can lead to the skew in distribution.

Imagine these data (among the others):

Key value Count of rows

1          1
2          1
3          10000
4          1

SQL Server usually builds two ranges: 1 to 3 and 4 to the next populated value, which makes these statistics:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
3             2           10000    1               2

, which means the when searching for, say, 2, there is but 1 row and it's better to use the index access.

But if 3 goes inside the range, the statistics are these:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
4             10002       1        3334            3

The optimizer thinks there are 3334 rows for the key 2 and index access is too expensive.

回复收藏 0 原文

飘过的浮云 2024-08-13 00:02:34

它使用为每个索引保留的统计数据。

（您还可以在非索引列上创建统计信息）

更新数据库中每个表的所有统计信息（警告：在非常大的数据库上将花费一些时间。不要在生产服务器上执行此操作没有与您的 DBA 检查...）：

exec sp_msforeachtable 'UPDATE STATISTICS ?'

如果您没有定期计划的工作来重建最活跃的索引（即大量插入或删除），您应该考虑重建索引（与上述相同的警告适用）：

exec sp_msforeachtable "DBCC DBREINDEX('?')"

< a href="http://msdn.microsoft.com/en-us/library/dd535534.aspx" rel="nofollow noreferrer">Microsoft SQL Server 2008 中查询优化器使用的统计信息

It uses statistics, which it keeps for each index.

(You can also create statistics on non-indexed columns)

To update all your statistics on every table in a Database (WARNING: will take some time on very large databases. Don't do this on Production servers without checking with your DBA...):

exec sp_msforeachtable 'UPDATE STATISTICS ?'

If you don't have a regular scheduled job to rebuild your most active indexes (i.e. lots of INSERTS or DELETES), you should consider rebuilding your indexes (same caveat as above applies):

exec sp_msforeachtable "DBCC DBREINDEX('?')"

Statistics Used by the Query Optimizer in Microsoft SQL Server 2008

回复收藏 0 原文

靖瑶 2024-08-13 00:02:34

由于您已经更新了统计信息，我会尝试消除任何参数嗅探：

CREATE PROCEDURE xyz
(
    @param1 int
    ,@param2 varchar(10)

)AS

DECLARE @param_1 int
       ,@param_2 varchar(10)

SELECT @param_1=@param1
      ,@param_2=@param2

...complex query here....
...WHERE column1=@param_1 AND column2=@param_2....

go

Since you already updated the statistics, I'd try to eliminate any parameter sniffing:

CREATE PROCEDURE xyz
(
    @param1 int
    ,@param2 varchar(10)

)AS

DECLARE @param_1 int
       ,@param_2 varchar(10)

SELECT @param_1=@param1
      ,@param_2=@param2

...complex query here....
...WHERE column1=@param_1 AND column2=@param_2....

go

回复收藏 0 原文

~没有更多了~