SQL Server 查询的奇怪执行计划

发布于 2024-12-01 04:36:20 字数 586 浏览 2 评论 0原文

上下文:SQL Server 2008。有 2 个表需要内连接。 事实表有 4000 万行,包含患者密钥、服用的药物以及其他事实。按该顺序组合的药物密钥和患者密钥有一个唯一的索引(非聚集)。 维度表是药物列表(70 行)。 join是根据药物密钥(代理密钥)获取药物代码(业务代码)。 查询:

SELECT a.PKey, a.SomeFact, b.MCode
FROM tblFact a
JOIN tblDIM b ON a.MKey = b.MKey

返回的所有列都是整数。 上面的查询运行了 7 分钟,其执行计划显示使用了 (MKey,PKey) 上的索引。该索引是在运行之前重建的。 当我禁用事实表上的索引(或将数据复制到具有相同结构但没有索引的新表)时,相同的查询只需要 1 分 40 分钟。

IO 统计数据也令人惊叹。

带索引:表“tblFACT”。扫描计数 70,逻辑读取 190296338,物理读取 685138,预读读取 98713

没有索引:表“tblFACT_copy”。扫描计数 17,逻辑读取 468891,物理读取 0,预读读取 419768

问题:为什么它尝试使用索引并沿着低效路径前进?

Context: SQL Server 2008. There are 2 tables to inner join.
The fact table, which has 40 million rows, contains the patient key and the medications administered and other facts. There is a unique index (nonclustered) on medication key and patient key combined in that order.
The dimension table is the medication list (70 rows).
The join is to get the medication code (business code) based on medication key (surrogate key).
Query:

SELECT a.PKey, a.SomeFact, b.MCode
FROM tblFact a
JOIN tblDIM b ON a.MKey = b.MKey

All the columns returned are integer.
The above query runs in 7 minutes and its execution plan shows the index on (MKey,PKey) is used. The index was rebuilt right before the run.
When I disabled the index on the fact table (or copy data to a new table with same structure but without index), the same query takes only 1:40 minutes.

IO Statistics are also stunning.

With index: Table 'tblFACT'. Scan count 70, logical reads 190296338, physical reads 685138, read-ahead reads 98713

Without index: Table 'tblFACT_copy'. Scan count 17, logical reads 468891, physical reads 0, read-ahead reads 419768

Question: why does it try to use the index and head down the inefficient path?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

你的背包 2024-12-08 04:36:20

您需要将 SomeFact 添加为 tblFact 索引上的 INCLUDE,以使其 覆盖

目前,该表将被访问两次:一次用于索引,然后再次进行查找以获取 SomeFact 作为 RID 或键查找(取决于是否有聚集索引)

这不会适用于 tblDIM ,因为我假设 MKey 是聚集索引,这使得它隐式覆盖

You need to add SomeFact as an INCLUDE on the tblFact index to make it covering.

Currently, the table will be accessed twice: once for the index and then again for a lookup to get SomeFact either as a RID or key lookup (depends on if there is a clustered index)

This doesn't apply to tblDIM because I assume that MKey is the clustered index which makes it covering implicitly

丢了幸福的猪 2024-12-08 04:36:20

在极少数情况下,数据库会选择不正确的执行计划。在本例中,索引用于连接,但由于所有数据都是从两个表中获取的,因此扫描整个表会更快。
如果向查询添加 WHERE 子句,索引版本会快得多,因为没有索引,它仍然需要扫描整个表,而不是只获取所需的少数记录。

可能有指令鼓励数据库不要使用索引或使用不同的索引,但我不太了解SQL Server。

In rare cases, the database chooses an incorrect execution plan. In this case, the index is used for the join, but since all data is fetched from both tables, it would be faster to just scan the whole table.
The indexed version will be much faster if you add a WHERE clause to the query, because without indexes it will still need to scan the whole table, instead of grabbing just the handful of records it needs.

There may be directives to encourage the database not to use indexes or use different indexes, but I don't know SQL server that well.

心病无药医 2024-12-08 04:36:20

您的统计数据是最新的吗?检查:

SELECT object_name = Object_Name(ind.object_id)
,      IndexName = ind.name
,      StatisticsDate = STATS_DATE(ind.object_id, ind.index_id)
FROM   SYS.INDEXES ind
order by
       STATS_DATE(ind.object_id, ind.index_id) desc

更新:

exec sp_updatestats;

Are your statistics up to date? Check with:

SELECT object_name = Object_Name(ind.object_id)
,      IndexName = ind.name
,      StatisticsDate = STATS_DATE(ind.object_id, ind.index_id)
FROM   SYS.INDEXES ind
order by
       STATS_DATE(ind.object_id, ind.index_id) desc

Update with:

exec sp_updatestats;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文