SQL Server 查询的奇怪执行计划
上下文:SQL Server 2008。有 2 个表需要内连接。 事实表有 4000 万行,包含患者密钥、服用的药物以及其他事实。按该顺序组合的药物密钥和患者密钥有一个唯一的索引(非聚集)。 维度表是药物列表(70 行)。 join是根据药物密钥(代理密钥)获取药物代码(业务代码)。 查询:
SELECT a.PKey, a.SomeFact, b.MCode
FROM tblFact a
JOIN tblDIM b ON a.MKey = b.MKey
返回的所有列都是整数。 上面的查询运行了 7 分钟,其执行计划显示使用了 (MKey,PKey) 上的索引。该索引是在运行之前重建的。 当我禁用事实表上的索引(或将数据复制到具有相同结构但没有索引的新表)时,相同的查询只需要 1 分 40 分钟。
IO 统计数据也令人惊叹。
带索引:表“tblFACT”。扫描计数 70,逻辑读取 190296338,物理读取 685138,预读读取 98713
没有索引:表“tblFACT_copy”。扫描计数 17,逻辑读取 468891,物理读取 0,预读读取 419768
问题:为什么它尝试使用索引并沿着低效路径前进?
Context: SQL Server 2008. There are 2 tables to inner join.
The fact table, which has 40 million rows, contains the patient key and the medications administered and other facts. There is a unique index (nonclustered) on medication key and patient key combined in that order.
The dimension table is the medication list (70 rows).
The join is to get the medication code (business code) based on medication key (surrogate key).
Query:
SELECT a.PKey, a.SomeFact, b.MCode
FROM tblFact a
JOIN tblDIM b ON a.MKey = b.MKey
All the columns returned are integer.
The above query runs in 7 minutes and its execution plan shows the index on (MKey,PKey) is used. The index was rebuilt right before the run.
When I disabled the index on the fact table (or copy data to a new table with same structure but without index), the same query takes only 1:40 minutes.
IO Statistics are also stunning.
With index: Table 'tblFACT'. Scan count 70, logical reads 190296338, physical reads 685138, read-ahead reads 98713
Without index: Table 'tblFACT_copy'. Scan count 17, logical reads 468891, physical reads 0, read-ahead reads 419768
Question: why does it try to use the index and head down the inefficient path?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您需要将
SomeFact
添加为tblFact
索引上的 INCLUDE,以使其 覆盖。目前,该表将被访问两次:一次用于索引,然后再次进行查找以获取
SomeFact
作为 RID 或键查找(取决于是否有聚集索引)这不会适用于 tblDIM ,因为我假设 MKey 是聚集索引,这使得它隐式覆盖
You need to add
SomeFact
as an INCLUDE on thetblFact
index to make it covering.Currently, the table will be accessed twice: once for the index and then again for a lookup to get
SomeFact
either as a RID or key lookup (depends on if there is a clustered index)This doesn't apply to
tblDIM
because I assume thatMKey
is the clustered index which makes it covering implicitly在极少数情况下,数据库会选择不正确的执行计划。在本例中,索引用于连接,但由于所有数据都是从两个表中获取的,因此扫描整个表会更快。
如果向查询添加 WHERE 子句,索引版本会快得多,因为没有索引,它仍然需要扫描整个表,而不是只获取所需的少数记录。
可能有指令鼓励数据库不要使用索引或使用不同的索引,但我不太了解SQL Server。
In rare cases, the database chooses an incorrect execution plan. In this case, the index is used for the join, but since all data is fetched from both tables, it would be faster to just scan the whole table.
The indexed version will be much faster if you add a WHERE clause to the query, because without indexes it will still need to scan the whole table, instead of grabbing just the handful of records it needs.
There may be directives to encourage the database not to use indexes or use different indexes, but I don't know SQL server that well.
您的统计数据是最新的吗?检查:
更新:
Are your statistics up to date? Check with:
Update with: