优化 IN 子句
我使用的是Oracle 10g。这是我的查询
select * from Entries
where RefKey in (select RefKey
from Entries
where KeyStat = 1)
and RefKey = Key;
这里 RefKey、Key 和 KeyStat 都已索引。该表在此处未使用的另一列上进行分区。 在此查询中,我选择当前处于活动状态的主密钥(如果 RefKey = Key 则为 master)(KeyStat = 1)。以下是使用 SQLTools 1.21 RC3 的该查询的执行计划。
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 270 | 218K (1)| 00:43:37 | | |
| 1 | NESTED LOOPS SEMI | | 1 | 270 | 218K (1)| 00:43:37 | | |
| 2 | PARTITION RANGE ALL | | 1 | 262 | 218K (1)| 00:43:37 | 1 | 12 |
|* 3 | TABLE ACCESS FULL | ENTRIES | 1 | 262 | 218K (1)| 00:43:37 | 1 | 12 |
|* 4 | TABLE ACCESS BY GLOBAL INDEX ROWID| ENTRIES | 10M| 77M| 3 (0)| 00:00:01 | ROWID | ROWID |
|* 5 | INDEX RANGE SCAN | IND_ENTR_REFKEY| 1 | | 2 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------
我担心 ID = 3“表访问已满”。如果此查询中使用的所有列都已建立索引,那么为什么 Oracle 会进行全表扫描。
如何对此进行优化?如果我在内部查询中放入一些值,它返回的速度会快得多。
为了解释为什么需要子查询:我选择至少具有一个活动键的整个批次。 Refkey 并不是唯一的;例如:
Key=1, RefKey=1, Stat=1
Key=2, RefKey=1, Stat=0
Key=3, RefKey=2, Stat=1
I am using Oracle 10g. Here is my query
select * from Entries
where RefKey in (select RefKey
from Entries
where KeyStat = 1)
and RefKey = Key;
Here RefKey, Key and KeyStat all are indexed. The table is partitioned on another column which is not being used here.
In this query, i am selecting master key (if RefKey = Key then master) currently active (KeyStat = 1). Here is the execution plan of this query using SQLTools 1.21 RC3.
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 270 | 218K (1)| 00:43:37 | | |
| 1 | NESTED LOOPS SEMI | | 1 | 270 | 218K (1)| 00:43:37 | | |
| 2 | PARTITION RANGE ALL | | 1 | 262 | 218K (1)| 00:43:37 | 1 | 12 |
|* 3 | TABLE ACCESS FULL | ENTRIES | 1 | 262 | 218K (1)| 00:43:37 | 1 | 12 |
|* 4 | TABLE ACCESS BY GLOBAL INDEX ROWID| ENTRIES | 10M| 77M| 3 (0)| 00:00:01 | ROWID | ROWID |
|* 5 | INDEX RANGE SCAN | IND_ENTR_REFKEY| 1 | | 2 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------
I am concerned about ID = 3 "TABLE ACCESS FULL". If all the columns being used in this query are indexed then why oracle is doing full table scan.
How this can be optimized? If i put some values in inner query, it returns much faster.
To explain why the sub-query is necessary: I am selecting the whole batch having at least one active key. Refkey is not unique; for instance:
Key=1, RefKey=1, Stat=1
Key=2, RefKey=1, Stat=0
Key=3, RefKey=2, Stat=1
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
优化器忽略 KEYSTAT 上的索引。我猜这是因为 KEYSTAT 的选择性不是很强(相对较少的不同值)和/或因为这些值均匀分布在 ENTRIES 表的整个范围内。如果查询几乎会命中表中的每个块,全表扫描是最好的路径,
通过过滤子查询所获得的速度的提高验证了这一点,
将语句重构为。删除子查询将是提高性能的最佳方法。
倾斜的数据分布通常是性能问题的根源。您看,问题是,数据库不知道 KEYSTAT=1 的选择性要大得多除非我们告诉它,否则您可能需要考虑在收集该索引的统计信息时创建直方图。 href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i42219" rel="nofollow">了解更多信息
请注意,直方图可能会导致错误 。问题并解决它们,尤其是使用绑定变量而不是文字的查询,因此在将它们投入生产之前在沙坑中进行基准测试。
The optimizer is ignoring the index on KEYSTAT. I would guess this is because KEYSTAT is not very selective (relatively few distinct values) and/or because those values are evenly distributed across the entire range of the ENTRIES table. If a query is going to hit pretty much every block in a table, FULL TABLE SCAN is the best path.
This guess is validated by the increased speed you get by filtering the sub-query.
As the others have suggested, refactoring your statement to remove the sub-query would be the best way to improve performance.
Skewed datadistribution is often a source of performance problems. You see, the thing is, the database doesn't know that KEYSTAT=1 is vastly more selective than KEYSTAT=0. Unless we tell it, which is why you might want to consider creating histograms when you gather statistics on that index. Find out more.
Note that histograms can cause problems as well as solve them, especially with queries which use bind variables instaead of literals. So benchmark in a sandpit before putting them into production.
也许我错过了一些东西,但这不应该产生相同的结果吗?
Maybe I miss something but shouldn't this yield the same result?
你能做一个吗?
你应该得到一个解释计划,最后有一堆额外的信息。具体来说,您对该操作 3 感兴趣。它不仅执行全表扫描,而且仅给出行/基数 1。实际上,这意味着它认为不会找到任何内容(因为它永远不会找到任何内容)。在那里给出零值)。
您可以看出它不会期望找到行,因为尽管操作 4 涉及 1000 万行,但它给出的成本为“3”,其中“2”是索引成本。它不期望从“3”开始的扫描找到任何东西,因此它永远不会期望将值插入到索引扫描中,因此它永远不会看到 10M 行。
所以我看到两个问题。首先,它不使用 KeyStat 上的索引。其次,它低估了匹配返回的行数。第一个可能是由数据类型不匹配引起的。也许 KeyStat 是一个字符值。也许它不是索引中的前导列。也许索引已被禁用(在 ETL 作业期间?)。
奇怪的估计也具有启发性。如果该列是索引中的前导列,我希望统计数据能够计算出该列的 max_val 为 1。如果它认为 max_val 为零,那就可以解释“你不会找到任何东西”估计,而且它没有可用于快速找到 max_val 的索引。
Can you do a
You should get an explain plan with a bunch of extra info at the end. Specifically, you are interested in that operation 3. Not only is it doing a full table scan, but it is only giving a rows/cardinality of 1. Really that means it doesn't think it is going to find anything (as it never gives a zero value there).
You can tell that it doesn't expect to ever find a row because although operation 4 talks about 10 million rows, it gives a cost of '3' and of that '2' is the index cost. It doesn't expect the scan from '3' to find anything so it never expects to plug a value into the index scan so it never is going to see 10M rows.
So I see two problems. Firstly, it doesn't use an index on KeyStat. Secondly, it is underestimating the number of rows returned by the match. the first might be caused by a datatype mismatch. Maybe KeyStat is a character value. Maybe it isn't the leading column in the index. Maybe the index has been disabled (during an ETL job ?).
The odd estimate is also suggestive. If the column was the leading column in an index, I'd expect the stats to have worked out that the max_val of the column is 1. If it thinks the max_val is zero, that would explain the "you ain't going to find anything" estimate but also that it didn't have an index to use to quickly find the max_val.