SQL query does a full table scan instead of an index-based scan
I have two tables:
create table big( id number, name varchar2(100));
insert into big(id, name) select rownum, object_name from all_objects;
create table small as select id from big where rownum < 10;
create index big_index on big(id);
If I execute the following query on these tables:
select *
from big
where id like '45%'
or id in (select id from small);
it always does a full table scan.
Execution Plan
----------------------------------------------------------
Plan hash value: 2290496975
----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |  3737 | 97162 |    85   (3)| 00:00:02 |
|*  1 |  FILTER            |       |       |       |            |          |
|   2 |   TABLE ACCESS FULL| BIG   | 74718 |  1897K|    85   (3)| 00:00:02 |
|*  3 |   TABLE ACCESS FULL| SMALL |     1 |     4 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=45 OR EXISTS (SELECT /*+ */ 0 FROM "SMALL" "SMALL"
WHERE "ID"=:B1))
3 - filter("ID"=:B1)
Is there any way to rewrite the query so that it always uses an index scan?
Comments (3)
No, no and no.
You do NOT want it to use an index. Luckily Oracle is smarter than that.
ID is numeric. While it might have ID values of 45, 450, 451, 452, 4501, 45004, 4500003, etc., in the index these values will be scattered all over the place. If you used a condition such as ID BETWEEN 450 AND 459, it might be worth using the index.
To use the index, Oracle would have to scan it all the way from top to bottom (converting each ID to a character string to do the LIKE comparison). Then, for every match, it would have to go to the table to get the NAME column.
It has decided that it is easier and quicker to scan the table (which, with 75,000 rows, isn't that big anyway) than to go back and forth between the index and the table.
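To make the contrast concrete, here is a minimal sketch using the big table and big_index from the question. It assumes the LIKE predicate on a NUMBER column is evaluated after an implicit conversion of id to a character string, which is why an ordinary index on id cannot be range-scanned for it. Whether the optimizer actually picks the index for the first query still depends on statistics and costing.

```sql
-- Range predicate on the numeric column itself: no conversion of id is
-- needed, so an index range scan on big_index is a candidate access path.
select *
from big
where id between 450 and 459;

-- LIKE on a NUMBER column: id is implicitly converted to a string
-- (effectively TO_CHAR(id) LIKE '45%'), so every value must be examined.
select *
from big
where id like '45%';
```

Note that the two queries do not return the same rows: the range form misses 45, 4501, 45004 and so on, so it only applies when the values of interest really form a numeric range.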
The others are right, you shouldn't use a numeric column like that.
However, it is actually the OR <subquery> construct that is causing the performance problem in this case. I don't know if it is different in version 11, but up to version 10gR2 it causes a filter operation, which is basically a nested loop with a correlated subquery. In your case, using a numeric column as a varchar also results in a full table scan. You can rewrite your query like this:
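The rewritten query from this answer is not preserved in this copy. As an assumption, not necessarily what the author posted, one common way to avoid the OR <subquery> construct is to split the predicate into two branches combined with UNION ALL. The timings and consistent gets quoted in this answer refer to the author's own query, not to this sketch.

```sql
-- Branch 1: rows matching the LIKE condition. The implicit conversion of
-- id to a string means this branch will most likely still scan the table.
select *
from big
where id like '45%'
union all
-- Branch 2: rows whose id appears in small. With only a few ids in small,
-- the optimizer can drive lookups into big through big_index.
-- The NOT LIKE filter excludes rows already returned by branch 1, so the
-- result matches the original OR query without a duplicate-removing sort.
select *
from big
where id in (select id from small)
and id not like '45%';
```

UNION ALL with an explicit exclusion is used here rather than UNION because UNION would also collapse rows that are genuinely duplicated in big, which the original query would return.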
With your test case, I end up with 174,000 rows in big and 9 in small.
Running your query takes 7 seconds with 1,211,399 consistent gets.
Running my query takes 0.7 seconds and uses 542 consistent gets.
The explain plan for my query is:
Something like this might work: