为什么 Oracle 不使用我的索引，除非我告诉它？

发布于 2024-12-07 17:50:55 字数 3324 浏览 1 评论 0原文

我有一个索引：

CREATE INDEX BLAH ON EMPLOYEE(SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4));

和一个 SQL 语句：

SELECT COUNT(*) 
  FROM (SELECT COUNT(*) 
          FROM EMPLOYEE 
         GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) 
        HAVING COUNT(*) > 100);

但它会继续执行全表扫描而不是使用索引，除非我添加提示。

EMPSHIRTNO 不是主键，EMPNO 才是（此处未使用）。

复杂查询

EXPLAIN PLAN FOR SELECT COUNT(*) FROM (SELECT COUNT(*) FROM EMPLOYEE
                                        GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
                                       HAVING COUNT(*) > 100);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1712471557
----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |     1 |       |    24   (9)| 00:00:01 |
|   1 |  SORT AGGREGATE       |          |     1 |       |            |          |
|   2 |   VIEW                |          |   497 |       |    24   (9)| 00:00:01 |
|*  3 |    FILTER             |          |       |       |            |          |
----------------------------------------------------------------------------------
|   4 |     HASH GROUP BY     |          |   497 |  2485 |    24   (9)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| EMPLOYEE |  9998 | 49990 |    22   (0)| 00:00:01||
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------                             
   3 - filter(COUNT(*)>100)                                                     

17 rows selected.

ANALYZE INDEX BLAH VALIDATE STRUCTURE;

SELECT BTREE_SPACE, USED_SPACE FROM INDEX_STATS;

BTREE_SPACE USED_SPACE
----------- ----------
     176032     150274

简单查询：

EXPLAIN PLAN FOR SELECT * FROM EMPLOYEE;

PLAN_TABLE_OUTPUT                                                               
--------------------------------------------------------------------------------
Plan hash value: 2913724801                                                     

------------------------------------------------------------------------------  
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |  
------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT  |          |  9998 |   439K|    23   (5)| 00:00:01 |  
|   1 |  TABLE ACCESS FULL| EMPLOYEE |  9998 |   439K|    23   (5)| 00:00:01 |  
------------------------------------------------------------------------------  

8 rows selected.

也许是因为 NOT NULL 约束是通过 CHECK 约束强制执行的，而不是最初在建表语句中定义的？当我这样做时，它将使用索引：

SELECT * FROM EMPLOYEE WHERE SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) = '1234';

对于那些建议它无论如何都需要读取所有行的人（我认为它在计数时不会这样做），索引也不用于此：

SELECT SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) FROM EMPLOYEE;

事实上，将EMPSHIRTNO 上的索引并执行 SELECT EMPSHIRTNO FROM EMPLOYEE；也不使用索引。我应该指出 EMPSHIRTNO 不是唯一的，表中存在重复项。

原文

I have an index:

CREATE INDEX BLAH ON EMPLOYEE(SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4));

and an SQL STATEMENT:

SELECT COUNT(*) 
  FROM (SELECT COUNT(*) 
          FROM EMPLOYEE 
         GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) 
        HAVING COUNT(*) > 100);

but it keeps doing a full table scan instead of using the index unless I add a hint.

EMPSHIRTNO is not the primary key, EMPNO is (which isn't used here).

Complex query

EXPLAIN PLAN FOR SELECT COUNT(*) FROM (SELECT COUNT(*) FROM EMPLOYEE
                                        GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)
                                       HAVING COUNT(*) > 100);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1712471557
----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |     1 |       |    24   (9)| 00:00:01 |
|   1 |  SORT AGGREGATE       |          |     1 |       |            |          |
|   2 |   VIEW                |          |   497 |       |    24   (9)| 00:00:01 |
|*  3 |    FILTER             |          |       |       |            |          |
----------------------------------------------------------------------------------
|   4 |     HASH GROUP BY     |          |   497 |  2485 |    24   (9)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| EMPLOYEE |  9998 | 49990 |    22   (0)| 00:00:01||
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------                             
   3 - filter(COUNT(*)>100)                                                     

17 rows selected.

ANALYZE INDEX BLAH VALIDATE STRUCTURE;

SELECT BTREE_SPACE, USED_SPACE FROM INDEX_STATS;

BTREE_SPACE USED_SPACE
----------- ----------
     176032     150274

Simple query:

EXPLAIN PLAN FOR SELECT * FROM EMPLOYEE;

PLAN_TABLE_OUTPUT                                                               
--------------------------------------------------------------------------------
Plan hash value: 2913724801                                                     

------------------------------------------------------------------------------  
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |  
------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT  |          |  9998 |   439K|    23   (5)| 00:00:01 |  
|   1 |  TABLE ACCESS FULL| EMPLOYEE |  9998 |   439K|    23   (5)| 00:00:01 |  
------------------------------------------------------------------------------  

8 rows selected.

Maybe it is because the NOT NULL constraint is enforced via a CHECK constraint rather than being defined originally in the table creation statement? It will use the index when I do:

SELECT * FROM EMPLOYEE WHERE SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) = '1234';

For those suggesting that it needs to read all of the rows anyway (which I don't think it does as it is counting), the index is not used on this either:

SELECT SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4) FROM EMPLOYEE;

In fact, putting an index on EMPSHIRTNO and performing SELECT EMPSHIRTNO FROM EMPLOYEE; does not use the index either. I should point out that EMPSHIRTNO is not unique, there are duplicates in the table.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

不回头走下去 2024-12-14 17:50:55

由于查询的性质，它无论如何都需要扫描表的每一行。因此，Oracle 可能认为全表扫描是最有效的方法。因为它使用HASH GROUP BY，所以最后没有像oracle 7 days那样令人讨厌的排序。

首先获取衬衫编号的每个 SUBSTR(...) 的数量。因此，查询的第一部分必须扫描整个表。

SELECT COUNT(*) 
FROM EMPLOYEE 
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)

接下来，您要丢弃计数 <= 100 的 SUBSTR(...)。Oracle 需要扫描所有行来验证这。从技术上讲，您可能会说，一旦它有 101，就不再需要了，但我认为 Oracle 无法解决这个问题，特别是当您询问它 SELECT COUNT(*) 中的总数是多少时子查询的。

HAVING COUNT(*) > 100);

所以基本上为了给你你想要的答案，Oracle需要扫描表中的每一行，所以索引对过滤没有帮助。因为它使用哈希分组依据，所以索引对分组也没有帮助。因此，使用索引只会减慢查询速度，这就是 Oracle 不使用它的原因。

Because of the nature of your query it needs to scan every row of the table anyway. So oracle is probably deciding that a full table scan is the most efficient way to do this. Because its using a HASH GROUP BY there is no nasty sort at the end like in oracle 7 days.

First get the count per SUBSTR(...) of shirt no. Its thus first part of the query which has to scan the entire table

SELECT COUNT(*) 
FROM EMPLOYEE 
GROUP BY SUBSTR(TO_CHAR(EMPSHIRTNO), 1, 4)

Next you want to discard the SUBSTR(...) where the count is <= 100. Oracle needs to scan all rows to verify this. Technically you could argue that once it has 101 it doesn't need any more, but I don't think Oracle can work this out, especially as you are asking it what the total numer is in the SELECT COUNT(*) of the subquery.

HAVING COUNT(*) > 100);

So basically to give you the answer you want Oracle needs to scan every row in the table, so an index is no help on filtering. Because its using a hash group by, the index is no help on the grouping either. So to use the index would just slow your query down, which is why Oracle is not using it.

回复收藏 0 原文

留一抹残留的笑 2024-12-14 17:50:55

我认为您可能需要在 SUBSTR(TO_CHAR(EMPSHIRTNO), 1,4); 上构建基于函数的索引； SQL 中的函数倾向于使列上的常规索引无效。

回复收藏 0 原文

余生共白头 2024-12-14 17:50:55

我相信@Codo是正确的。 Oracle无法确定表达式将始终为非空，然后必须假设某些空值可能不会
存储在索引中。

（似乎 Oracle应该能够确定该表达式不可为空。一般来说，任何随机 SUBSTR 表达式总是出现的机会
not null 可能非常低，也许 Oracle 只是将所有 SUBSTR 表达式集中在一起？）

您可以通过以下解决方法之一使索引可用于您的查询：

--bitmap index:
create bitmap index blah on employee(substr(to_char(empshirtno), 1, 4));
--multi-column index:
alter table employee add constraint blah primary key (id, empshirtno);
--indexed virtual column:
create table employee(empshirtno varchar2(10) not null
    ,empshirtno_for_index as (substr(empshirtno,1,4)) not null );
create index blah on employee(empshirtno_for_index);

I believe @Codo is correct. Oracle cannot determine that the expression will always be non-null, and then must assume that some nulls may not
be stored in the index.

(It seems like Oracle should be able to figure out that the expression is not nullable. In general, the chance of any random SUBSTR expression always being
not null is probably very low, maybe Oracle just lumps all SUBSTR expressions together?)

You can make the index usable for your query with one of these work-arounds:

--bitmap index:
create bitmap index blah on employee(substr(to_char(empshirtno), 1, 4));
--multi-column index:
alter table employee add constraint blah primary key (id, empshirtno);
--indexed virtual column:
create table employee(empshirtno varchar2(10) not null
    ,empshirtno_for_index as (substr(empshirtno,1,4)) not null );
create index blah on employee(empshirtno_for_index);

回复收藏 0 原文

~没有更多了~