Oracle 的第一个字符计数导航：计数查询不使用函数索引

发布于 2024-10-16 18:24:48 字数 1566 浏览 0 评论 0原文

甲骨文9i。我们有一个大表（约 1M 行），其中包含带有标题和作者列的内容。我们想编写为该内容的标题和作者提供 AZ 导航的视图（A：1300，B：45000，...）

首先进行一些没有任何索引的准备工作：

select * from content where substr(upper(title),0,1) = 'M'

好一点

select * from content where upper(title) LIKE 'M%'

对于这些内容来说，性能比 ExPlan

TABLE ACCESS content FULL Cost=1624

：两者都是即使没有任何索引也很快。现在是缓慢的部分：

select count(*) from CONTENT WHERE substr(upper(TITLE),0,1) = 'A';

ExPlan：

SORT AGGREGATE else like above.

现在是累积（这就是我们想要的，它真的很慢）：

select substr(upper(title),0,1) , count(*) from content group by substr(upper(title),0,1);

ExPlan：

SORT: GROUP BY COST=8069 / TABLE ACCESS on CONTENT FULL COST=1624

所以我开始创建一个功能索引：

create index CONTENT_TITLE_LETTER_IDX on CONTENT(substr(upper(TITLE),0,1));

这大大加快了单字母计数查询的速度：

select count(*) from CONTENT WHERE substr(upper(TITLE),0,1) = 'A';

ExPlan（它几乎实时响应）：

SORT AGGREGATE COST=1 / INDEX CONTENT_TITLE_LETTER_IDX RANGE SCAN COST=1

但是基本上查询相同内容的累积查询不使用索引（它显示与上面相同的解释计划）。我尝试了提示：

select /*+ index(CONTENT CONTENT_TITLE_LETTER_IDX) */ substr(upper(title),0,1) , count(*) from content group by substr(upper(title),0,1);

但它仍然很慢。我认为这可能是由于无序索引造成的，但我猜想即使我围绕所有 26 个可能的字母运行一个循环，单个查询 ( = 'letter' ) 也会更快！

谁知道如何告诉 Oracle 使用该索引（或者除了创建单字符列或表之外的替代方法）？

原文

Oracle 9i. We have a large table (~1M rows) containing our content that carries title and author columns. We'd like to write views that offer an A-Z navigation for titles and authors to that content ( A:1300, B:45000,...)

First some preparation without any indexing:

select * from content where substr(upper(title),0,1) = 'M'

performs a little better than

select * from content where upper(title) LIKE 'M%'

ExPlan for those:

TABLE ACCESS content FULL Cost=1624

both are pretty fast even without any indices. Now the slow part:

select count(*) from CONTENT WHERE substr(upper(TITLE),0,1) = 'A';

ExPlan:

SORT AGGREGATE else like above.

Now the cumulation (this is what we want, it's really slow):

select substr(upper(title),0,1) , count(*) from content group by substr(upper(title),0,1);

ExPlan:

SORT: GROUP BY COST=8069 / TABLE ACCESS on CONTENT FULL COST=1624

So I started creating a functional index:

create index CONTENT_TITLE_LETTER_IDX on CONTENT(substr(upper(TITLE),0,1));

This speeds up the single letter count query dramatically:

select count(*) from CONTENT WHERE substr(upper(TITLE),0,1) = 'A';

ExPlan (it responds nearly in realtime):

SORT AGGREGATE COST=1 / INDEX CONTENT_TITLE_LETTER_IDX RANGE SCAN COST=1

But the cumulation query which is basically querying the same thing is not using the index (it shows the same Explain Plan as above). I tried the hint:

select /*+ index(CONTENT CONTENT_TITLE_LETTER_IDX) */ substr(upper(title),0,1) , count(*) from content group by substr(upper(title),0,1);

but it still is very slow. I assume that this might be due to the unordered index but I'd guess that even if I ran a loop around all 26 possible letters the single query ( = 'letter' ) would be faster!

Who knows how to tell Oracle to use that index (or an alternative way besides creating one-char columns or tables) ?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

゛时过境迁 2024-10-23 18:24:48

再看看您的查询：

select substr(upper(title),0,1) , count(*) 
from content 
group by substr(upper(title),0,1)

请注意没有任何 where 子句。事实上，您告诉数据库引擎获取所有行并计算每个首字母有多少行。您不能跳过任何行，因为否则您无法计算它。我认为索引不容易存储此类信息，因此完整扫描是最快的。如果您要求输入特定字母，则对索引使用范围扫描可能是有意义的。

如果您经常需要此信息，请创建一个汇总表，该汇总表将由主表上的触发器进行更新。

Take another look at your query:

select substr(upper(title),0,1) , count(*) 
from content 
group by substr(upper(title),0,1)

Note the absence of any where clause. In fact, you tell the database engine to take all rows and count how many rows there is for each initial letter. You cannot skip any row because else you can't count it. I don't think the index stores such information readily, so a full scan is the fastest you can have. If you asked for a specific letter, using a range scan on the index could make sense.

If you need this information often, create a summary table which would be updated by triggers on your main table.

回复收藏 0 原文

我是有多爱你 2024-10-23 18:24:48

好的，这回答了问题，但并没有真正解决问题（感谢 Gerrat，我们几乎是同一时间）：

我查看了这个有趣的问题告诉我 CBO 可能会拒绝包含 NULL 值的索引。考虑到我们的表包含以非字母字符和空格开头的标题（但永远不会为空...）我尝试了以下操作：

select   substr(upper(title),0,1), count(*) 
from content
 where substr(upper(title),0,1) >= 'A' AND substr(upper(title),0,1) <= 'Z'
group by substr(upper(title),0,1);

这极大地降低了成本：

SORT GROUP BY NOSORT COST=16 / CONTENT ACCESS BY INDEX ROWID COST=6 / CONTENT_TITLE_LETTER_IDX RANGE SCAN COST=2

从 8069 到 16

有趣的是，条件查询现在甚至是 比非调节的（平均 1.5 秒）慢（平均 2.7 秒）。添加更多条件（如 9000 所示）可以显着加快速度 - 即使它根本不使用字母索引。谢谢你
洞察力！

Ok, this answers the question but doesn't really solve the problem (thanks Gerrat, we're nearly same time):

I had a look at this interesting question that told me that the CBO might reject indices containing NULL values. Taking into account that our table contains title that start with non-alpha characters and blanks (but never are null...) I tried the following:

select   substr(upper(title),0,1), count(*) 
from content
 where substr(upper(title),0,1) >= 'A' AND substr(upper(title),0,1) <= 'Z'
group by substr(upper(title),0,1);

which dramatically decreases the costs:

SORT GROUP BY NOSORT COST=16 / CONTENT ACCESS BY INDEX ROWID COST=6 / CONTENT_TITLE_LETTER_IDX RANGE SCAN COST=2

from 8069 to 16

interestingly that conditioned query now even is slower (avg 2.7s) than the non-conditioned one (avg 1.5s). Adding more conditions as stated by 9000 dramatically speeds things up - even though it doesn't use the letter index at all. Thanks for that
insight!

回复收藏 0 原文

绮烟 2024-10-23 18:24:48

我不能向你保证这会更快，但你可以尝试：

select /*+ index(CONTENT CONTENT_TITLE_LETTER_IDX) */ substr(upper(title),0,1), 
count(*) from content 
where substr(upper(title),0,1) in ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 
'X', 'Y', 'Z')
group by substr(upper(title),0,1);

在我所做的类似测试中，这确实使用了索引（我没有尝试过其他任何方法），但成本实际上高于全表扫描）。

确保您的表和索引也被分析。

I can't promise you this will be faster, but you could try:

select /*+ index(CONTENT CONTENT_TITLE_LETTER_IDX) */ substr(upper(title),0,1), 
count(*) from content 
where substr(upper(title),0,1) in ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 
'X', 'Y', 'Z')
group by substr(upper(title),0,1);

On a similar test I did, this did use the index (nothing else I tried did), but the cost was actually higher than the full table scan).

Make sure your table and index are analyzed as well.

回复收藏 0 原文

~没有更多了~