- 话题 - 文江博客

当前位置：文江博客话题详情

indexing compound-index oracle-database

发布于 2024-11-26 17:13:16 字数 642 浏览 1 评论 0 原文

If I have a table TABLE_1 with, lets say 5 columns:

COL1    |   COL2    |   COL3    |   COL4    |   COL5
[line]
[line]
[...]

And there are two main queries I want to do:

SELECT * FROM table_a WHERE COL1 = 'X' and COL2 = 'Y'

And the other being:

SELECT * FROM table_a WHERE COL2 = 'Z'

Which indexes should I create? Creating one with columns COL1 and COL2 would index for both queries, or should I need another index just for COL2 in order to have the second query快点？

谢谢！

原文

If I have a table TABLE_1 with, lets say 5 columns:

COL1    |   COL2    |   COL3    |   COL4    |   COL5
[line]
[line]
[...]

And there are two main queries I want to do:

SELECT * FROM table_a WHERE COL1 = 'X' and COL2 = 'Y'

And the other being:

SELECT * FROM table_a WHERE COL2 = 'Z'

Which indexes should I create? Creating one with columns COL1 and COL2 would index for both queries, or should I need another index just for COL2 in order to have the second query faster?

thanks!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

对岸观火 2024-12-03 17:13:16

两个查询都可以使用 table_a(COL2, COL1) 上的常规 B 树索引。

单独在 COL2 上建立索引可能会更有效地仅使用此列上的筛选器来检索行，但附加索引将占用空间并会减慢插入速度（如果您更新此列，则还会减慢更新速度）。这是一个权衡。

回复收藏 0 原文

耳钉梦 2024-12-03 17:13:16

文森特在回答中这样说：

“table_a(COL2, COL1) 上的常规 B 树索引可供两者使用
查询。”

中的关键词是“可以”。因为这样的索引可能不会被任何一个查询使用，这也是事实。

数据库索引是一个复杂而微妙的主题。有关于该主题的整本书。Richard Foote 已成功维护（和 David Bowie）多年的博客。

href="http://richardfoote.wordpress.com/" rel="nofollow">一个只有多少行？ COL1 有多少个不同的值？ COL2 有多少个不同的值

？

在不知道有关该表的一些基本事实的情况下给出该问题的明确答案：它 TABLEA 仅包含十几行全表扫描可能会更有效。

如果 COL2 是唯一的，我们需要的唯一索引是 table_a(COL2)

如果 COL2 是非选择性的（与总数相比相对较少的值），行），那么第二个查询应该使用全表扫描而不是索引读取。

如果 COL2 是非选择性的，但 COL1 是高度选择性的（与总行数相比有很多值，但不是唯一的），则第一个查询应使用 table_a(COL2, COL1) 上的索引。

如果 COL2 不是特别有选择性，COL1 不是特别有选择性，但两者的组合具有高度选择性，则第一个查询应使用 table_a(COL2, COL1) 上的索引。全表扫描将是第二个查询的首选路径。

如果 COL1 是唯一的，它应该有一个索引，这将在第一个查询中使用，但显然在第二个查询中没有帮助。

然后是 NULL 的问题，这使问题变得更加复杂。除复合索引（以及某些其他特殊情况）外，NULL 不被索引。

真正完整的答案还可以解决倾斜问题：如果 COL1 中 90% 的行是“0”，而其余行是高度选择性的，那么索引可能有用，也可能没用。当我们收集索引时，我们可能需要生成直方图，但它们只有在查询使用文字而不是绑定变量时才真正有用。

In his answer Vincent says this:

"A regular B-tree index on table_a(COL2, COL1) could be used by both
queries."

The crucial word in that sentence is "could". Because it is also true that such an index might not be used by either query.

Database indexing is a complicated and subtle subject. There are entire books written on the topic. Richard Foote has managed to maintain a blog talking about nothing but Oracle Indexes (and David Bowie) for years.

We cannot give a definitive answer to the question without knowing some basic facts about the table: how many rows does it have? How many distinct values of COL1 are there? How many distinct values of COL2 are there?

So, let's look at some alternative answers.

If TABLEA contains only a dozen rows the chances are a Full Table Scan will be more efficient than any indexed read.

If COL2 is unique, the only index we need is table_a(COL2)

If COL2 is unselective (relatively few values compared to total number of rows) then the second query should use a Full Table Scan rather than an indexed read.

If COL2 is unselective but COL1 is highly selective (very many values compared to total number of rows, but not unique) then the first query should use an index on table_a(COL2, COL1).

If COL2 is not particularly selective and COL1 is not particularly selective but the combination of the two is highly selective then the first query should use an index on table_a(COL2, COL1). A Full Table Scan would be the preferred path for the second query.

If COL1 is unique it should have an index, which is what would be used in the first query, but it obviously wouldn't help in the second query.

Then there is the matter of NULLs, which futher complicate the issue. NULLs are not indexed, except in composite indexes (and certain other special cases).

A truly complete answer would also address the matter of skew: if 90% of the rows in COL1 are '0' and the rest are highly selective then an index may or may not be useful. We may need to generate histograms when we gather indexes, but they're only really helpful when a query uses literals rather than bind variables.

回复收藏 0 原文