综合指数如何运作?

发布于 2024-07-17 09:07:13 字数 330 浏览 4 评论 0原文

我之前已经在表上创建了复合索引(索引),并假设了它们的工作原理。 我只是好奇我的假设是否正确。

我假设当您列出索引的列顺序时,您还指定了索引的分组方式。 例如,如果您有列 abc,并且您以相同的顺序指定索引 a ASC< /code>、b ASCc ASC 那么生成的索引本质上将是 a 中每个“组”的许多索引。

它是否正确? 如果不是,生成的索引实际上会是什么样子?

I've created composite indexes (indices for you mathematical folk) on tables before with an assumption of how they worked. I was just curious if my assumption is correct or not.

I assume that when you list the order of columns for the index, you are also specifying how the indexes will be grouped. For instance, if you have columns a, b, and c, and you specify the index in that same order a ASC, b ASC, and c ASC then the resultant index will essentially be many indexes for each "group" in a.

Is this correct? If not, what will the resultant index actually look like?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

旧时浪漫 2024-07-24 09:07:13

复合索引的工作方式与常规索引类似,只是它们具有多值键。

如果您在字段 (a,b,c) 上定义索引,则记录首先按 a 排序,然后按 b 排序,最后按 c 排序。

例子:

| A | B | C |
-------------
| 1 | 2 | 3 |
| 1 | 4 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | 4 |
| 2 | 4 | 5 |

Composite indexes work just like regular indexes, except they have multi-values keys.

If you define an index on the fields (a,b,c) , the records are sorted first on a, then b, then c.

Example:

| A | B | C |
-------------
| 1 | 2 | 3 |
| 1 | 4 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | 4 |
| 2 | 4 | 5 |
对你再特殊 2024-07-24 09:07:13

复合索引就像字典中的普通字母索引,但涵盖两个或多个字母,如下所示:

AA - page 1
AB - page 12

等。

表行首先按索引中的第一列排序,然后按第二列排序,依此类推。

当您搜索时,它是可用的两列或按第一列。 如果您的索引是这样的:

AA - page 1
AB - page 12
…
AZ - page 245
BA - page 246
…

您可以使用它来搜索 2 个字母(表中的 = 2 列),或者像一个字母上的普通索引一样:

A - page 1
B - page 246
…

请注意如果是字典,页面本身是按字母顺序排列的。 这是CLUSTERED 索引的示例。

在普通的非CLUSTERED索引中,对页面的引用是有序的,就像在历史书中一样:

Gaul, Alesia: pages 12, 56, 78
Gaul, Augustodonum Aeduorum: page 145
…
Gaul, Vellaunodunum: page 24
Egypt, Alexandria: pages 56, 194, 213, 234, 267

当您ORDER BY两个或更多列时,也可以使用复合索引。 在这种情况下,DESC 子句可能会派上用场。

请参阅我的博客中有关在复合索引中使用 DESC 子句的文章:

Composite index is like a plain alphabet index in a dictionary, but covering two or more letters, like this:

AA - page 1
AB - page 12

etc.

Table rows are ordered first by the first column in the index, then by the second one etc.

It's usable when you search by both columns OR by first column. If your index is like this:

AA - page 1
AB - page 12
…
AZ - page 245
BA - page 246
…

you can use it for searching on 2 letters ( = 2 columns in a table), or like a plain index on one letter:

A - page 1
B - page 246
…

Note that in case of a dictionary, the pages themself are alphabetically ordered. That's an example of a CLUSTERED index.

In a plain, non-CLUSTERED index, the references to pages are ordered, like in a history book:

Gaul, Alesia: pages 12, 56, 78
Gaul, Augustodonum Aeduorum: page 145
…
Gaul, Vellaunodunum: page 24
Egypt, Alexandria: pages 56, 194, 213, 234, 267

Composite indexes may also be used when you ORDER BY two or more columns. In this case a DESC clause may come handy.

See this article in my blog about using DESC clause in a composite index:

慵挽 2024-07-24 09:07:13

最常见的索引实现使用 B 树来允许快速查找以及相当快速的范围扫描。 这里需要解释的内容太多,但这里有关于 B 树 的 Wikipedia 文章。 你是对的,你在创建索引中声明的第一列将是生成的 B 树中的高阶列。

对高阶列的搜索相当于范围扫描,B 树索引对于此类搜索非常有用。 看到这一点的最简单方法是与图书馆中尚未转换为在线目录的旧卡片目录进行类比。

如果您正在寻找姓氏为“Clemens”的作者的所有卡片,您只需转到作者目录,很快就会找到一个前面写着“CLE-CLI”的抽屉。 这是正确的抽屉。 现在,您在那个抽屉中进行一种非正式的二分搜索,以快速找到所有上面写着“克莱门斯,罗杰”或“克莱门斯,塞缪尔”的卡片。

但是假设您想要查找名字为“Samuel”的作者的所有卡片。 现在您已经陷入了困境,因为这些卡片并未集中在作者目录中的一个位置。 数据库中的复合索引也会发生类似的现象。

不同的 DBMS 的优化器在检测索引范围扫描和准确估计其成本方面的聪明程度有所不同。 并非所有索引都是 B 树。 您必须阅读特定 DBMS 的文档才能获取真实信息。

The most common implementation of indices uses B-trees to allow somewhat rapid lookups, and also reasonably rapid range scans. It's too much to explain here, but here's the Wikipedia article on B-trees. And you are right, the first column you declare in the create index will be the high order column in the resulting B-tree.

A search on the high order column amounts to a range scan, and a B-tree index can be very useful for such a search. The easiest way to see this is by analogy with the old card catalogs you have in libraries that have not yet converted to on line catalogs.

If you are looking for all the cards for Authors whose last name is "Clemens", you just go to the author catalog, and very quickly find a drawer that says "CLE- CLI" on the front. That's the right drawer. Now you do a kind of informal binary search in that drawer to quickly find all the cards that say "Clemens, Roger", or "Clemens, Samuel" on them.

But suppose you want to find all the cards for the authors whose first name is "Samuel". Now you're up the creek, because those cards are not gathered together in one place in the Author catalog. A similar phenomenon happens with composite indices in a database.

Different DBMSes differ in how clever their optimizer is at detecting index range scans, and accurately estimating their cost. And not all indices are B-trees. You'll have to read the docs for your specific DBMS to get the real info.

↙温凉少女 2024-07-24 09:07:13

不会。结果索引将是单个索引,但具有复合键。

KeyX = A、B、C、D; 键 Y = 1,2,3,4;

索引 KeyX、KeyY 实际上将是:A1、A2、A3、B1、B3、C3、C4、D2

因此,如果您需要通过 KeyX KeyY 查找某些内容 - 这将很快并且将使用单一索引。 像 SELECT ... WHERE KeyX = "B" AND KeyY = 3 之类的东西。

但理解这一点很重要:WHERE KeyX = ? 请求使用该索引,而 WHERE KeyY = ? 根本不会使用这样的索引。

No. Resultant index will be single index but with compound key.

KeyX = A,B,C,D; KeyY = 1,2,3,4;

Index KeyX, KeyY will be actually: A1,A2,A3,B1,B3,C3,C4,D2

So that in case you need to find something by KeyX and KeyY - that will be fast and will use single index. Something like SELECT ... WHERE KeyX = "B" AND KeyY = 3.

But it's important to understand: WHERE KeyX = ? requests will use that index, while WHERE KeyY = ? will NOT use such index at all.

横笛休吹塞上声 2024-07-24 09:07:13

哪些查询可以通过复合索引加速

一般来说,复合索引只能显着加速最后一列的不等式。

例如,xy 复合 B 树索引将:

  • 有效地加速:
    • x = 1 和 y = 2:两列相等
    • x = 1 且 y > 2:第一列相等加上第二列不相等
    • x = 1 ORDER BY y:紧跟在先前等式之后的 order by(x 在列索引顺序上位于 y 之前)可有效加速
  • 加速非常有限:
    • <代码>x > 1 且 y > 2:两列都不相等,包括第一列
    • <代码>x > 1 和 y = 2:第一列不等式
    • <代码>y> 2:这相当于x > -无穷大并且y> 2,所以这只是复合 B 树索引可能出现的最坏情况。 但请注意,这种情况可以通过 B 树索引有效解决。
    • ORDER BY y:在之前的等式之后不立即进行排序无法有效加速

如果您需要两列上的不等式,那么您应该查看空间索引,例如 R 树,我已经给出了更多详细信息位于:什么是空间INDEX 以及什么时候应该使用它?

例如,考虑以下索引:

x|y

1|1
1|2
1|3
1|4
1|5
1|6

2|2
2|2
2|2
2|3
2|3
2|3
2|4
2|4
2|4

4|2
4|2
4|2
4|3
4|3
4|3
4|4
4|4
4|4

5|3
5|4
5|5
5|6
5|7
5|8

只有当索引中所有行都相邻时,索引才能非常显着地加快查询速度。

因此,如果我们想要:

x = 5 and y > 4

我们得到连续的:

5|5
5|6
5|7
5|8

但是如果我们想要:

x > 0 and y > 4

结果集将不是连续的,并且意味着一堆无用的扫描:

1|5
1|6
5|5
5|6
5|7
5|8

相关:索引中列的顺序有多重要?

Which queries can be sped up by composite indices or not

In general, a composite index can only significantly speed up inequalities on the last column.

E.g. a x-y composite B-tree index would:

  • efficiently speed up:
    • x = 1 and y = 2: equalities in both columns
    • x = 1 and y > 2: equality on first column plus inequality on the second column
    • x = 1 ORDER BY y: order by immediately after previous equalities (x is before y on column index order) is efficiently sped up
  • very limited speed up:
    • x > 1 and y > 2: inequalities on both columns, including the first
    • x > 1 and y = 2: inequality on the first column
    • y > 2: this is equivalent to x > -infinity and y > 2, so it is just the worse case possible for the compound B-tree index. Note however, this case could be solved efficiently with a B-tree index.
    • ORDER BY y: order by not immediately after previous equalities is not efficiently sped up

If you need inequalities on both columns, then you should look into spatial indices such as R-trees, I've given more details at: What is a SPATIAL INDEX and when should I use it?

E.g. consider the following index:

x|y

1|1
1|2
1|3
1|4
1|5
1|6

2|2
2|2
2|2
2|3
2|3
2|3
2|4
2|4
2|4

4|2
4|2
4|2
4|3
4|3
4|3
4|4
4|4
4|4

5|3
5|4
5|5
5|6
5|7
5|8

An index can only speed up a query very significantly if all the rows are one next to the other in the index.

So if for example we want:

x = 5 and y > 4

we get the contiguous:

5|5
5|6
5|7
5|8

But if we want:

x > 0 and y > 4

the result set would not be contiguous, and would mean a bunch of useless scanning:

1|5
1|6
5|5
5|6
5|7
5|8

Related: How important is the order of columns in indexes?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文