多列 B 树索引是如何组织的

发布于 2024-09-19 13:04:17 字数 329 浏览 19 评论 0原文

我想了解更好的索引组织。 假设我们有一个包含 2 列的表:

CREATE TABLE user( 
  name varchar(100)
 ,age int)

我们想要创建一个索引:

CREATE INDEX IDX_MultiColIdx on user(name,age)

B 树索引组织是什么样的?

对于一列,例如年龄,组织是明确的:每个非叶节点将包含一组用于搜索的整数键。哪些值包含 IDX_MultiColIdx B-Tree 索引的节点?

I want to understand better index organization.
Imagine we have a table with 2 columns:

CREATE TABLE user( 
  name varchar(100)
 ,age int)

We would like to create an index:

CREATE INDEX IDX_MultiColIdx on user(name,age)

How would B-Tree index organization look like?

In case of one column, say, age, the organization is clear: every non-leaf node would contain a set of integer keys which would be used for a search. Which values contain nodes of our IDX_MultiColIdx B-Tree index?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

岛徒 2024-09-26 13:04:17

哪些值包含我们的 IDX_MultiColIdx B-Tree 索引的节点?

nameage 和行指针(RID/ROWID 或聚集键,具体取决于表组织)的值),按字典顺序排序。

它们的存储方式取决于数据类型和数据库系统。

通常,CHAR 存储时右侧填充了与其大小相等的空格,而 VARCHAR 则在前面添加了其长度。

MyISAM 和其他一些引擎可以使用密钥压缩:一组密钥的匹配部分仅存储一次,其他密钥仅存储不同部分,如下所示:

Hamblin
Hamblin, California
Hamblin (surname)
Hambling Baronets
Hambly
Hambly Arena    
Hambly Arena Fire
Hambo
Hambo Lama Itigelov
Hambok
Hambone

将存储为:

Hamblin
[7], California
[7] (surname)
[7]g Baronets
Hambly
[6] Arena   
[6] Arena Fire
Hambo
[5] Lama Itigelov
[5]k
[5]ne

,其中 [x] 表示“从前一个键中获取前导 x 个字符”

Which values contains nodes of our IDX_MultiColIdx B-Tree index?

Values of name, age and the row pointer (RID/ROWID or clustered key, depending on the table organization), sorted lexicographically.

How exactly they will be stored, depends on the datatype and database system.

Usually, CHAR is stored right-padded with spaces up to its size, while VARCHAR is prepended with its length.

MyISAM and some other engines can use key compression: the matching parts of a set of keys are only stored once, and the other keys only store the differing parts, like this:

Hamblin
Hamblin, California
Hamblin (surname)
Hambling Baronets
Hambly
Hambly Arena    
Hambly Arena Fire
Hambo
Hambo Lama Itigelov
Hambok
Hambone

will be stored as:

Hamblin
[7], California
[7] (surname)
[7]g Baronets
Hambly
[6] Arena   
[6] Arena Fire
Hambo
[5] Lama Itigelov
[5]k
[5]ne

, where [x] means "take leading x characters from the previous key"

吃素的狼 2024-09-26 13:04:17

我假设您询问内部数据库实现,因为您提到“非叶节点”。

B树中的内部节点不需要存储完整的密钥;他们只需要存储分隔符键。前缀和后缀压缩意味着内部节点可以非常密集,从而降低 B 树的高度,从而提高整体性能。

例如,给定一个具有顺序键 <'Avery long string', 314159> 的索引。和<'不是相同的字符串',9348>,所有内部节点需要表示的是这些键之间的分隔,可以用单个字符来表示。类似地,当内部节点中要分离的键具有公共前缀时,该前缀只需要存储一次并表示它们的分歧点。

叶子节点需要存储完整的键值,可以存储在链表中进行键序遍历。可以通过使用前缀压缩或其他技术来压缩叶节点页面,以进一步降低树高。

有关这方面的良好参考,请参阅 Gray & 的“事务处理:概念和技术”。路透社,如果您想了解更多详细信息,请参阅参考资料。

I assume you're asking about the internal database implementation because you mention 'non-leaf nodes'.

The interior nodes in a b-tree do not need to store the full key; they only need to store separator keys. Prefix and suffix compression mean that interior nodes can be very dense, and therefore reduce the height of the b-tree and therefore improve overall performance.

For example, given an index with the sequential keys <'A very long string', 314159> and <'Not the same string', 9348>, all the interior node needs to represent is the separation between those those keys, which can be represented in a single character. In a similar way, when the keys to be separated in the interior node have a common prefix, that prefix need only be stored once and the point where they diverge represented.

The leaf nodes need to store the full key values, and can be stored in a linked list for key order traversal. Leaf node pages can be compressed by using prefix compression or other techniques to further reduce the tree height.

For a good reference on this, see "Transaction Processing: Concepts and Techniques" by Gray & Reuter, and follow the references if you want more detail.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文