多列 B 树索引是如何组织的
我想了解更好的索引组织。 假设我们有一个包含 2 列的表:
CREATE TABLE user(
name varchar(100)
,age int)
我们想要创建一个索引:
CREATE INDEX IDX_MultiColIdx on user(name,age)
B 树索引组织是什么样的?
对于一列,例如年龄,组织是明确的:每个非叶节点将包含一组用于搜索的整数键。哪些值包含 IDX_MultiColIdx B-Tree 索引的节点?
I want to understand better index organization.
Imagine we have a table with 2 columns:
CREATE TABLE user(
name varchar(100)
,age int)
We would like to create an index:
CREATE INDEX IDX_MultiColIdx on user(name,age)
How would B-Tree index organization look like?
In case of one column, say, age, the organization is clear: every non-leaf node would contain a set of integer keys which would be used for a search. Which values contain nodes of our IDX_MultiColIdx B-Tree index?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
name
、age
和行指针(RID
/ROWID
或聚集键,具体取决于表组织)的值),按字典顺序排序。它们的存储方式取决于数据类型和数据库系统。
通常,
CHAR
存储时右侧填充了与其大小相等的空格,而VARCHAR
则在前面添加了其长度。MyISAM
和其他一些引擎可以使用密钥压缩:一组密钥的匹配部分仅存储一次,其他密钥仅存储不同部分,如下所示:将存储为:
,其中
[x]
表示“从前一个键中获取前导x
个字符”Values of
name
,age
and the row pointer (RID
/ROWID
or clustered key, depending on the table organization), sorted lexicographically.How exactly they will be stored, depends on the datatype and database system.
Usually,
CHAR
is stored right-padded with spaces up to its size, whileVARCHAR
is prepended with its length.MyISAM
and some other engines can use key compression: the matching parts of a set of keys are only stored once, and the other keys only store the differing parts, like this:will be stored as:
, where
[x]
means "take leadingx
characters from the previous key"我假设您询问内部数据库实现,因为您提到“非叶节点”。
B树中的内部节点不需要存储完整的密钥;他们只需要存储分隔符键。前缀和后缀压缩意味着内部节点可以非常密集,从而降低 B 树的高度,从而提高整体性能。
例如,给定一个具有顺序键 <'Avery long string', 314159> 的索引。和<'不是相同的字符串',9348>,所有内部节点需要表示的是这些键之间的分隔,可以用单个字符来表示。类似地,当内部节点中要分离的键具有公共前缀时,该前缀只需要存储一次并表示它们的分歧点。
叶子节点需要存储完整的键值,可以存储在链表中进行键序遍历。可以通过使用前缀压缩或其他技术来压缩叶节点页面,以进一步降低树高。
有关这方面的良好参考,请参阅 Gray & 的“事务处理:概念和技术”。路透社,如果您想了解更多详细信息,请参阅参考资料。
I assume you're asking about the internal database implementation because you mention 'non-leaf nodes'.
The interior nodes in a b-tree do not need to store the full key; they only need to store separator keys. Prefix and suffix compression mean that interior nodes can be very dense, and therefore reduce the height of the b-tree and therefore improve overall performance.
For example, given an index with the sequential keys <'A very long string', 314159> and <'Not the same string', 9348>, all the interior node needs to represent is the separation between those those keys, which can be represented in a single character. In a similar way, when the keys to be separated in the interior node have a common prefix, that prefix need only be stored once and the point where they diverge represented.
The leaf nodes need to store the full key values, and can be stored in a linked list for key order traversal. Leaf node pages can be compressed by using prefix compression or other techniques to further reduce the tree height.
For a good reference on this, see "Transaction Processing: Concepts and Techniques" by Gray & Reuter, and follow the references if you want more detail.