MySQL聚类与非聚类索引性能

发布于 2025-02-13 22:20:16 字数 776 浏览 2 评论 0原文

我正在对MySQL群集与非聚类索引进行几个测试,其中我有一个表100GB_Table包含〜6000万行的:

100gb_table schema:
CREATE TABLE 100gb_table (
    id int PRIMARY KEY NOT NULL AUTO_INCREMENT,
    c1 int,
    c2 text,
    c3 text,
    c4 blob NOT NULL,
    c5 text,
    c6 text,
    ts timestamp NOT NULL default(CURRENT_TIMESTAMP)
);

我正在执行一个查询,该查询只读取群集索引:

SELECT id FROM 100gb_table ORDER BY id;

i'' m看到这个查询完成了几乎 〜55分钟,这很慢。我通过在主键列的顶部添加另一个索引来修改表,并运行以下查询,该查询迫使要使用的非群集索引:

SELECT id FROM 100gb_table USE INDEX (non_clustered_key) ORDER BY id;

< 10分钟中完成,比阅读快得多。与群集索引。为什么这两者之间存在如此巨大的差异?我的理解是,这两个索引都将索引列的值存储在树结构中,除了群集索引包含叶子节点中的表数据,因此我希望两个查询都表现出类似的性能。 Blob列可能会扭曲聚类索引结构吗?

I'm running a couple tests on MySQL Clustered vs Non Clustered indexes where I have a table 100gb_table which contains ~60 million rows:

100gb_table schema:
CREATE TABLE 100gb_table (
    id int PRIMARY KEY NOT NULL AUTO_INCREMENT,
    c1 int,
    c2 text,
    c3 text,
    c4 blob NOT NULL,
    c5 text,
    c6 text,
    ts timestamp NOT NULL default(CURRENT_TIMESTAMP)
);

and I'm executing a query that only reads the clustered index:

SELECT id FROM 100gb_table ORDER BY id;

I'm seeing that it takes almost an ~55 min for this query to complete which is strangely slow. I modified the table by adding another index on top of the Primary Key column and ran the following query which forces the non-clustered index to be used:

SELECT id FROM 100gb_table USE INDEX (non_clustered_key) ORDER BY id;

This finished in <10 minutes, much faster than reading with the clustered index. Why is there such a large discrepancy between these two? My understanding is that both indexes store the index column's values in a tree structure, except the clustered index contains table data in the leaf nodes so I would expect both queries to be similarly performant. Could the BLOB column possibly be distorting the clustered index structure?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

愁杀 2025-02-20 22:20:16

答案是如何列出数据。

主键与数据一起“聚集”;也就是说,数据由B+树结构中的PK订购。要读取所有ids,必须读取整个btree。

任何次级索引也都在B+树结构中,但它包含(1)索引的列,以及(2)PK中的任何其他列。

在您的示例中(大概有很多[大概]笨重的列),数据btree比辅助索引大得多(仅ID)。任何一个测试都可能需要从磁盘中读取所有相关块。

旁注...这还不如可能。一排的大小约为8KB。 文本blob列(当简短)中包含在该8KB中。但是,当一个人笨重时,它会放在另一个地方,留下文本/斑点的“指针”。因此,数据btree的主要部分比直接包含所有文本/斑点数据的情况小。

由于从tbl中选择ID是一个不必要的查询,因此InnoDB的设计不必担心您发现的效率低下。

通过或在其中等上订购订单,并且可以进行许多不同的优化。您甚至可能会发现索引(C1)将使您的查询在不超过10分钟内运行。 (我想我已经为您提供了“为什么”的所有线索。)

另外,如果您从TBL 中完成了选择 *,则可能需要超过55分钟。这是因为有额外的[随机]获取以获取“外记录”存储中的文本/斑点。从网络时间到铲除更多数据。

The answer comes in how the data is laid out.

The PRIMARY KEY is "clustered" with the data; that is, the data is order ed by the PK in a B+Tree structure. To read all of the ids, the entire BTree must be read.

Any secondary index is also in a B+Tree structure, but it contains (1) the columns of the index, and (2) any other columns in the PK.

In your example (with lots of [presumably] bulky columns), the data BTree is a lot bigger than the secondary index (on just id). Either test probably required reading all the relevant blocks from the disk.

A side note... This is not as bad as it could be. There is a limit of about 8KB on how big a row can be. TEXT and BLOB columns, when short enough, are included in that 8KB. But when one is bulky, it is put in another place, leaving behind a 'pointer' to the text/blob. Hence, the main part of the data BTree is smaller than it might be if all the text/blob data were included directly.

Since SELECT id FROM tbl is a mostly unnecessary query, the design of InnoDB does not worry about the inefficiency you discovered.

Tack on ORDER BY or WHERE, etc, and there are many different optimizations that could into play. You might even find that INDEX(c1) will let your query run in not much more than 10 minutes. (I think I have given you all the clues for 'why'.)

Also, if you had done SELECT * FROM tbl, it might have taken much longer than 55 minutes. This is because of having extra [random] fetches to get the texts/blobs from the "off-record" storage. And from the network time to shovel far more data.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文