Can you design a simple example in MySQL where a clustered index, rather than an ordinary index, is the solution?

Posted on 2024-08-26 09:18:53

I don't see the point of clustered index, when will we benefit?

Comments (4)

菊凝晚露 2024-09-02 09:18:53

Clustered indexes

A clustered index means that the records are physically stored in order (at least near each other), based on the index. Clustered indexes are most important when you are retrieving various columns from each record, in order, because the database engine does not have to jump around to get the next record. Instead, the records are stored sequentially, therefore the seek time between records is at its minimum.

Clustered indexes are most important when reading multiple records that appear near each other in the index.

With InnoDB, the primary key is the clustered index.

Use case for clustered indexes

Suppose you are building an incremental search like the ones Google and Yahoo offer, where as soon as you start typing you see the first few records that match what you've typed so far. Here performance is paramount. If you were returning just a single indexed column in the result set, you wouldn't need a clustered index, but let's say that you also want to return the number of hits for each key_word, which forces the database engine to access the actual row. Since you want to return sequential rows, you should store them sequentially for optimal performance.

SELECT key_word, hits FROM keywords
WHERE key_word LIKE 'britney s%'
ORDER BY key_word
LIMIT 10

You'd want your primary key (clustered index) to be on key_word.
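
As a minimal sketch of what that could look like, here is one possible table definition. The column types and lengths are assumptions for illustration and are not part of the original answer. Because InnoDB clusters rows on the primary key, declaring key_word as the primary key keeps the rows in key_word order.

```sql
-- Hypothetical schema: column types and lengths are assumptions.
CREATE TABLE keywords (
    key_word VARCHAR(100) NOT NULL,
    hits     INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (key_word)   -- InnoDB stores the rows in primary key order
) ENGINE=InnoDB;

-- Prefix search: the matching rows are neighbours in the clustered index,
-- so key_word and hits can be read from the same range of pages.
SELECT key_word, hits
FROM keywords
WHERE key_word LIKE 'britney s%'
ORDER BY key_word
LIMIT 10;
```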

Comparison to nonclustered indexes

All indexes are physically stored in order (strictly speaking in a B-tree, but the effect is the same), so if you are returning only columns that are stored in the index, you still get the same benefit. That is because the indexed column's actual value is stored in the index, so MySQL can use the index value instead of reading the record. However, once you start retrieving columns that aren't part of the index, you also want the actual records stored in order, as they are with a clustered index.
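
To make the difference concrete, the sketch below uses a hypothetical variant of the table (the name keywords_by_id and its columns are assumptions) in which key_word is only in a secondary index. Comparing the two EXPLAIN outputs is one way to see whether a query is answered from the index alone. The exact plan depends on the data and the MySQL version.

```sql
-- Hypothetical variant: surrogate primary key, key_word only in a secondary index.
CREATE TABLE keywords_by_id (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    key_word VARCHAR(100) NOT NULL,
    hits     INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (id),
    KEY idx_key_word (key_word)
) ENGINE=InnoDB;

-- Only an indexed column is requested, so idx_key_word alone can answer this
-- (EXPLAIN typically shows "Using index" in the Extra column).
EXPLAIN SELECT key_word
FROM keywords_by_id
WHERE key_word LIKE 'britney s%';

-- hits is not in idx_key_word, so each match needs a further lookup
-- of the full row in the clustered index.
EXPLAIN SELECT key_word, hits
FROM keywords_by_id
WHERE key_word LIKE 'britney s%';
```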

MySQL Documentation on clustered indexes

Accessing a row through the clustered
index is fast because the row data is
on the same page where the index
search leads. If a table is large, the
clustered index architecture often
saves a disk I/O operation when
compared to storage organizations that
store row data using a different page
from the index record. (For example,
MyISAM uses one file for data rows and
another for index records.)

In InnoDB, the records in nonclustered
indexes (also called secondary
indexes) contain the primary key
columns for the row that are not in
the secondary index. InnoDB uses this
primary key value to search for the
row in the clustered index. If the
primary key is long, the secondary
indexes use more space, so it is
advantageous to have a short primary
key.

MySQL Clustered and Secondary Indexes
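
The second quoted paragraph can be illustrated with a small sketch. The customers table and its columns are hypothetical. The point is only that every entry in the secondary index also carries the primary key value, which InnoDB then uses to find the row.

```sql
-- Hypothetical table: names and types are assumptions.
CREATE TABLE customers (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(255) NOT NULL,
    name  VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),          -- short primary key keeps secondary indexes small
    KEY idx_email (email)      -- each entry stores email plus the id of its row
) ENGINE=InnoDB;

-- id is already present in idx_email, so no row lookup is needed.
SELECT id FROM customers WHERE email = 'a@example.com';

-- name is stored only with the row, so InnoDB first finds the id in idx_email
-- and then searches the clustered index for that id.
SELECT name FROM customers WHERE email = 'a@example.com';
```

A long primary key (for example a wide VARCHAR) would be repeated in every idx_email entry, which is the space cost the documentation mentions.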

半寸时光 2024-09-02 09:18:53

The best example I can think of is a reporting table that is queried regularly on date of transaction(s). I would put a clustered index on the TransactionDate column and add any other required indexes based on query optimization.

So a query like SELECT SUM(amount) FROM transactiondetails WHERE TransactionDate > '2010-01-01' AND TransactionDate < '2010-02-01' can seek into the clustered index and read the matching rows as one contiguous range, which is usually more efficient than fetching them one at a time through a secondary index.
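
In MySQL with InnoDB, a clustered index on TransactionDate means making it the leading column of the primary key. Since dates are not unique, the sketch below pairs it with a surrogate id. The column types, the id column, and the idx_id index are assumptions added for illustration.

```sql
-- Hypothetical schema: only the table and column names TransactionDate
-- and amount come from the answer above.
CREATE TABLE transactiondetails (
    id              BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    TransactionDate DATE NOT NULL,
    amount          DECIMAL(12,2) NOT NULL,
    PRIMARY KEY (TransactionDate, id),  -- rows are clustered by date, id makes the key unique
    KEY idx_id (id)                     -- InnoDB requires AUTO_INCREMENT to lead some index
) ENGINE=InnoDB;

-- Range query: all rows for the period are stored together in the clustered index.
SELECT SUM(amount)
FROM transactiondetails
WHERE TransactionDate > '2010-01-01'
  AND TransactionDate < '2010-02-01';
```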

不交电费瞎发啥光 2024-09-02 09:18:53

A real address book (a dead tree edition), ordered by first name, resembles a clustered index in its structure and purpose.

Clustered indexes can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items are selected.

Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required.

Source: Wikipedia: Database Index - Clustered

音盲 2024-09-02 09:18:53

Look here; halfway down the page it says:

Accessing a row through the clustered
index is fast because the row data is
on the same page where the index
search leads. If a table is large, the
clustered index architecture often
saves a disk I/O operation when
compared to storage organizations that
store row data using a different page
from the index record. (For example,
MyISAM uses one file for data rows and
another for index records.)

Speed sounds like an excellent reason to me... or am I missing your point?

The advantage of the clustered index is that rows can be accessed (and thus searched through) with fewer I/O operations than through 'normal' indexes. Knowing this, you can optimize your DB accesses, and thus your application, by placing the clustered index where it will benefit you most.
