索引和聚集索引对数据库性能有多重要?

发布于 2024-07-11 05:09:24 字数 139 浏览 12 评论 0原文

最近有几个关于数据库索引和聚集索引的问题,直到最近几周,它对我来说还是新鲜的。 我想知道它有多重要以及创建它们可以带来什么样的性能提升。

编辑:当您第一次开始放入聚集索引时,通常最需要查看的字段类型是什么?

There have been several questions recently about database indexing and clustered indexing and it has been kind of new to me until the last couple weeks. I was wondering how important it is and what kind of performance gains can be expected from creating them.

Edit: What is usually the best type of fields to look at when putting in a clustered index when you are first starting out?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

你的心境我的脸 2024-07-18 05:09:24

非常非常A(G,G 重要。 在我看来,明智的索引绝对是数据库性能优化中最重要的事情。

这不是一个容易用单一答案涵盖的主题。 良好的索引需要了解数据库上将发生的查询,进行大量权衡并理解特定数据库引擎中特定索引的含义。 但这仍然非常重要。

编辑:基本上,聚集索引通常应该具有较短的长度。 它们应该在反映范围的查询上创建。 他们不应该有重复的条目。 但这些指导方针非常笼统,绝不是正确的。 正确的做法是分析将要执行的查询。 仔细地对执行计划进行基准测试和分析,并了解什么是最好的方法。 这需要多年的经验和知识,绝不是用一句话就能解释清楚的。 这是使数据库专家成为专家的首要因素(这不是唯一的因素,但它是其他重要因素的基础,例如并发问题、可用性……)!

Very veryA(G,G) important. In my opinion, wise indexing is the absolute most important thing in DB performance optimization.

This is not an easy topic to cover in a single answer. Good indexing requires knowledge of queries going to happen on the database, making a large number of trade-offs and understanding the implication of a specific index in the specific DB engine. But it's very important nevertheless.

EDIT: Basically, clustered indexes usually should have short lengths. They should be created on queries which reflect a range. They should not have duplicate entries. But these guidelines are very general and by no means the right thing. The right thing is to analyze the queries that are gonna be executed. Carefully benchmarking and analyzing execution plans and understanding what is the best way to do it. This requires years of experience and knowledge and by no means it's something to explain in a single paragraph. It's the primary thing that makes DB experts expert (It's not the only thing, but it's primitive to other important things, such as concurrency issues, availability, ...)!

风吹雨成花 2024-07-18 05:09:24

索引:极其重要。 错误的索引会使查询变得更加困难,有时甚至无法在合理的时间内完成。

索引还会影响插入性能和磁盘使用(负面影响),因此在大型表上保留大量多余的索引也是一个坏主意。

集群是值得思考的事情,我认为它确实取决于特定数据库的行为。 如果您可以正确地对数据进行聚类,则可以显着减少满足对不在内存中的行的请求所需的 IOP 量。

Indexing: extremely important. Having the wrong indexes makes queries harder, sometimes to the point they can't be completed in a sensible time.

Indexes also impact insert performance and disc usage (negatively), so keeping lots of superfluous indexes around on large tables is a bad idea too.

Clustering is something worth thinking about, I think it's really dependent on the behaviour of the specific database. If you can cluster your data correctly, you can dramatically reduce the amount of IOPs required to satisfy requests for rows not in memory.

瀞厅☆埖开 2024-07-18 05:09:24

如果没有正确的索引,您将强制 RDBMS 进行表扫描来查询任何内容。 效率极低。

我还推断您没有主键,这是关系设计中的一个主要错误。

Without proper indexes, you force the RDBMS to do table scans to query for anything. Terribly inefficient.

I'd also infer that you don't have primary keys, which is a cardinal sin in relational design.

少跟Wǒ拽 2024-07-18 05:09:24

当表包含很多行时,索引非常重要。
有了一些 rw,没有索引的性能会更好。
对于较大的表,索引对于获得良好的性能非常重要。
定义它们并不容易。 聚簇意味着数据按照聚簇索引顺序存储。
要获得索引的良好提示,您可以使用 Toad

Indexing is very important when the table contains many rows.
With a few rws, performance is better without indexes.
With larger tables indexes are very important to get good performance.
It is not easy to defined them. Clustered means that the data are stored in the clustered index order.
To get good hints of indexes you could use Toad

江南月 2024-07-18 05:09:24

索引非常重要

正确的查询索引可以显着提高性能,这看起来就像巫术

Indexing is vitally important.

The right index for a query can improve performance so dramatically it can seem like witchcraft.

兮颜 2024-07-18 05:09:24

正如其他答案所说,索引至关重要。

正如您可能从其他答案中推断出的那样,聚集索引并不那么重要。

良好的索引可为您带来一阶性能提升 - 数量级的提升很常见。

聚集索引是二阶或增量性能增益——通常提供小百分比(<100%)的性能提升。

(我们还遇到了“什么是 100% 性能增益”的问题;我将百分比解释为 ((oldtime - newtime)/newtime) * 100,因此如果旧时间为 10 秒,新时间为 5秒,性能提升100%。)

不同的DBMS对聚集索引的含义有不同的解释。 谨防。
特别是,一些 DBMS 对数据进行一次聚类,此后聚类会随着时间的推移而衰减,直到数据被重新聚类。 我相信其他人对集群持更积极的看法。

As the other answers have said, indexing is crucial.

As you might infer from other answers, clustered indexing is much less crucial.

Decent indexing gives you first order performance gains - orders of magnitude are common.

Clustered indexing is a second order or incremental performance gain - usually giving small (<100%) percentages of performance increase.

(We also get into questions of 'what is a 100% performance gain'; I'm interpreting the percentage as ((oldtime - newtime)/newtime) * 100, so if the old time is 10 seconds and the new time is 5 seconds, the performance increase is 100%.)

Different DBMS have different interpretations of what a clustered index means. Beware.
In particular, some DBMS cluster the data once and thereafter, the clustering decays over time until the data is reclustered. Others take a more active view of clustering, I believe.

腻橙味 2024-07-18 05:09:24

聚集索引通常但并不总是您的主键。 查看聚集索引的一种方法是考虑基于聚集索引的值对数据进行物理排序。

现实情况可能并非如此,但是无论如何,引用聚集索引通常都会为您带来以下性能奖励:

  1. 当从聚集索引命中解析时,表的所有列都可以免费访问,就好像它们包含在覆盖范围内一样指数。 (仅使用索引数据即可解析查询,而无需引用表本身的数据页)

  2. 可以直接针对聚集索引进行更新操作,无需中间处理。 如果您要对表进行大量更新,您通常希望引用聚集列。

  3. 根据实现的不同,可能会带来顺序访问的好处,即可以更快地检索存储在磁盘上的数据,并且使用更少昂贵的磁盘查找操作。

    根据实现的不同,可能

  4. 根据实现的不同,在不需要物理索引的情况下,可能会有免费索引的好处,因为数据访问可以通过简单的猜谜游戏算法来解决。

    根据实现的不同,在不需要

不要指望#3,尤其是#4。 在大多数 RDBMS 平台上,#1 和 #2 通常是安全的选择。

The clustered index is ususally but not always your primary key. One way of looking at a clustered index is to think of the data being physically ordered based on the values of the clustered index.

This may very well not be the case in reality however refrencing clustered indexes ususally gets you the following performance bonuses anyway:

  1. All columns of the table are accessable for free when resolved from a clustered index hit as if they were contained within a covering index. (A query resolvable using just the index data without having to refrence the data pages of the table itself)

  2. Update operations can be made directly against a clustered index without intermediate processing. If you are doing a lot of updates against a table you ususally want to be refrencing the clustered columns.

  3. Depending on implementation there may be a sequential access benefit where data stored on disk is retreived quicker with fewer expensive disk seek operations.

  4. Depending on implementation there may be free index benefit where a physical index is not necessary as data access can be resolved via simple guessing game algorithms.

Don't count on #3 and especially #4. #1 and #2 are ususally safe bets on most RDBMS platforms.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文