What is an index? Can a non-clustered index be non-unique?

Posted on 2024-09-25 08:24:46

Subquestion to my other question about what the UNIQUE argument on INDEX creation is for:

All definitions of (MS SQL Server) indexes that I could find are ambiguous, and all explanations based on them rely on undefined or ambiguously defined terms.

What is the definition of an index?

For example, the most common definition of an index, from Wikipedia, is:

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space. Indexes can be created using one or more columns of a database table...
†SQL server creates a clustered index on a primary key by default. The data is present in random order, but the logical ordering is specified by the index. The data rows may be randomly spread throughout the table. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the page and the row number in the data page.

It still feels ambiguous to me. One can understand an index as:

1) An ordered data structure, a tree, containing intermediate and leaf nodes;
2) Leaf node data containing values from the indexed columns + "pointer to the page and the row number in the data page"

Can a non-clustered index be non-unique, considering 2)? Or even 1)?
It doesn't seem so to me ...

But does T-SQL imply the existence of a non-unique non-clustered index?

If yes, then what is meant by a non-clustered index in the MS CREATE INDEX docs, and what is the UNIQUE argument applied to there?

Is it:

3) Leaf node data containing values from the indexed columns, but without the pointer + row number

If it is 3), then question 1) arises again: why apply constraints to a copy of the real data in an "index", instead of to the real data in situ?

Is a bookmark (pointer + row number) to a real data row unique (does it uniquely identify the row)?
Doesn't this bookmark form part of the index and thereby make the index unique?
Can you give me the definition of an index, instead of explaining how to use something that is left undefined? The latter part I already know (or can read myself).

† This paragraph no longer exists in the current revision of the Wikipedia page, but did at the time of posting.

往事随风而去 2024-10-02 08:24:46

An index is a data structure designed to optimize querying large data sets. That definition by itself makes no claim about whether anything is unique.

You can definitely have non-unique non-clustered indexes. How else could you index on (lastname, firstname)? That's never going to be unique (e.g. on Facebook).

You can define an index as being unique. This just adds an extra check that no duplicate values are allowed in the indexed columns. If you made your index on (lastname, firstname) UNIQUE, then the second Brad Pitt to sign up on your site couldn't do so, since that unique index would reject his data.
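To make the difference concrete, here is a minimal sketch. It uses SQLite through Python's built-in sqlite3 module as a stand-in, not SQL Server, so it only illustrates the general behaviour of a plain index versus a UNIQUE one. The table name person, the index name ix_person_name, and the helper try_second_brad are all hypothetical.

```python
import sqlite3

def try_second_brad(unique: bool) -> bool:
    """Return True if a second (Pitt, Brad) row is accepted by the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person ("
        "person_id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT)"
    )
    # Same index definition either way; only the UNIQUE keyword differs.
    kind = "UNIQUE INDEX" if unique else "INDEX"
    conn.execute(f"CREATE {kind} ix_person_name ON person (lastname, firstname)")
    conn.execute("INSERT INTO person (lastname, firstname) VALUES ('Pitt', 'Brad')")
    try:
        conn.execute("INSERT INTO person (lastname, firstname) VALUES ('Pitt', 'Brad')")
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected the duplicate (lastname, firstname) pair.
        return False
    finally:
        conn.close()

print(try_second_brad(unique=False))  # plain index: duplicate accepted
print(try_second_brad(unique=True))   # unique index: duplicate rejected
```

The index is the same structure in both cases; UNIQUE only adds a check at write time.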

One exception is the primary key on any given table. The primary key is the logical identifier used to uniquely and precisely identify each single row in your database. As such, it must be unique over all rows and cannot contain any NULL values.

The clustered index in SQL Server is special in that it contains the actual data rows in its leaf nodes. That alone imposes no restriction. However, the clustered index is also used to uniquely (physically) locate the data in your database, so the clustered key must effectively be unique: it must be able to tell Brad Pitt #1 and Brad Pitt #2 apart. If you don't provide a unique set of columns for your clustered index, SQL Server will add a "uniqueifier" (a 4-byte INT) to those rows that aren't unique, e.g. you'd get BradPitt001 and BradPitt002 (or something like that).
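The idea of that extra disambiguating value can be sketched as a toy model. This is an assumption-laden illustration, not SQL Server's storage format: the function name add_uniqueifier and the choice to number duplicates from 0 are made up for the example.

```python
from collections import defaultdict

def add_uniqueifier(keys):
    """Pair each clustering key with a counter so that duplicate keys become distinct."""
    seen = defaultdict(int)
    result = []
    for key in keys:
        # First occurrence of a key gets 0, the next duplicate gets 1, and so on.
        result.append((key, seen[key]))
        seen[key] += 1
    return result

rows = add_uniqueifier([("Pitt", "Brad"), ("Jolie", "Angelina"), ("Pitt", "Brad")])
for row in rows:
    print(row)
```

After this step every (key, counter) pair is distinct, so each row can be located unambiguously even though two rows share the same name.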

The clustered key is used as the "pointer" to the actual data row in your SQL Server table, so on a table that has a clustered index it's included in every single non-clustered index, too. So your non-clustered, non-unique index on (lastname, firstname) contains not only these two fields but also the clustered key of that table. That's why it's important that the clustered key on a SQL Server table is small, stable, and unique - typically an INT.

So your non-clustered index on (lastname, firstname) will really hold (lastname, firstname, personID) and will have entries like (Pitt, Brad, 10176), (Pitt, Brad, 17665) and so forth. When you search for "Brad Pitt" in your non-clustered index, SQL Server will find these two entries, and for both it has the "physical pointer" to where the rest of the data for those two guys lives. If you ask for more than just the first and last name, SQL Server can go and grab the whole row for each of the two Brad Pitt entries and provide you with the data the query requires.
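The lookup just described can be modelled in a few lines. This is only a sketch under stated assumptions: a dict stands in for the clustered index, a sorted list of tuples stands in for the non-clustered index (real indexes are B-trees), and the third person and the note column are invented for illustration.

```python
import bisect

# Toy "clustered index": clustering key (personID) -> full data row.
clustered = {
    10176: {"lastname": "Pitt", "firstname": "Brad", "note": "first sign-up"},
    17665: {"lastname": "Pitt", "firstname": "Brad", "note": "second sign-up"},
    20001: {"lastname": "Jolie", "firstname": "Angelina", "note": "hypothetical row"},
}

# Toy "non-clustered index" on (lastname, firstname): sorted entries that
# also carry the clustering key, which makes every entry distinct.
nonclustered = sorted(
    (row["lastname"], row["firstname"], pid) for pid, row in clustered.items()
)

def find_by_name(lastname, firstname):
    """Seek to the first matching index entry, then follow each key to the full row."""
    start = bisect.bisect_left(nonclustered, (lastname, firstname))
    matches = []
    for last, first, pid in nonclustered[start:]:
        if (last, first) != (lastname, firstname):
            break  # entries are sorted, so no later entry can match
        matches.append((pid, clustered[pid]))  # the "key lookup" into the clustered index
    return matches

for pid, row in find_by_name("Pitt", "Brad"):
    print(pid, row["note"])
```

The index columns alone are not unique here, yet every index entry is, because the clustering key travels with it.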
