SQL Server 2005 中没有聚集索引的原因
我继承了 SQL SERVER 2005 数据库的一些数据库创建脚本。
我注意到的一件事是,所有主键都是作为非聚集索引而不是聚集索引创建的。
我知道每个表只能有一个聚集索引,并且您可能希望将其放在非主键列上以提高搜索等的查询性能。但是表上没有其他CLUSTERED
索引在问题中。
所以我的问题是,除了上述之外,是否还有任何技术原因不在主键列上建立聚集索引。
I've inherited some database creation scripts for a SQL SERVER 2005 database.
One thing I've noticed is that all primary keys are created as NON CLUSTERED
indexes as opposed to clustered.
I know that you can only have one clustered index per table and that you may want to have it on a non primary key column for query performance of searches etc. However there are no other CLUSTERED
indexes on the tables in questions.
So my question is are there any technical reasons not to have clustered indexes on a primary key column apart from the above.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
在任何“正常”数据或查找表上:不,我看不出任何原因。
对于批量导入表或临时表之类的东西 - 这取决于。
让一些人惊讶的是,拥有一个良好聚集索引实际上可以加快 INSERT 或 UPDATE 等操作的速度。请参阅 Kimberly Tripps 出色的 聚集索引争论仍在继续...... 博客文章,她在其中详细解释了为什么会出现这种情况。
有鉴于此:我认为没有任何有效理由不拥有良好的聚集索引(狭窄、稳定、唯一、不断增加=
INT IDENTITY< /code> 作为最明显的选择)在任何 SQL Server 表上。
要深入了解如何以及为何选择集群键,请阅读 Kimberly Tripp 关于该主题的所有优秀博客文章:
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustering-Key.aspx
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustered-Index.aspx
来自“索引女王”的优秀内容”! :-)
On any "normal" data or lookup table: no, I don't see any reason whatsoever.
On stuff like bulk import tables, or temporary tables - it depends.
To some people surprisingly, it appears that having a good clustered index actually can speed up operations like INSERT or UPDATE. See Kimberly Tripps excellent The Clustered Index Debate continues.... blog post in which she explains in great detail why this is the case.
In this light: I don't see any valid reason not to have a good clustered index (narrow, stable, unique, ever-increasing =
INT IDENTITY
as the most obvious choice) on any SQL Server table.To get some deep insights into how and why to choose clustering keys, read all of Kimberly Tripp's excellent blog posts on the topic:
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustering-Key.aspx
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustered-Index.aspx
Excellent stuff from the "Queen of Indexing" ! :-)
集群表与堆表
(有关该主题的好文章,位于www.mssqltips.com)
HEAP 表(无聚集索引)
数据不存储在任何特定的
order
无法检索具体数据
很快,除非还有
非聚集索引
数据页没有链接,所以
顺序访问需要回溯
到索引分配映射 (IAM)
页
由于没有聚集索引,
不需要额外的时间
维护索引
由于没有聚集索引,
不需要额外的
存储聚集索引的空间
树
这些表的index_id值为
sys.indexes 目录视图中的 0
聚集表
数据按顺序存储
聚集索引键
可以快速检索数据
在聚集索引键上,如果
查询使用索引列
链接数据页以加快速度
顺序存取
需要额外的时间来维护基于的聚集索引
插入、更新和删除
需要额外的空间来存储
聚集索引树
这些表在 sys.indexes 目录中的 index_id 值为 1
查看
Clustered Tables vs Heap Tables
(Good article on subject at www.mssqltips.com)
HEAP Table (Without clustered index)
Data is not stored in any particular
order
Specific data can not be retrieved
quickly, unless there are also
non-clustered indexes
Data pages are not linked, so
sequential access needs to refer back
to the index allocation map (IAM)
pages
Since there is no clustered index,
additional time is not needed to
maintain the index
Since there is no clustered index,
there is not the need for additional
space to store the clustered index
tree
These tables have a index_id value of
0 in the sys.indexes catalog view
Clustered Table
Data is stored in order based on the
clustered index key
Data can be retrieved quickly based
on the clustered index key, if the
query uses the indexed columns
Data pages are linked for faster
sequential access
Additional time is needed to maintain clustered index based on
INSERTS, UPDATES and DELETES
Additional space is needed to store
clustered index tree
These tables have a index_id value of 1 in the sys.indexes catalog
view
请首先阅读我在“无法直接访问聚集表中的数据行 - 为什么?”下的回答。特别是第 [2] 项警告。
创建“数据库”的人都是白痴。他们有:
)随着电子表格伪装成数据库,完全避免 CI 而只使用 NCI 和堆的做法变得越来越普遍。显然,他们没有获得 CI 的任何力量或好处,但是天哪,他们没有获得关系数据库的任何力量或好处,所以谁在乎他们没有获得 CI 的任何力量(CI 是为关系数据库设计的,他们的不是)。从他们的角度来看,无论如何,他们必须经常“重构”这该死的东西,所以为什么要费心呢。关系数据库不需要“重构”。
如果您需要进一步讨论此响应,请发布 CREATE TABLE/INDEX DDL;否则就是浪费时间的学术争论。
Please read my answer under "No direct access to data row in clustered table - why?", first. Specifically item [2] Caveat.
The people who created the "database" are cretins. They had:
For such collections of spreadsheets masquerading as databases, it is becoming more and more common to avoid CIs altogether, and just have NCIs plus the Heap. Obviously they get none of the power or benefits of the CI, but hell, they get none of the power or benefit of Relational databases, so who cares that they get none of the power of CIs (which were designed for Relational databases, which theirs is not). The way they look at it, they have to "refactor" the darn thing every so often anyway, so why bother. Relational databases do not need "refactoring".
If you need to discuss this response further, please post the CREATE TABLE/INDEX DDL; otherwise it is a time-wasting academic argument.
这是另一个(是否已在其他答案中提供?)可能的原因(仍有待理解):
我希望我稍后会更新,但现在更希望链接这些主题
更新:
我在理解聚集索引时错过了什么?
Here is another (have it already been provided in other answers?) possible reason (still to be understood):
I hope, I shall update later but for now it is rather the desire to link these topics
Update:
What do I miss in understanding the clustered index?
如今,一些 B 树服务器/编程语言仍在使用,固定或可变长度的平面 ASCII 文件用于存储数据。当新的数据记录/行添加到文件(表)时,该记录(1)附加到文件末尾(或替换已删除的记录)并且(2)索引平衡。当数据以这种方式存储时,您不必担心系统性能(就 B 树服务器正在执行的操作来返回指向第一个数据记录的指针而言)。响应时间仅受索引文件中的节点数影响。
当您开始使用 SQL 时,您希望会意识到每当编写 SQL 语句时都必须考虑系统性能。在非索引列上使用“ORDER BY”语句可能会使系统崩溃。使用聚集索引可能会给 CPU 带来不必要的负载。现在是 21 世纪,我希望我们在使用 SQL 编程时不必考虑系统性能,但我们仍然需要考虑。
对于一些较旧的编程语言,每当检索排序的数据时都必须使用索引。我只希望这个要求今天仍然存在。我只是想知道有多少公司因为非索引数据上的 SQL 语句写得不好而更新了他们缓慢的计算机系统。
在我 25 年的编程生涯中,我从来不需要以特定顺序存储物理数据,所以也许这就是为什么一些程序员避免使用聚集索引的原因。很难知道权衡是什么(存储时间,与检索时间),特别是如果您正在设计的系统有一天可能会存储数百万条记录。
With some b-tree servers/programming languages still used today, fixed or variable length flat ascii files are used for storing data. When a new data record/row is added to a file (table), the record is (1) appended to the end of the file (or replaces a deleted record) and (2) the indexes are balanced. When data is stored this way, you don't have to be concerned about system performance (as far as what the b-tree server is doing to return a pointer to the first data record). The response time is only effected by the # of nodes in your index files.
When you get into using SQL, you hopefully come to realize that system performance has to be considered whenever you write an SQL statement. Using an "ORDER BY" statement on a non-indexed column can bring a system to its knees. Using a clustered index might put an unnecessary load on the CPU. It's the 21st century and I wish we didn't have to think about system performance when programming in SQL, but we still do.
With some older programming languages, it was mandatory to use an index whenever sorted data is retrieved. I only wish this requirement was still in place today. I can only wonder how many companies have updated their slow computer systems due to a poorly written SQL statement on non-indexed data.
In my 25 years of programming, I've never needed my physical data stored in a particular order, so maybe that is why some programmers avoid using clustered indexes. It's hard to know what the tradeoff is (storage time, verses retrieval time) especially if the system you are designing might store millions of records someday.