Reduce SQL Server table fragmentation without adding/dropping a clustered index?
I have a large database (90GB data, 70GB indexes) that's been slowly growing for the past year, and the growth/changes have caused a large amount of internal fragmentation not only of the indexes, but of the tables themselves.
It's easy to resolve the (large number of) very fragmented indexes - a REORGANIZE or REBUILD will take care of that, depending on how fragmented they are - but the only advice I can find on cleaning up actual table fragmentation is to add a clustered index to the table. I'd immediately drop it afterwards, as I don't want a clustered index on the table going forward, but is there another method of doing this without the clustered index? A "DBCC" command that will do this?
Thanks for your help.
Problem
Let's get some clarity, because this is a common problem, a serious issue for every company using SQL Server.
This problem, and the need for CREATE CLUSTERED INDEX, is misunderstood.
Agreed that having a permanent Clustered Index is better than not having one. But that is not the point, and it will lead into a long discussion anyway, so let's set that aside and focus on the posted question.
The point is, you have substantial fragmentation on the Heap. You keep calling it a "table", but there is no such thing at the physical data storage or DataStructure level. A table is a logical concept, not a physical one. It is a collection of physical DataStructures. The collection is one of two possibilities:
Heap
    plus all Non-clustered Indices
    plus Text/Image chains
or a Clustered Index
    (eliminates the Heap and one Non-clustered Index)
    plus all Non-clustered Indices
    plus Text/Image chains.
Heaps get badly fragmented; the more interspersed (random) Inserts/Deletes/Updates there are, the more fragmentation.
There is no way to clean up the Heap, as is. MS does not provide a facility (other vendors do).
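Before rewriting anything, it may help to measure how bad the Heap actually is. The query below is a minimal sketch, not part of the original answer: `dbo.MyHeapTable` is a placeholder name, and it assumes SQL Server 2005 or later, where the `sys.dm_db_index_physical_stats` function is available.

```sql
-- Hypothetical name: dbo.MyHeapTable stands in for the heap being examined.
-- Run this in the database that owns the table.
SELECT
    OBJECT_NAME(ps.object_id)          AS table_name,
    ps.index_type_desc,                -- reports 'HEAP' for index_id = 0
    ps.alloc_unit_type_desc,           -- IN_ROW_DATA, LOB_DATA, ...
    ps.page_count,
    ps.avg_page_space_used_in_percent, -- low values point to half-empty pages
    ps.forwarded_record_count,         -- rows that were moved off their original page
    ps.avg_fragmentation_in_percent    -- for a heap this is extent fragmentation
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                      -- current database
         OBJECT_ID(N'dbo.MyHeapTable'),
         0,                            -- index_id 0 = the heap itself
         NULL,                         -- all partitions
         'SAMPLED'                     -- LIMITED mode returns NULL for page fullness
     ) AS ps;                          -- and forwarded-record counts
```

Running the same query before and after the clean-up gives a rough before/after comparison.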
Solution
However, we know that Create Clustered Index rewrites and re-orders the Heap, completely. The method (not a trick), therefore, is to Create Clustered Index only for the purpose of de-fragmenting the Heap, and drop it afterward. You need free space in the database of roughly 1.25 x the table size.
While you are at it, by all means use FILLFACTOR to reduce future fragmentation. The Heap will then take more allocated space, allowing for future Inserts, Deletes and row expansions due to Updates.
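The create-then-drop method can be sketched roughly as follows. All names (`dbo.MyHeapTable`, `SomeColumn`, `CIX_Defrag_Temp`) and the fill factor of 90 are assumptions chosen for illustration, not values from the original post.

```sql
-- Hypothetical names: dbo.MyHeapTable, SomeColumn and CIX_Defrag_Temp are placeholders.

-- Step 1: building the clustered index rewrites every row into freshly ordered pages.
CREATE CLUSTERED INDEX CIX_Defrag_Temp
    ON dbo.MyHeapTable (SomeColumn)
    WITH (FILLFACTOR = 90,        -- assumed value: fill pages to about 90% at build time
          SORT_IN_TEMPDB = ON);   -- optional: do the sort work in tempdb

-- Step 2: dropping the index leaves the rows in a heap again.
DROP INDEX CIX_Defrag_Temp ON dbo.MyHeapTable;
```

Both steps also cause the non-clustered indexes on the table to be rebuilt, so this is probably best run in a maintenance window. Separately from the method described here, SQL Server 2008 and later document `ALTER TABLE ... REBUILD` as a way to rebuild a heap directly; the answer does not cover it, so treat it as an option to evaluate on its own.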
Note
Note that there are three Levels of Fragmentation; this deals with Level III only, fragmentation within the Heap, which is caused by the lack of a Clustered Index.
As a separate task, at some other time, you may wish to contemplate the implementation of a permanent Clustered Index, which eliminates fragmentation altogether ... but that is separate from the posted problem.
Response to Comment
Not quite. I wouldn't call it a "limitation".
The method I have given to eliminate the Fragmentation in the Heap is to create a Clustered Index, and then drop it. That is, the index is temporary; its only purpose is to correct the Fragmentation.
Implementing a Clustered Index on the table (permanently) is a much better solution, because it reduces overall Fragmentation (the DataStructure can still get Fragmented, refer to the detailed info in the links below), which is far less than the Fragmentation that occurs in a Heap.
Every table in a Relational database (except "pipe" or "queue" tables) should have a Clustered Index, in order to take advantage of its various benefits.
The Clustered Index should be on columns that distribute the data (avoiding INSERT conflicts); it should never be on a monotonically increasing column, such as a Record ID [1], which guarantees an INSERT Hot Spot in the last Page.
[1] Record IDs on every File render your "database" a non-relational Record Filing System, using SQL merely for convenience. Such Files have none of the Integrity, Power, or Speed of Relational databases.
In MS SQL and Sybase ASE, there are three Levels of Fragmentation, and within each Level, several different Types. Keep in mind that when dealing with Fragmentation, we must focus on DataStructures, not on tables (a table is a collection of DataStructures, as explained above). The Levels are:
Level I • Extra-DataStructure
Outside the DataStructure concerned, across or within the database.
Level II • DataStructure
Within the DataStructure concerned, above Pages (across all Pages)
This is the Level most frequently addressed by DBAs.
Level III • Page
Within the DataStructure concerned, within the Pages
These links provide full detail re Fragmentation. They are specific to Sybase ASE, however, at the structural level, the information applies to MS SQL.
Fragmentation Definition
Fragmentation Impact
Fragmentation Type
Note that the method I have given operates at Level II; it corrects both the Level II and the Level III Fragmentation.
You state that you would add a clustered index to alleviate the table fragmentation, only to drop it immediately.
The clustered index removes fragmentation by sorting on the cluster key, but you say that this key would not be possible for future use. This begs the question: why defragment using this key at all?
It would make sense to create this clustered key and keep it, as you obviously want/need the data sorted that way. You say that data changes would incur data movement penalties that can't be borne; have you thought about creating the index with a lower FILLFACTOR than the default value? Depending upon data change patterns, you could benefit from something as low as 80%. You then have 20% 'unused' space per page, but the benefit of fewer page splits when the clustered key values are changed. Could that help you?
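As a rough illustration of the FILLFACTOR suggestion, assuming a table `dbo.MyTable` with a clustering column `ClusterKeyColumn` (both names are placeholders, as is the index name):

```sql
-- Hypothetical names: dbo.MyTable, ClusterKeyColumn and CIX_MyTable are placeholders.

-- Build the permanent clustered index with leaf pages filled to 80%,
-- leaving about 20% of each page free for later inserts and row growth.
CREATE CLUSTERED INDEX CIX_MyTable
    ON dbo.MyTable (ClusterKeyColumn)
    WITH (FILLFACTOR = 80);

-- The fill factor is applied only when the index is created or rebuilt,
-- so a periodic rebuild is what restores the free space over time.
ALTER INDEX CIX_MyTable ON dbo.MyTable
    REBUILD WITH (FILLFACTOR = 80);
```

The 80% figure is simply the value mentioned above; the right number depends on how the data actually changes.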
The problem that no one is talking about is FRAGMENTATION OF THE DATA OR LOG DEVICE FILES ON THE HARD DRIVE(s) ITSELF!! Everyone talks about fragmentation of the indexes and how to avoid/limit that fragmentation.
FYI: When you create a database, you specify the INITIAL size of the .MDF along with how much it will grow by when it needs to grow. You do the same with the .LDF file. THERE IS NO GUARANTEE THAT WHEN THESE TWO FILES GROW THAT THE DISK SPACE ALLOCATED FOR THE EXTRA DISK SPACE NEEDED WILL BE PHYSICALLY CONTIGUOUS WITH THE EXISTING DISK SPACE ALLOCATED!!
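For reference, this is roughly what specifying the initial size and growth increment looks like. The database name, logical file names, paths and sizes below are all made up for illustration, and `GO` assumes a client such as SSMS or sqlcmd that treats it as a batch separator.

```sql
-- Hypothetical database name, logical file names, paths and sizes.
CREATE DATABASE SalesArchive
ON PRIMARY
    (NAME = SalesArchive_data,
     FILENAME = N'D:\SQLData\SalesArchive.mdf',
     SIZE = 200GB,          -- initial size of the .MDF
     FILEGROWTH = 10GB)     -- how much it grows each time it fills up
LOG ON
    (NAME = SalesArchive_log,
     FILENAME = N'E:\SQLLogs\SalesArchive.ldf',
     SIZE = 20GB,           -- initial size of the .LDF
     FILEGROWTH = 2GB);
GO

-- Inspect the current size and growth settings of an existing database's files.
USE SalesArchive;
GO
SELECT name,
       type_desc,
       CAST(size AS bigint) * 8 / 1024 AS size_mb,   -- size is stored in 8 KB pages
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(20)) + ' %'
            ELSE CAST(CAST(growth AS bigint) * 8 / 1024 AS varchar(20)) + ' MB'
       END AS growth_setting
FROM sys.database_files;
```

A generous initial size with a large fixed growth increment means fewer growth events, and therefore fewer chances for the file to be extended into non-contiguous disk space.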
Every time one of these two device files needs to expand, there is the possibility of fragmentation of the hard drive disk space. That means the heads on the hard drive need to work harder (and take more time) to move from one section of the hard drive to another section to access the necessary data in the database. It is analogous to buying a small plot of land and building a house that just fits on that land. When you need to expand the house, you have no more land available unless you buy the empty lot next door - except - what if someone else, in the meantime, has already bought that land and built a house on it? Then you CANNOT expand your house. The only possibility is to buy another plot of land in the "neighborhood" and build another house on it. The problem becomes - you and two of your children would live in House A and your wife and third child would live in House B. That would be a pain (as long as you were still married).
The solution to remedy this situation is to "buy a much larger plot of land, pick up the existing house (i.e. database), move it to the larger plot of land and then expand the house there". Well - how do you do that with a database? Do a full backup, drop the database (unless you have plenty of free disk space to keep both the old fragmented database - just in case - as well as the new database), create a brand new database with plenty of initial disk space allocated (there is no guarantee that the operating system will ensure that the space you request is contiguous) and then restore the database into the new database space just created. Yes - it is a pain to do, but I do not know of any "automatic disk defragmenter" software that will work on SQL database files.
You can maybe compact the heap by running DBCC SHRINKFILE with NOTRUNCATE.
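A sketch of that command, assuming a data file whose logical name is `MyDatabase_data` (a placeholder; the real logical names are listed in `sys.database_files`):

```sql
-- Hypothetical logical file name: MyDatabase_data.
-- NOTRUNCATE moves allocated pages from the end of the file toward the front
-- but does not release the freed space, so the physical file size stays the same.
DBCC SHRINKFILE (N'MyDatabase_data', NOTRUNCATE);
```

As far as I know this relocates whole pages rather than repacking rows within them, and shrink operations tend to increase index fragmentation, so it is worth measuring before and after rather than assuming it helped.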
Based on comments, I see you haven't tested with a permanent clustered index.
To put this in perspective, we have a database with 10 million new rows per day with clustered indexes on all tables. Deleted "gaps" will be removed via scheduled ALTER INDEX (and also forward pointers/page splits).
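The scheduled maintenance could look something like the sketch below. The table name `dbo.Orders` is a placeholder, and which of the two statements to schedule is a judgment call that depends on how fragmented the indexes get.

```sql
-- Hypothetical table name: dbo.Orders.

-- Lighter option: reorganize compacts and reorders the leaf pages in place
-- and always runs online.
ALTER INDEX ALL ON dbo.Orders REORGANIZE;

-- Heavier option: rebuild recreates every index on the table from scratch.
ALTER INDEX ALL ON dbo.Orders REBUILD;
```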
Your 12GB table may be 2GB after indexing: it merely has 12GB allocated but is massively fragmented too.
I understand your pain in being constrained by a legacy design.
Do you have the opportunity to restore a backup of the table in question on another server and create a clustered index? It is very possible that the clustered index, if created on a set of narrow unique columns or an identity column, will reduce the total table (data and index) size.
In one of my legacy apps all the data was accessed via views. I was able to modify the schema of the underlying table, adding an identity column and a clustered index, without affecting the application.
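A minimal sketch of that approach, meant for a scratch database; the table, view, column and index names are all invented for illustration, and `GO` assumes SSMS or sqlcmd.

```sql
-- Hypothetical names throughout; run in a scratch database.

-- A legacy heap and the view the application reads through.
CREATE TABLE dbo.LegacyCustomer (
    CustomerCode char(10)     NOT NULL,
    CustomerName varchar(100) NOT NULL
);
GO
CREATE VIEW dbo.vLegacyCustomer
AS
SELECT CustomerCode, CustomerName    -- explicit column list hides later additions
FROM dbo.LegacyCustomer;
GO

-- Add a narrow identity column to the underlying table ...
ALTER TABLE dbo.LegacyCustomer
    ADD CustomerRowId int IDENTITY(1,1) NOT NULL;
GO

-- ... and cluster on it; the view's result shape is unchanged.
CREATE UNIQUE CLUSTERED INDEX CIX_LegacyCustomer_RowId
    ON dbo.LegacyCustomer (CustomerRowId);
```

On a large table, adding the column and building the index both touch every row, so test the timing on a restored copy first.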
Another drawback of having the heap is the extra IO associated with any forwarded rows.
I found the article below effective when I was asked if there was any PROOF that we needed a clustered index permanently on the table.
This article is by Microsoft