将表移出分区
我有一个非常大的表(800GB),其中有一个 DATETIME 字段,它是分区架构的一部分。该字段名为 tran_date。我遇到的问题是索引未与分区正确对齐,并且我无法在主键中包含 tran_date 字段,因为它设置为可为空。
我可以删除所有外键关系、统计信息和索引,但无法修改该列,因为分区架构仍然依赖于 tran_date 列。
在我的研究中,我找到了一种将表移出分区的方法,即删除聚集索引,然后将聚集索引重新写入主文件组,这将允许我修改列,但此操作需要需要几个小时才能删除,需要 13 个小时在 PRIMARY 上写入临时聚集索引,然后我必须删除它,更改表,并正确地重新写入聚集索引,这又需要 13 个小时。此外,我还有不止一张桌子。
当我在开发环境中使用类似大小的数据集测试此部署时,需要几天时间才能完成,因此我正在尝试寻找减少时间的方法。
如果我可以将表移出分区,而不必在 PRIMARY 上写入聚集索引,那么将显着减少更改列所需的时间。
I have a very large table (800GB) which has a DATETIME field which is part of a partition schema. This field is named tran_date. The problem I'm having is that the indexes are not properly aligned with the partition and I can't include the tran_date field in the PRIMARY KEY because it's set to nullable.
I can drop all foreign key relationships, statistics, and indexes, but I can't modify the column because the partition schema is still dependent on the tran_date column.
In my research I've located one way to move the table off of the partition which is to drop the clustered index and then re-write the clustered index onto the PRIMARY filegroup which will then allow me to modify the column, but this operation takes several hours to drop, 13 hours to write the temporary CLUSTERED INDEX on PRIMARY and then I have to drop that, alter the table, and re-write the CLUSTERED INDEX properly which takes another 13 hours. Additionally I have more than one table.
When I tested this deployment in my development environment with a similarly sized data set it took several days to complete, so I'm trying to look for ways to chop down this time.
If I can move the table off the partition without having to write a CLUSTERED INDEX on PRIMARY it would significantly reduce the time required to alter the column.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
无论如何,您最终都会将数据从“A 点”(存储在数据库内的表分区中)移动到“B 点”(不存储在数据库内的表分区中)。目标是尽量减少处理所有数据的次数 最简单的方法可能是:
要处理的一个问题是聚集索引。您可以创建没有聚集索引的新表,复制数据,然后重新索引(额外的时间和痛苦),或者您可以创建带有聚集索引的表,然后“按顺序”复制数据(例如,这比将其复制到非聚集表要慢,但总体上可能会更快,因为您不必构建聚集索引当然
存在“如果用户会怎样”的问题 。更改数据的同时你正在复制它”...但是表分区意味着仓储,所以我猜你不必担心这一点。
最后一点,在复制大量数据时,最好将插入分成多次插入,以免事务日志膨胀。
No matter what, you are going to end up moving data from "point A" (stored in table partitions within the database) to "point B" (not stored within table partitions within the database. The goal is to minimize the number of times you have to work through all that data. Simplest way to do this might be:
One problem to deal with is the clustered index. You could either create the new table without the clustered index, copy the data over, and then reindex (extra time and pain), or you could create the table with the clustered index, and copy the data over “in order” (say, low Ids to high). This would be slower than copying it over to a non-clustered table, but it might be faster overall since you wouldn’t then have to build the clustered index.
Of course there's the problem of "what if users change the data while you're copying it"... but table partitioning implies warehousing, so I'm guessing you don't have to worry about that.
A last point, when copying gobs of data, it is best to break the insert into several inserts, so as to not bloat the transaction log.