数据增长策略
我想了解您对如何处理与 MSSQL 数据库相关的数据增长的意见。为了简单起见,我将把我们的讨论限制在一张桌子上。假设表增长到包含大量数据,即尽管存在索引,但最终用户仍会注意到检索数据的时间。
现在,尽管我们需要这样做,但并非所有数据都需要始终可用。换句话说,我们可以“归档”一些旧数据。我们当前的策略只是将“存档”数据移动到另一个表,以便“日常”查询很快,并且我们可以在需要时通过组合两个表来访问“存档”数据。
I would like your opinions on how to deal with data growth as it pertains to a MSSQL database. To keep things simple, I'll constrain our discussion to one table only. Say that the table grows to contain lots of data, i.e. the time to retrieve data is noticeable to the end user despite the fact that there are indexes.
Now, not all the data needs to be available at all times although should we need to do so. In other words, we can "archive" some of the old data. Our current strategy is simply to move the "archive" data to another table so that "day to day" queries are quick and we can access the "archived" data when we need to by combining the two tables.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
使用表分区。
阅读这些链接,作者通过示例详细介绍了这一点。
http://blog.sqlauthority.com/ 2008/01/24/sql-server-2005-introduction-to-partitioning/
http://blog.sqlauthority.com/2008/01/25/sql-server-2005-database-表分区教程如何水平分区数据库表/
Use table partitioning.
Read these links the author details it with an example.
http://blog.sqlauthority.com/2008/01/24/sql-server-2005-introduction-to-partitioning/
http://blog.sqlauthority.com/2008/01/25/sql-server-2005-database-table-partitioning-tutorial-how-to-horizontal-partition-database-table/
我已经见过很多次了,而且它肯定有效。如果您使用存储过程,则可以维护两个集,其中一个仅用于日常,第二个将引用两个表或组合这两个集的视图。
根据您执行此操作的时间长短,您仍然可能最终得到一个在某个时刻包含大量数据的存档表,因此您可能需要在将数据移动到脱机存储或存储之前查看具有结束日期的策略。第三个表不作为流程的一部分进行搜索。
I've seen that a lot, and it can certainly work. If you are using stored procedures, you can then maintain two sets where one is only for day to day, and the second will reference both tables or a view that combines the two sets.
Depending on how long you do that for, you still potentially end up with an archive table that has a lot of data at some point, so you may need to look at a strategy that has an end date before moving data into offline storage or a third table that is not searched as part of the process.
对于 SQL Server 2005+,请查看表分区。
For SQL Server 2005+, take a look at table partitioning.
作为对其他答案的一点澄清,这两种策略都可以很好地工作,具体取决于您使用的 Sql Server 版本。如果您有企业版,那么表分区是我会采用的方式,因为它在开发方面需要较少的工作。不幸的是,微软将表分区仅限于企业版,尽管它是几乎所有其他数据库(包括免费开源替代方案)中的一项功能。如果您无法使用企业版(许多人认为它的价格超过 2 万美元/CPU),那么按照您最初建议的那样采用手动分区方法是一个不错的选择。另一种不需要企业版的替代方法是使用覆盖索引。一种好方法是使用视图并在视图上创建物化索引,它基本上只在磁盘上创建您需要的数据的副本,而无需您自己移动数据的开销。假设您创建了一个带有 where 子句的视图,仅包含今天的数据,然后在该视图上创建了一个物化索引,实际上会有一个索引包含磁盘上今天的所有数据,并且返回的速度将与您创建的速度一样快手动对数据进行分区。不幸的是,您可以在其上创建索引的视图的复杂性有很多限制,但值得一试,当它工作时它会很好地工作。
As a bit of clarification to the other answers, both strategies work well depending on which edition of Sql Server you are using. If you have enterprise edition then table partitioning is the way I would go as it requires less work on the development side. Unfortunately Microsoft has limited table partitioning to enterprise edition only, even though it is a feature in virtually every other database out there including the free open source alternatives. If you are stuck without enterprise edition (as many are considing it's >$20k/cpu pricing) then moving to manual partitioning approach as you first suggested is a good alternative. Another alternative that does not require enterprise edition is to use covering indexes. One good way to do that is to use a view and create a materialized index on the view, it is basically creating copy of only the data you need on disk without the overhead of you moving the data around yourself. Say you created a view with a where clause to only include today's data and then created a materialized index on the view, there would actually be an index containing all of the data from today on disk and the return will be as fast as if you had manually partitioned the data. Unfortunately there are many limits to the complexity of a view that you can create an index on but it's worth a shot, when it works it works well.