SQL Server 2005 中从小表批量复制到大表
我是 SQL Server 的新手,遇到以下困境:
我有两个具有相同结构的表。将其命名为 runningTbl
和 finalTbl
。
runningTbl
每 15 分钟包含大约 600 000 到 100 万行。
在 runningTbl
中进行一些数据清理后,我想将所有记录移至 finalTbl
。 finalTbl
目前约有 3800 万行。
上述过程需要每15-20分钟重复一次。
问题是,将数据从 runningTbl
移动到 finalTbl
有时需要超过 20 分钟。
最初,当表很小时,需要 10 秒到2分钟复制。
现在只是需要太长的时间。
有谁可以帮忙解决这个问题吗?要遵循的 SQL 查询..
谢谢
I'm a newbie in SQL Server and have the following dilemma:
I have two tables with the same structure. Call it runningTbl
and finalTbl
.
runningTbl
contains about 600 000 to 1 million rows every 15 minutes.
After doing some data cleanup in runningTbl
I want to move all the records to finalTbl
.finalTbl
currently has about 38 million rows.
The above process needs to be repeated every 15-20 minutes.
The problem is that the moving of data from runningTbl
to finalTbl
is taking way longer than 20 minutes at times..
Initially when the tables were small it took anything from 10 seconds to 2 minutes to copy.
Now it just takes too long.
Any one that can assist with this? SQL query to follow..
Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
为了获得复制数据的最有效方法,您需要执行许多操作。到目前为止,您走在正确的道路上,但还有很长的路要走。我建议您首先查看您的索引。那里可能有一些优化可以提供帮助。接下来,确保此表上没有可能导致速度减慢的触发器。接下来,更改日志记录级别(如果可以更改)。
这里还有更多帮助(来自 Microsoft):
http://msdn.microsoft.com/en-us/library/ms190421(v=SQL.90).aspx
基本上,您使用 BCP 的方向是正确的。这实际上是微软的建议:
不过,当您执行此操作时,您还需要考虑如果引入的数据过多(基于您使用的索引类型),则删除索引的可能性。如果您使用聚集索引,那么在导入之前对数据进行排序可能也是一个好主意。以下是更多信息(包括上述引用的来源):
http://msdn.microsoft.com/en-US/library/ms177445(v=SQL.90).aspx
There are a number of things that you will need to do in order to get the most efficient method of copying the data. So far you are on the right track but you have a long way to go. I would suggest you first look at your indexes. There may be optimizations there that can help. Next, make sure you don't have triggers on this table that could cause a slowdown. Next, change the logging level (if that is permutable).
There is a bunch more help here (from Microsoft):
http://msdn.microsoft.com/en-us/library/ms190421(v=SQL.90).aspx
Basically you are on the right track using BCP. This is actually Microsoft's recommendation:
When you do this though, you need to also consider the possibility of dropping your indexes if there is too much data being brought in (based upon the type of index you use). If you use a clustered index, it may also be a good idea to order your data before import. Here is more information (including the source of the above quote):
http://msdn.microsoft.com/en-US/library/ms177445(v=SQL.90).aspx
对于初学者:多年来我学到的一件事是,MSSQL 在优化各种操作方面做得很好,但要做到这一点在很大程度上依赖于所有涉及的表的统计信息。因此,我建议运行“UPDATE STATISTICSprocessed_logs”&在运行实际插入之前“UPDATE STATISTICS unprocessed_logs”;即使在一张大桌子上,这些事情也不会花那么长时间。
除此之外,根据上面的查询,很大程度上取决于目标表的索引。我假设目标表在(至少)UnixTime 上有聚集索引(或主键),如果没有,当您在现有记录之间压缩越来越多的数据时,您将创建主要的数据碎片。要解决此问题,您可以尝试偶尔对目标表进行碎片整理(可以在线完成,但需要很长时间),但创建聚集索引(或 PK)以便数据始终附加到表的末尾是更好的方法;好吧,至少在我看来。
For starters : one of the things I've learned over the years is that MSSQL does a great job at optimizing all kinds of operations but to do so heavily relies on the statistics for all tables involved. Hence, I would suggest to run "UPDATE STATISTICS processed_logs" & "UPDATE STATISTICS unprocessed_logs" before running the actual inserts; even on a large table these things don't take all that long.
Apart from that, based on the query above, a lot depends on the indexes of the target table. I'm assuming the target table has its clustered index (or PRIMARY KEY) on (at least) UnixTime, if not you'll create major data-fragmentation when you squeeze more and more data in-between the already existing records. To work around this you could try defragmenting the target table once in a while (can be done online, but takes a long time), but making the clustered index (or PK) so that data is always appended to the end of the table would be the better approach; well, at least in my opinion.
我建议您应该有一个窗口服务并使用计时器和布尔变量。一旦您的请求发送到服务器,请将布尔值设置为高位,并且计时器事件在该位为低位之前不应执行代码。
I suggest that you should have a window service and use timer and a boolean variable. Once your request is sent to server set the bool to high bit and the timer event should not execute code until the bit is low.