Best practices for a SQL archiving stored procedure

Published 2024-08-15 02:33:46

I have a very large database (~100 GB) consisting primarily of two tables that I want to reduce in size (both have approx. 50 million records). I have an archive DB set up on the same server containing these two tables, with the same schema. I'm trying to determine the best conceptual approach for removing the rows from the live DB and inserting them into the archive DB. In pseudocode, this is what I'm doing now:

DECLARE @NextIDs TABLE (UniqueID int)
DECLARE @twoYearsAgo datetime  -- two years before today's date

INSERT INTO @NextIDs
SELECT TOP 100 UniqueID FROM myLargeTable WHERE myLargeTable.actionDate < @twoYearsAgo

INSERT INTO myArchiveTable (<fields>)
SELECT <fields>
FROM myLargeTable INNER JOIN @NextIDs AS n ON myLargeTable.UniqueID = n.UniqueID

DELETE myLargeTable
FROM myLargeTable INNER JOIN @NextIDs AS n ON myLargeTable.UniqueID = n.UniqueID

Right now this takes a horrifically slow 7 minutes to complete 1000 records. I've tested the DELETE and the INSERT separately; each takes approx. 3.5 minutes to complete, so it's not that one is drastically less efficient than the other. Can anyone point out some optimization ideas here?

Thanks!

This is SQL Server 2000.

Edit: On the large table there is a clustered index on the ActionDate field. There are two other indexes, but neither is referenced in any of the queries. The archive table has no indexes. On my test server, this is the only query hitting the SQL Server, so it should have plenty of processing power.

Code (this does a loop in batches of 1000 records at a time):

DECLARE @NextIDs TABLE (UniqueID int PRIMARY KEY)
DECLARE @TwoYearsAgo datetime
SELECT @TwoYearsAgo = DATEADD(d, (-2 * 365), GETDATE())

WHILE EXISTS (SELECT TOP 1 UserName FROM [ISAdminDB].[dbo].[UserUnitAudit] WHERE [ActionDateTime] < @TwoYearsAgo)
BEGIN

    BEGIN TRAN

    -- get the next batch of records to be archived
    INSERT INTO @NextIDs (UniqueID)
    SELECT TOP 1000 UniqueID
    FROM [ISAdminDB].[dbo].[UserUnitAudit]
    WHERE [ActionDateTime] < @TwoYearsAgo

    -- copy the batch into the archive table
    INSERT INTO [ISArchive].[dbo].[UserUnitAudit] (<Fields>)
    SELECT <Fields>
    FROM [ISAdminDB].[dbo].[UserUnitAudit] AS a
    INNER JOIN @NextIDs AS b ON a.UniqueID = b.UniqueID

    -- remove the batch from the admin DB (delete via the alias)
    DELETE a
    FROM [ISAdminDB].[dbo].[UserUnitAudit] AS a
    INNER JOIN @NextIDs AS b ON a.UniqueID = b.UniqueID

    -- clear the batch table for the next iteration
    DELETE FROM @NextIDs

    COMMIT

END


Comments (6)

甜柠檬 2024-08-22 02:33:46

You effectively have three SELECTs that need to run before your INSERT/DELETE commands execute:

for the 1st insert:

SELECT TOP 100 UniqueID FROM myLargeTable WHERE myLargeTable.actionDate < @twoYearsAgo

for the 2nd insert:

SELECT <fields> FROM myLargeTable INNER JOIN @NextIDs AS n
ON myLargeTable.UniqueID = n.UniqueID

for the delete:

SELECT * FROM MyLargeTable INNER JOIN @NextIDs AS n ON MyLargeTable.UniqueID = n.UniqueID

I'd try to optimize these; if they are all quick, then the indexes may be slowing down your writes. Some suggestions:

  1. Start Profiler and see what's happening with the reads/writes etc. (a lighter-weight alternative using SET STATISTICS is sketched below).

  2. Check index usage for all three statements.

  3. Try running the SELECTs returning only the PK, to see whether the delay is query execution or fetching the data (do you have any full-text-indexed fields, TEXT fields, etc.?)
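
For instance, a minimal sketch of measuring those reads on SQL Server 2000, using the table names from the question (shown for the batch-selection SELECT; wrap the other two statements the same way, and SET SHOWPLAN_TEXT ON will print the plan without executing):

SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

-- Should seek the clustered index on ActionDateTime;
-- high logical reads here would indicate a scan instead
SELECT TOP 1000 UniqueID
FROM [ISAdminDB].[dbo].[UserUnitAudit]
WHERE [ActionDateTime] < DATEADD(d, -730, GETDATE())

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO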

暖树树初阳… 2024-08-22 02:33:46

Do you have an index on the source table for the column which you're using to filter the results? In this case, that would be the actionDate.

Also, it can often help to remove all indexes from the destination table before doing massive inserts, but in this case you're only doing hundreds at a time.

You would also probably be better off doing this in larger batches. With only one hundred rows at a time, the per-query overhead will end up dominating the cost/time.

Is there any other activity on the server during this time? Is there any blocking happening?

Hopefully this gives you a starting point.

If you can provide the exact code that you're using (maybe without the column names if there are privacy issues) then maybe someone can spot other ways to optimize.

EDIT:
Have you checked the query plan for your block of code? I've run into issues with table variables like this, where the query optimizer couldn't figure out that the table variable would be small, so it always tried a full table scan on the base table.

In my case it eventually became a moot point, so I'm not sure what the ultimate solution is. You can certainly add a condition on the actionDate to all of your select queries, which would at least minimize the effects of this.

The other option would be to use a normal table to hold the IDs.
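
For example, a minimal sketch of that permanent-table variant (the ArchiveBatch name is invented for illustration); unlike a table variable on SQL Server 2000, a real table has statistics and can be indexed, which helps the optimizer:

-- One-time setup: a permanent work table keyed on UniqueID
CREATE TABLE [ISAdminDB].[dbo].[ArchiveBatch] (UniqueID int PRIMARY KEY)
GO

-- Inside the loop, in place of the @NextIDs table variable:
TRUNCATE TABLE [ISAdminDB].[dbo].[ArchiveBatch]

INSERT INTO [ISAdminDB].[dbo].[ArchiveBatch] (UniqueID)
SELECT TOP 1000 UniqueID
FROM [ISAdminDB].[dbo].[UserUnitAudit]
WHERE [ActionDateTime] < @TwoYearsAgo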

清旖 2024-08-22 02:33:46

The INSERT and DELETE statements are joining on

[ISAdminDB].[dbo].[UserUnitAudit].UniqueID

If there's no index on this, and you indicate there isn't, you're doing two table scans. That's likely the source of the slowness, because a SQL Server table scan reads the entire table into a scratch table, searches the scratch table for matching rows, then drops the scratch table.

I think you need to add an index on UniqueID. The performance hit of maintaining it should be less than that of the table scans, and you can drop it after the archive is done.
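
A sketch of that approach, run in the ISAdminDB database (the index name is made up; the DROP uses the table.index syntax SQL Server 2000 expects):

-- Support the archive joins on UniqueID
CREATE NONCLUSTERED INDEX IX_UserUnitAudit_UniqueID
ON [dbo].[UserUnitAudit] (UniqueID)

-- ... run the archive loop ...

-- Remove the index once archiving is finished
DROP INDEX [UserUnitAudit].[IX_UserUnitAudit_UniqueID]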

执妄 2024-08-22 02:33:46

Are there any indexes on myLargeTable.actionDate and .UniqueID?

赠我空喜 2024-08-22 02:33:46

Have you tried larger batch sizes than 100?

What is taking the most time? The INSERT, or the DELETE?
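
On the batch-size point, SQL Server 2000's TOP doesn't accept a variable, but SET ROWCOUNT does, so a tunable batch size could look like this sketch (@BatchSize is invented for illustration; @NextIDs and @TwoYearsAgo are from the question's loop):

DECLARE @BatchSize int
SET @BatchSize = 10000

-- Limits the rows affected by the statements that follow
SET ROWCOUNT @BatchSize

INSERT INTO @NextIDs (UniqueID)
SELECT UniqueID
FROM [ISAdminDB].[dbo].[UserUnitAudit]
WHERE [ActionDateTime] < @TwoYearsAgo

-- Always reset afterwards, or later statements stay capped
SET ROWCOUNT 0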

黯然 2024-08-22 02:33:46

You might try doing this using the OUTPUT clause (note: OUTPUT requires SQL Server 2005 or later, so it won't run on the SQL Server 2000 instance in the question):

DECLARE @items TABLE (
  <field list just like source table>)

-- DELETE TOP requires parentheses; deleted rows land in @items
DELETE TOP (100) FROM source_table
  OUTPUT deleted.first_field, deleted.second_field -- etc.
  INTO @items
  WHERE <conditions>

INSERT INTO archive_table (<fields>)
  SELECT <fields> FROM @items

You also might be able to do this in a single query, by using OUTPUT ... INTO directly against the archive table (eliminating the need for the table variable).
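
A sketch of that single-statement form against the tables from the thread, assuming the two tables have identical column lists (again, SQL Server 2005 or later; OUTPUT ... INTO also requires that the target table have no triggers and sit on neither side of a foreign key):

-- Deletes one batch and archives it in the same atomic statement
-- (@TwoYearsAgo as declared in the question's script)
DELETE TOP (1000) FROM [ISAdminDB].[dbo].[UserUnitAudit]
OUTPUT deleted.* INTO [ISArchive].[dbo].[UserUnitAudit]
WHERE [ActionDateTime] < @TwoYearsAgo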
