Delete all / bulk insert

Posted 2024-09-29 07:38:13

First off let me say I am running on SQL Server 2005 so I don't have access to MERGE.

I have a table with ~150k rows that I am updating daily from a text file. As rows fall out of the text file I need to delete them from the database, and if they change or are new I need to update/insert accordingly.

After some testing I've found that, performance-wise, it is dramatically faster to do a full delete and then a bulk insert from the text file rather than reading through the file line by line doing an update/insert. However, I recently came across some posts discussing mimicking the MERGE functionality of SQL Server 2008 using a temp table and the OUTPUT clause of the UPDATE statement.

I was interested in this because I am looking into how I can eliminate the window of time during which the table has no rows in my delete/bulk-insert method. I still think this method will be the fastest, so I am looking for the best way to solve the empty-table problem.
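For reference, here is a minimal sketch of the temp-table/OUTPUT approach those posts describe. All names (dbo.target, dbo.stage_table, the key column Id, and the data columns) are placeholders, not the actual schema:

-- Sketch only: mimicking MERGE on SQL Server 2005 with the OUTPUT clause.
declare @updated table (Id int primary key);

-- Update rows present in both tables, capturing which keys were matched.
update t
set    t.Col1 = s.Col1,
       t.Col2 = s.Col2
output inserted.Id into @updated
from   dbo.target t
join   dbo.stage_table s on s.Id = t.Id;

-- Insert staging rows whose keys were not matched by the update.
insert dbo.target (Id, Col1, Col2)
select s.Id, s.Col1, s.Col2
from   dbo.stage_table s
where  s.Id not in (select Id from @updated);

-- Delete rows that fell out of the text file (i.e. missing from staging).
delete t
from   dbo.target t
where  not exists (select 1 from dbo.stage_table s where s.Id = t.Id);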

Thanks


你的往事 2024-10-06 07:38:13

I think your fastest method would be to:

  1. Drop all foreign keys and indexes from your table.
  2. Truncate your table.
  3. Bulk insert your data.
  4. Recreate your foreign keys and indexes (see the sketch below).
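A minimal T-SQL sketch of those steps, assuming hypothetical object names (dbo.target, IX_target_Col1, FK_target_parent) and that no other tables reference dbo.target by foreign key, since TRUNCATE fails if they do:

-- Sketch only; all object names below are placeholders.
alter table dbo.target drop constraint FK_target_parent;
drop index IX_target_Col1 on dbo.target;

-- TRUNCATE requires that no foreign keys reference the table.
truncate table dbo.target;

bulk insert dbo.target
from N'C:\datafile.txt';

-- Recreate the index and constraint after the load.
create index IX_target_Col1 on dbo.target (Col1);
alter table dbo.target add constraint FK_target_parent
    foreign key (ParentId) references dbo.parent (Id);
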
勿忘心安 2024-10-06 07:38:13

Is the problem that Joe's solution is not fast enough, or that you can not have any activity against the target table while your process runs? If you just need to prevent users from running queries against your target table, you should contain your process within a transaction block. This way, when your TRUNCATE TABLE executes, it will create a table lock that will be held for the duration of the transaction, like so:

begin tran;

-- The table lock taken by TRUNCATE is held for the duration of the transaction.
truncate table stage_table;

bulk insert stage_table
from N'C:\datafile.txt';

commit tran;
唔猫 2024-10-06 07:38:13

An alternative solution which would satisfy your requirement of no "down time" for the table you are updating.

It sounds like originally you were reading the file and doing an INSERT/UPDATE/DELETE one row at a time. A more performant approach, one that does not involve clearing down the table, is as follows:

1) Bulk load the file into a new, separate table (no indexes).
2) Then create the PK on it.
3) Run 3 statements to update the original table from this new (temporary) table:
   - DELETE rows in the main table that don't exist in the new table
   - UPDATE rows in the main table where there is a matching row in the new table
   - INSERT rows into the main table from the new table where they don't already exist

This will perform better than row-by-row operations and should hopefully satisfy your overall requirements (a sketch of the three statements follows below).
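A minimal T-SQL sketch of those steps, assuming placeholder names dbo.main_table and dbo.stage_table that share a key column Id and the same data columns:

-- 1) Bulk load the file into a separate staging table (no indexes yet).
bulk insert dbo.stage_table
from N'C:\datafile.txt';

-- 2) Create the primary key on the staging table.
alter table dbo.stage_table
    add constraint PK_stage_table primary key (Id);

-- 3a) DELETE rows in the main table that no longer exist in the staging table.
delete m
from   dbo.main_table m
where  not exists (select 1 from dbo.stage_table s where s.Id = m.Id);

-- 3b) UPDATE rows in the main table that have a matching staging row.
update m
set    m.Col1 = s.Col1,
       m.Col2 = s.Col2
from   dbo.main_table m
join   dbo.stage_table s on s.Id = m.Id;

-- 3c) INSERT staging rows that are not yet in the main table.
insert dbo.main_table (Id, Col1, Col2)
select s.Id, s.Col1, s.Col2
from   dbo.stage_table s
where  not exists (select 1 from dbo.main_table m where m.Id = s.Id);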

庆幸我还是我 2024-10-06 07:38:13

There is a way to update the table with zero downtime: keep two days' data in the table, and delete the old rows after loading the new ones!

  1. Add a DataDate column representing the date for which your ~150K rows are valid.
  2. Create a one-row, one-column table with "today's" DataDate.
  3. Create a view of the two tables that selects only rows matching the row in the DataDate table. Index it if you like. Readers will now refer to this view, not the table.
  4. Bulk insert the rows. (You'll obviously need to add the DataDate to each row.)
  5. Update the DataDate table; the view updates instantly!
  6. Delete yesterday's rows at your leisure (a sketch of this layout follows below).
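
A minimal sketch of that layout, using placeholder names (dbo.daily_data, dbo.current_datadate, dbo.v_daily_data) and datetime for the DataDate column, since SQL Server 2005 has no DATE type:

-- Sketch only; all names are placeholders.
create table dbo.daily_data (Id int not null, Col1 varchar(50) null, DataDate datetime not null);
create table dbo.current_datadate (DataDate datetime not null);
go
-- Readers query this view instead of the table.
create view dbo.v_daily_data
as
select d.Id, d.Col1
from   dbo.daily_data d
join   dbo.current_datadate c on c.DataDate = d.DataDate;
go
-- After bulk inserting today's rows (each stamped with today's DataDate),
-- flip the control row and the view switches over instantly.
update dbo.current_datadate set DataDate = '20240929';
-- Remove yesterday's rows whenever convenient.
delete dbo.daily_data
where  DataDate < (select DataDate from dbo.current_datadate);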

SELECT performance won't suffer; joining one row to 150,000 rows along the primary key should present no problem to any server less than 15 years old.

I have used this technique often, and have also struggled with processes that relied on sp_rename. Production processes that modify the schema are a headache. Don't.

梦萦几度 2024-10-06 07:38:13

For raw speed, I think with ~150K rows in the table, I'd just drop the table, recreate it from scratch (without indexes) and then bulk load afresh. Once the bulk load has been done, then create the indexes.

This assumes, of course, that having a period of time when the table is empty or doesn't exist is acceptable, which it sounds like could be the case.
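
A rough T-SQL sketch of that sequence, again with placeholder names:

-- Sketch only; dbo.daily_data and its columns are placeholders.
if object_id(N'dbo.daily_data', N'U') is not null
    drop table dbo.daily_data;

create table dbo.daily_data (Id int not null, Col1 varchar(50) null);

bulk insert dbo.daily_data
from N'C:\datafile.txt';

-- Create indexes only after the load, so the bulk insert runs against a heap.
create clustered index IX_daily_data_Id on dbo.daily_data (Id);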
