Delete All / Bulk Insert
First off, let me say I am running on SQL Server 2005, so I don't have access to MERGE.
I have a table with ~150k rows that I update daily from a text file. As rows fall out of the text file, I need to delete them from the database, and if they change or are new, I need to update/insert accordingly.
After some testing, I've found that performance-wise it is dramatically faster to do a full delete and then bulk insert from the text file than to read through the file line by line doing an update/insert. However, I recently came across some posts discussing mimicking the MERGE functionality of SQL Server 2008 using a temp table and the OUTPUT of the UPDATE statement.
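The pattern those posts describe, as I understand it, is roughly the following (the table and column names here are placeholders, not my real schema):

```sql
-- Sketch of the UPDATE ... OUTPUT pattern; dbo.MainTable, dbo.Staging,
-- ID, and Col1 are placeholder names.
DECLARE @updated TABLE (ID int PRIMARY KEY);

-- Update existing rows, capturing the keys that were touched.
UPDATE m
SET    m.Col1 = s.Col1
OUTPUT inserted.ID INTO @updated
FROM   dbo.MainTable AS m
JOIN   dbo.Staging   AS s ON s.ID = m.ID;

-- Insert rows that were not updated (i.e. are new).
INSERT INTO dbo.MainTable (ID, Col1)
SELECT s.ID, s.Col1
FROM   dbo.Staging AS s
WHERE  s.ID NOT IN (SELECT ID FROM @updated);
```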
I was interested in this because I am looking into how I can eliminate the time in my Delete/Bulk Insert method when the table has no rows. I still think that this method will be the fastest so I am looking for the best way to solve the empty table problem.
Thanks
I think your fastest method would be to:
1) Delete all the rows from your table.
2) Bulk insert into your table.
3) Rebuild your indexes.
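A minimal sketch of those steps (the table name, file path, and delimiters are placeholders):

```sql
-- TRUNCATE is minimally logged and faster than DELETE for clearing all rows.
TRUNCATE TABLE dbo.MainTable;

BULK INSERT dbo.MainTable
FROM 'C:\data\daily.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

-- Rebuild indexes only after the load completes.
ALTER INDEX ALL ON dbo.MainTable REBUILD;
```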
Is the problem that Joe's solution is not fast enough, or that you cannot have any activity against the target table while your process runs? If you just need to prevent users from running queries against your target table, you should wrap your process in a transaction block. That way, when your TRUNCATE TABLE executes, it will take a table lock that is held for the duration of the transaction, like so:
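A sketch of that transaction block (the table name and file path are placeholders):

```sql
BEGIN TRANSACTION;

    -- Takes a schema-modification lock held until COMMIT,
    -- blocking other readers/writers for the duration.
    TRUNCATE TABLE dbo.MainTable;

    BULK INSERT dbo.MainTable
    FROM 'C:\data\daily.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

COMMIT TRANSACTION;
```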
An alternative solution which would satisfy your requirement for not having "down time" for the table you are updating.
It sounds like originally you were reading the file and doing an INSERT/UPDATE/DELETE one row at a time. A more performant approach, which does not involve clearing down the table, is as follows:
1) Bulk load the file into a new, separate table (no indexes)
2) Then create the PK on it
3) Run 3 statements to update the original table from this new (temporary) table:
DELETE rows in the main table that don't exist in the new table
UPDATE rows in the main table where there is a matching row in the new table
INSERT rows into main table from the new table where they don't already exist
This will perform better than row-by-row operations and should hopefully satisfy your overall requirements.
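The three statements could be sketched like this (assuming a single key column ID and a data column Col1, which are placeholder names):

```sql
-- 1) Delete rows in the main table that don't exist in the new table.
DELETE m
FROM   dbo.MainTable AS m
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Staging AS s WHERE s.ID = m.ID);

-- 2) Update rows in the main table that have a match in the new table.
UPDATE m
SET    m.Col1 = s.Col1
FROM   dbo.MainTable AS m
JOIN   dbo.Staging   AS s ON s.ID = m.ID;

-- 3) Insert rows from the new table that don't already exist.
INSERT INTO dbo.MainTable (ID, Col1)
SELECT s.ID, s.Col1
FROM   dbo.Staging AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.MainTable AS m WHERE m.ID = s.ID);
```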
There is a way to update the table with zero downtime: keep two days' data in the table, and delete the old rows after loading the new ones!
SELECT performance won't suffer; joining one row to 150,000 rows along the primary key should present no problem to any server less than 15 years old. I have used this technique often, and have also struggled with processes that relied on sp_rename. Production processes that modify the schema are a headache. Don't.
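The two-days-of-data idea could look roughly like this (assuming a LoadDate column stamped at load time, which is an invented addition, along with placeholder table names and file path):

```sql
DECLARE @today datetime;
SET @today = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0); -- midnight today

-- 1) Bulk load today's file into a staging table.
BULK INSERT dbo.Staging
FROM 'C:\data\daily.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

-- 2) Add today's rows alongside yesterday's; readers always see a full set.
INSERT INTO dbo.MainTable (ID, Col1, LoadDate)
SELECT ID, Col1, @today
FROM   dbo.Staging;

-- 3) Only after the new day is in place, remove the old day's rows.
DELETE FROM dbo.MainTable
WHERE  LoadDate < @today;
```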
For raw speed, I think with ~150K rows in the table, I'd just drop the table, recreate it from scratch (without indexes) and then bulk load afresh. Once the bulk load has been done, then create the indexes.
This of course assumes that having a period of time when the table is empty or doesn't exist is acceptable, which it sounds like could be the case.
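A sketch of the drop-and-reload approach (the table definition, file path, and index name are placeholders for illustration):

```sql
IF OBJECT_ID('dbo.MainTable') IS NOT NULL
    DROP TABLE dbo.MainTable;

CREATE TABLE dbo.MainTable (
    ID   int          NOT NULL,
    Col1 varchar(100) NULL
);  -- no indexes yet

BULK INSERT dbo.MainTable
FROM 'C:\data\daily.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

-- Create indexes only once the bulk load has been done.
CREATE CLUSTERED INDEX IX_MainTable_ID ON dbo.MainTable (ID);
```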