In SQL, is UPDATE always faster than DELETE+INSERT?

Say I have a simple table that has the following fields:

  1. ID: int, autoincremental (identity), primary key
  2. Name: varchar(50), unique, has unique index
  3. Tag: int

I never use the ID field for lookup, because my application is always based on working with the Name field.

I need to change the Tag value from time to time. I'm using the following trivial SQL code:

UPDATE Table SET Tag = XX WHERE Name = YY;

I wondered if anyone knows whether the above is always faster than:

DELETE FROM Table WHERE Name = YY;
INSERT INTO Table (Name, Tag) VALUES (YY, XX);

Again - I know that in the second example the ID is changed, but it does not matter for my application.
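
For reference, here is a minimal sketch of the table described above, assuming SQL Server syntax (the generic names Table, Name, and Tag come from the question; Table is bracketed because it is a reserved word):

CREATE TABLE [Table] (
    ID   int IDENTITY(1,1) PRIMARY KEY,   -- autoincremental identity
    Name varchar(50) NOT NULL UNIQUE,     -- unique, backed by a unique index
    Tag  int NULL
);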

缘字诀 2024-08-08 23:30:14

Keep in mind that the actual fragmentation that occurs when DELETE+INSERT is issued, as opposed to a correctly implemented UPDATE, will make a great difference over time.

That's why, for instance, the REPLACE INTO that MySQL implements is discouraged in favor of the INSERT INTO ... ON DUPLICATE KEY UPDATE ... syntax.
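
A minimal MySQL sketch of the two syntaxes, reusing the question's columns (the table name MyTable and the values are placeholders):

-- REPLACE INTO deletes the conflicting row and inserts a fresh one,
-- churning the auto-increment ID and fragmenting pages over time:
REPLACE INTO MyTable (Name, Tag) VALUES ('YY', 42);

-- INSERT ... ON DUPLICATE KEY UPDATE modifies the existing row in place:
INSERT INTO MyTable (Name, Tag) VALUES ('YY', 42)
ON DUPLICATE KEY UPDATE Tag = VALUES(Tag);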

街角卖回忆 2024-08-08 23:30:14

I am afraid the body of your question is unrelated to the title question.

If we are to answer just the title:

In SQL, is UPDATE always faster than DELETE+INSERT?

then the answer is NO!

Just google for

  • "Expensive direct update"* "sql server"
  • "deferred update"* "sql server"

Such updates result in a more costly (more processing) realization of the update: the engine carries it out internally as a delete+insert pair rather than a direct in-place update. These are the cases when

  • one updates a field that is part of a unique (or primary) key, or
  • the new data does not fit in (is bigger than) the pre-update row space allocated (or even the maximum row size), resulting in fragmentation,
  • etc.

My quick (non-exhaustive) search, without pretending to cover everything, turned up [1] and [2]:

[1]
Update Operations
(Sybase® SQL Server Performance and Tuning Guide
Chapter 7: The SQL Server Query Optimizer)
http://www.lcard.ru/~nail/sybase/perf/11500.htm
[2]
UPDATE Statements May be Replicated as DELETE/INSERT Pairs
http://support.microsoft.com/kb/238254
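
To illustrate the first bullet with a hedged example (SQL Server flavor, reusing the question's table): updating the uniquely indexed column itself is exactly the kind of statement an engine may process as a deferred update, i.e. internally as a delete followed by an insert:

-- May be realized as delete+insert rather than in place,
-- because the modified column is the unique key:
UPDATE [Table] SET Name = 'ZZ' WHERE Name = 'YY';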

雪花飘飘的天空 2024-08-08 23:30:14

A bit too late with this answer, but since I faced a similar question, I made a test with JMeter and a MySQL server on the same machine, where I used:

  1. A transaction Controller (generating parent sample) that contained two JDBC Requests: a Delete and an Insert statement
  2. A separate JDBC Request containing the Update statement.

After running the test for 500 loops, I obtained the following results:

DEL + INSERT - Average: 62ms

Update - Average: 30ms

Saygoodbye 2024-08-08 23:30:14

The bigger the table (in number and size of columns), the more expensive it becomes to delete and insert rather than update, because you have to pay the price of UNDO and REDO. DELETEs consume more UNDO space than UPDATEs, and your REDO contains twice as many statements as necessary.

Besides, it is plain wrong from a business point of view. Consider how much harder it would be to understand a notional audit trail on that table.


There are some scenarios involving bulk updates of all the rows in a table where it is faster to create a new table from the old one using CTAS (applying the update in the projection of the SELECT clause), drop the old table, and rename the new table. The side effects are re-creating indexes, managing constraints and renewing privileges, but it is worth considering.
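
A hedged sketch of that CTAS pattern, assuming Oracle syntax and illustrative names (sales, sales_new, with a 10% price adjustment standing in for the bulk update):

-- Build the new table with the "update" applied in the projection:
CREATE TABLE sales_new AS
SELECT id, customer, amount * 1.1 AS amount
FROM   sales;

DROP TABLE sales;
ALTER TABLE sales_new RENAME TO sales;
-- ...then re-create indexes and constraints, and re-grant privileges.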

纸短情长 2024-08-08 23:30:14

One command on the same row should always be faster than two on that same row. So the UPDATE alone would be better.

EDIT
Set up the table:

CREATE TABLE YourTable
(YourName varchar(50) PRIMARY KEY
,Tag int
)

INSERT INTO YourTable VALUES ('first value',1)

Run this, which takes 1 second on my system (SQL Server 2005):

SET NOCOUNT ON
DECLARE @x int
DECLARE @y int
SELECT @x=0, @y=0
UPDATE YourTable SET YourName='new name'   -- seed the value so the loop's WHERE always matches
WHILE @x<10000
BEGIN
    SET @x=@x+1
    UPDATE YourTable SET YourName='new name' WHERE YourName='new name'   -- 10,000 single-row UPDATEs
    SET @y=@y+@@ROWCOUNT
END
PRINT @y   -- total number of rows affected

Run this, which took 2 seconds on my system:

SET NOCOUNT ON
DECLARE @x int
DECLARE @y int
SELECT @x=0, @y=0
WHILE @x<10000
BEGIN
    SET @x=@x+1
    DELETE YourTable WHERE YourName='new name'    -- 10,000 DELETE...
    INSERT INTO YourTable VALUES ('new name',1)   -- ...+ INSERT pairs
    SET @y=@y+@@ROWCOUNT
END
PRINT @y   -- total number of rows affected

半世晨晓 2024-08-08 23:30:14

Just tried updating 43 fields on a table with 44 fields; the remaining field was the primary clustered key.

The update took 8 seconds.

A DELETE + INSERT took less than the minimum time interval that "Client Statistics" reports via SQL Server Management Studio.

Peter

MS SQL 2008

叹倦 2024-08-08 23:30:14

The question of speed is irrelevant without a specific speed problem.

If you are writing SQL code to make a change to an existing row, you UPDATE it. Anything else is incorrect.

If you're going to break the rules of how code should work, then you'd better have a damn good, quantified reason for it, and not a vague idea of "This way is faster", when you don't have any idea what "faster" is.

江挽川 2024-08-08 23:30:14

In your case, I believe the update will be faster.

Remember indexes!

You have defined a primary key, and it will likely automatically become a clustered index (at least SQL Server does so). A clustered index means the records are physically laid out on disk according to the index. The DELETE operation itself won't cause much trouble; even after one record goes away, the index stays correct. But when you INSERT a new record, the DB engine has to put this record in the correct location, which under some circumstances will cause some "reshuffling" of the old records to "make room" for the new one. That is where it will slow down the operation.

An index (especially a clustered one) works best if the values are ever-increasing, so new records simply get appended to the tail. Maybe you can add an extra INT IDENTITY column to become the clustered index; this will simplify insert operations.
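
A hedged sketch of that suggestion in SQL Server syntax (the constraint names are made up): the IDENTITY column takes the clustered index so inserts append at the tail, while Name keeps a nonclustered unique index for the lookups:

CREATE TABLE YourTable (
    ID   int IDENTITY(1,1) NOT NULL,
    Name varchar(50) NOT NULL,
    Tag  int NULL,
    CONSTRAINT PK_YourTable PRIMARY KEY CLUSTERED (ID),
    CONSTRAINT UQ_YourTable_Name UNIQUE NONCLUSTERED (Name)
);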

恰似旧人归 2024-08-08 23:30:14

What if you have a few million rows? Each row starts with one piece of data, perhaps a client name. As you collect data for clients, their entries must be updated. Now, let's assume that the collection of client data is distributed across numerous other machines, from which it is later collected and put into the database. If each client has unique information, then you would not be able to perform a bulk update; i.e., there is no where-clause criterion for you to use to update multiple clients in one shot. On the other hand, you could perform bulk inserts. So, the question might be better posed as follows: is it better to perform millions of single updates, or is it better to compile them into large bulk deletes and inserts? In other words, instead of running "update [table] set field=data where clientid=123" a million times, you do "delete from [table] where clientid in ([all clients to be updated]); insert into [table] values (data for client1), (data for client2), etc." — a sketch of this pattern appears below.

Is either choice better than the other, or are you screwed both ways?
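
A rough sketch of the two alternatives being compared (table and column names are the poster's placeholders; the client IDs and values are invented for illustration):

-- millions of single-row updates, one statement per client:
UPDATE [table] SET field = 'data' WHERE clientid = 123;
-- ...repeated once per client...

-- versus one bulk delete followed by one bulk insert:
DELETE FROM [table] WHERE clientid IN (123, 124, 125);
INSERT INTO [table] (clientid, field) VALUES
    (123, 'data for client1'),
    (124, 'data for client2'),
    (125, 'data for client3');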

無處可尋 2024-08-08 23:30:14

Obviously, the answer varies based on what database you are using, but UPDATE can always be implemented faster than DELETE+INSERT. Since in-memory ops are mostly trivial anyway, given a hard-drive-based database, an UPDATE can change a database field in place on disk, while a DELETE would remove a row (leaving an empty space) and an INSERT would add a new row, perhaps at the end of the table (again, it all depends on the implementation).

The other, minor, issue is that when you UPDATE a single value in a single row, the other columns in that row remain the same. If you DELETE and then do an INSERT, you run the risk of forgetting about the other columns and consequently leaving them behind (in which case you would have to do a SELECT before your DELETE to temporarily store the other columns before writing them back with the INSERT).
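
A small sketch of that risk, assuming a hypothetical extra column OtherCol on the question's table: the DELETE+INSERT path has to read and re-supply every column, while the UPDATE names only the one it changes:

-- DELETE+INSERT: must carry the untouched columns along manually
DECLARE @OtherCol int;
SELECT @OtherCol = OtherCol FROM [Table] WHERE Name = 'YY';
DELETE FROM [Table] WHERE Name = 'YY';
INSERT INTO [Table] (Name, Tag, OtherCol) VALUES ('YY', 42, @OtherCol);

-- UPDATE: touches only the column being changed
UPDATE [Table] SET Tag = 42 WHERE Name = 'YY';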

自由范儿 2024-08-08 23:30:14

Delete + Insert is almost always faster because an Update has way more steps involved.

Update:

  1. Look for the row using PK.
  2. Read the row from disk.
  3. Check for which values have changed
  4. Raise the onUpdate Trigger with populated :NEW and :OLD variables
  5. Write the new values to disk (the entire row)

    (This repeats for every row you're updating)

Delete + Insert:

  1. Mark the rows as deleted (only in the PK).
  2. Insert new rows at the end of the table.
  3. Update PK Index with locations of new records.

    (This doesn't repeat; all of it can be performed in a single block of operations.)

Using Delete + Insert will fragment your file system, but not that fast. A lazy optimization running in the background will always free unused blocks and pack the table altogether.

猥︴琐丶欲为 2024-08-08 23:30:14

It depends on the product. A product could be implemented that (under the covers) converts all UPDATEs into a (transactionally wrapped) DELETE and INSERT, provided the results are consistent with the UPDATE semantics.

I'm not saying I'm aware of any product that does this, but it's perfectly legal.

执笏见 2024-08-08 23:30:14

Every write to the database has lots of potential side effects.

Delete: a row must be removed, indexes updated, foreign keys checked and possibly cascade-deleted, etc.
Insert: a row must be allocated - this might be in place of a deleted row, might not be; indexes must be updated, foreign keys checked, etc.
Update: one or more values must be updated; perhaps the row's data no longer fits into that block of the database so more space must be allocated, which may cascade into multiple blocks being re-written, or lead to fragmented blocks; if the value has foreign key constraints they must be checked, etc.

For a very small number of columns, or if the whole row is updated, Delete+Insert might be faster, but the FK constraint problem is a big one. Sure, maybe you have no FK constraints now, but will that always be true? And if you have a trigger, it's easier to write code that handles updates if the update operation is truly an update.

Another issue to think about is that sometimes inserting and deleting hold different locks than updating. The DB might lock the entire table while you are inserting or deleting, as opposed to just locking a single record while you are updating that record.

In the end, I'd suggest just updating a record if you mean to update it. Then check your DB's performance statistics and the statistics for that table to see if there are performance improvements to be made. Anything else is premature.

An example from the ecommerce system I work on: we were storing credit-card transaction data in the database in a two-step approach: first, write a partial transaction to indicate that we've started the process; then, when the authorization data is returned from the bank, update the record. We COULD have deleted and then re-inserted the record, but instead we just used an update. Our DBA told us that the table was fragmented because the DB was only allocating a small amount of space for each row, and the update caused block-chaining since it added a lot of data. However, rather than switch to DELETE+INSERT, we just tuned the database to always allocate the whole row; this meant the update could use the pre-allocated empty space with no problems. No code change was required, and the code remains simple and easy to understand.
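
A hedged sketch of that two-step pattern (the table, columns, and status values are illustrative, not the actual system's schema):

-- Step 1: record that the process has started.
INSERT INTO CardTransaction (OrderId, Status)
VALUES (@OrderId, 'PENDING');

-- Step 2: when the bank responds, UPDATE the same row in place.
UPDATE CardTransaction
SET    Status   = 'AUTHORIZED',
       AuthCode = @AuthCode
WHERE  OrderId  = @OrderId;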

执笏见 2024-08-08 23:30:14

In specific cases, Delete+Insert will save you time. I have a table with 30,000-odd rows, and these records are updated/inserted daily from a data file. The upload process generates 95% update statements, since the records are already there, and 5% inserts for the ones that do not exist. Alternatively, uploading the data file records into a temp table, deleting from the destination table the records that are present in the temp table, and then inserting the same rows from the temp table has shown a 50% gain in time.
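
A hedged sketch of that staging approach in SQL Server syntax (table and column names are assumptions; the bulk-load step depends on the file format):

-- 1. Bulk-load the data file into a staging table, e.g. with BULK INSERT.
-- 2. Delete the destination rows that are about to be replaced:
DELETE d
FROM   Destination AS d
JOIN   #Staging    AS s ON s.KeyCol = d.KeyCol;

-- 3. Insert everything from the staging table in one statement:
INSERT INTO Destination (KeyCol, Val)
SELECT KeyCol, Val FROM #Staging;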

我不咬妳我踢妳 2024-08-08 23:30:14

Large numbers of individual updates vs. bulk delete/bulk insert is my scenario. I have historical sales data for multiple customers going back years. Until I get verified data (the 15th of the following month), I will adjust sales numbers every day to reflect the current state as obtained from another source (this means overwriting at most 45 days of sales each day for each customer). There may be no changes, or there may be a few changes. I can either code the logic to find the differences and update/delete/insert the affected records, or I can just blow away yesterday's numbers and insert today's numbers. Clearly the latter approach is simpler, but if it's going to kill the table's performance due to churn, then it's worth it to write the extra logic to identify the handful (or none) of records that changed and only update/delete/insert those.

So, I'm replacing the records, and there may be some relationship between the old records and the new records, but in general I don't necessarily want to match the old data with the new data (that would be an extra step and would result in deletions, updates, and inserts). Also, relatively few fields would be changed (at most 7 out of 20 or 2 out of 15).

The records that are likely to be retrieved together will have been inserted at the same time and therefore should be physically close to each other. Does that make up for the performance loss due to churn with that approach, and is it better than the undo/redo cost of all those individual record updates?
