Is an insert-select statement large?

Published on 2024-10-19 00:56:02

When multiple inserts are used with a select statement in a transaction, how does the database keep track of the changes during the transaction? Can there be problems with resources (such as memory or hard disk space) if a transaction is held open too long?
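
For concreteness, the pattern being asked about looks something like the sketch below; the table and column names are hypothetical:

    -- A single insert-select inside an explicit transaction.
    -- Hypothetical tables: orders (live data) and orders_archive.
    BEGIN TRANSACTION;

    INSERT INTO orders_archive (order_id, customer_id, total)
    SELECT order_id, customer_id, total
    FROM orders
    WHERE created_at < '2024-01-01';

    -- Until COMMIT, the database must retain enough information
    -- (a log/undo record per inserted row) to roll all of this back.
    COMMIT;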


Comments (3)

屌丝范 2024-10-26 00:56:02

The short answer is, it depends on the size of the select. Technically the select is part of the transaction, but most selects don't have to be "rolled back", so the actual log of DB changes won't include the select itself. What it WILL include is a new logged row for every result of the select statement, just as if each were an insert statement. If that select returns 10k rows, the commit will be rather large, but no larger than if you'd written 10k individual insert statements within an explicit transaction.
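
A sketch of that equivalence in generic SQL (table names are hypothetical):

    -- One statement, but the change log still records one row image
    -- per inserted row:
    BEGIN TRANSACTION;
    INSERT INTO target (id, val)
    SELECT id, val FROM source;   -- suppose this returns 10k rows
    COMMIT;

    -- Roughly the same log volume as 10k single-row inserts inside
    -- one explicit transaction:
    BEGIN TRANSACTION;
    INSERT INTO target (id, val) VALUES (1, 'a');
    INSERT INTO target (id, val) VALUES (2, 'b');
    -- ...and so on for the remaining rows...
    COMMIT;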

若相惜即相离 2024-10-26 00:56:02

Exactly how this works depends on the database. For example, in Oracle, it will require UNDO space (and eventually, if you run out, your transaction will be aborted, or your DBA will yell at you). In PostgreSQL, it'll prevent the vacuuming of old row versions. In MySQL/InnoDB, it'll use rollback space, and possibly cause lock timeouts.

There are several things the database must use space for:

  1. Storing which rows your transaction has changed (the old values, the new values, or both) so that rollback can be performed
  2. Keeping track of which data is visible to your transaction so that a consistent view is maintained (in transaction isolation levels other than read uncommitted). This overhead will often be greater the more isolation you request (see the sketch after this list).
  3. Keeping track of which data is visible to other transactions (unless the whole database is running in read uncommitted)
  4. Keeping track of which objects each transaction has changed, so that isolation rules are followed, especially in serializable isolation (probably not much space, but plenty of locks).
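
A two-session sketch of point 2, using PostgreSQL syntax and a hypothetical accounts table, showing why old row versions must be retained while a transaction is open:

    -- Session A:
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    -- The pre-update row version must be kept around, because...

    -- Session B (default READ COMMITTED isolation):
    SELECT balance FROM accounts WHERE id = 1;
    -- ...B still sees the old balance until A commits.

    -- Session A:
    COMMIT;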

In general, you want your transactions to commit as soon as possible. So, e.g., you don't want to hold one open on an idle connection. How to best batch inserts depends on the database (often, many inserts on one transaction is better than one transaction per insert). And of course, the primary purpose of transactions is data integrity.
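
As one concrete way to apply that advice, in PostgreSQL you can look for long-open (often idle-in-transaction) sessions with the standard pg_stat_activity view:

    -- Sessions whose transactions have been open the longest; while
    -- they stay open, VACUUM cannot reclaim the old row versions
    -- they might still need to see.
    SELECT pid,
           state,                          -- e.g. 'idle in transaction'
           now() - xact_start AS xact_age,
           query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start;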

萧瑟寒风 2024-10-26 00:56:02

You can have many problems with large transactions. First, in most databases you do not want to run row-by-row, because for a million records that will take hours. But inserting a million records in one complex statement can cause locking on the tables involved and harm performance for everyone else, and a rollback, if you kill the transaction, can take a good while too. Usually the best alternative is to loop in batches. I usually test 50,000 at a time and raise or lower the batch size depending on how long that takes; I've had some databases where I do no more than 1,000 in one set-based operation. If possible, large inserts or updates should be scheduled for the database's off-peak hours. If it is really large (and one-time, usually a large data migration), you might even want to close the database for maintenance, put it in single-user mode, drop the indexes, do the insert, and reindex.
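
A sketch of that batching pattern in T-SQL; the table names and the NOT EXISTS condition used to find not-yet-copied rows are hypothetical, and the batch size would be tuned as described above:

    -- Copy rows in batches of 50,000, committing after each batch so
    -- no single transaction grows too large or holds locks for long.
    DECLARE @batch_size int = 50000;
    DECLARE @rows int = 1;

    WHILE @rows > 0
    BEGIN
        BEGIN TRANSACTION;

        INSERT INTO dbo.target_rows (id, payload)
        SELECT TOP (@batch_size) s.id, s.payload
        FROM dbo.source_rows AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.target_rows AS t
                          WHERE t.id = s.id);

        SET @rows = @@ROWCOUNT;
        COMMIT TRANSACTION;
    END;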
