Is an insert-select statement large?

When multiple inserts are used with a select statement in a transaction, how does the database keep track of the changes during the transaction? Can there be problems with resources (such as memory or hard disk space) if a transaction is held open too long?
Answers (3)
The short answer is, it depends on the size of the select. The select is part of the transaction, technically, but most selects don't have to be "rolled back", so the actual log of DB changes wouldn't include the select by itself. What it WILL include is a new row for every result from the select statement as an insert statement. If that select statement is 10k rows, the commit will be rather large, but no more so than if you'd written 10k individual insert statements within an explicit transaction.
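As a minimal sketch of that point, using Python's sqlite3 as a stand-in for any transactional database (the table and column names here are made up for illustration): the `INSERT ... SELECT` runs as one statement inside one transaction, and the change log grows by one entry per inserted row, not by the select itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE target (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(10_000)])
conn.commit()

# The select itself logs nothing to undo; the transaction's log grows by
# one entry per inserted row -- the same cost as writing 10k individual
# INSERT statements inside one explicit transaction.
with conn:  # opens a transaction; commits on success, rolls back on error
    conn.execute("INSERT INTO target SELECT * FROM source")

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 10000
```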
Exactly how this works depends on the database. For example, in Oracle, it will require UNDO space (and eventually, if you run out, your transaction will be aborted, or your DBA will yell at you). In PostgreSQL, it'll prevent the vacuuming of old row versions. In MySQL/InnoDB, it'll use rollback space, and possibly cause lock timeouts.
There are several things the database must use space for: the undo/rollback data needed to reverse the transaction if it aborts, and the old row versions that concurrent transactions may still need to see.
In general, you want your transactions to commit as soon as possible. So, e.g., you don't want to hold one open on an idle connection. How to best batch inserts depends on the database (often, many inserts on one transaction is better than one transaction per insert). And of course, the primary purpose of transactions is data integrity.
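The data-integrity point can be sketched the same way (again with sqlite3 as an illustrative stand-in): many inserts share one transaction, and a failure partway through rolls back every one of them, not just the failing statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")

# Many inserts in one transaction: a single commit at the end,
# instead of one transaction per insert.
try:
    with conn:
        for i in range(100):
            conn.execute("INSERT INTO t VALUES (?)", (i,))
        conn.execute("INSERT INTO t VALUES (?)", (0,))  # duplicate key -> error
except sqlite3.IntegrityError:
    pass

# The failure rolled back the whole transaction, not just the last insert.
remaining = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(remaining)  # 0
```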
You can have many problems with a large transaction. First, in most databases you do not want to run row-by-row, because for a million records that will take hours. But inserting a million records in one complex statement can lock the tables involved and harm performance for everyone else. And a rollback, if you kill the transaction, can take a good while too. Usually the best alternative is to loop in batches. I usually test 50,000 at a time and raise or lower the batch size depending on how long that takes. I've had some databases where I do no more than 1,000 in one set-based operation. If possible, large inserts or updates should be scheduled for the database's off-peak hours. If it's really large (and one-time, usually a large data migration), you might even want to close the database for maintenance, put it in single-user mode, drop the indexes, do the insert, and reindex.
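The batch loop described above might look roughly like this; a minimal sketch using sqlite3, with a made-up batch size and made-up table names, where each batch is its own short transaction so locks are released and the rollback log stays small between commits:

```python
import sqlite3

BATCH = 1000  # hypothetical batch size; tune based on how long each batch takes

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY)")
with conn:
    conn.executemany("INSERT INTO source VALUES (?)",
                     [(i,) for i in range(3500)])

# Copy in batches: each iteration commits on its own, so no single
# transaction grows huge or holds locks for the whole migration.
copied = 0
while True:
    with conn:  # one transaction per batch
        cur = conn.execute(
            "INSERT INTO target "
            "SELECT id FROM source WHERE id >= ? ORDER BY id LIMIT ?",
            (copied, BATCH))
        n = cur.rowcount
    copied += n
    if n < BATCH:
        break

print(copied)  # 3500
```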