Bulk insert with or without indexes
In a comment, I read:
"Just as a side note, it's sometimes faster to drop the indices of your table and recreate them after the bulk insert operation."
Is this true? Under which circumstances?
Comments (3)
Like Joel, I'll echo the statement that yes, it can be true. I've found that the key to identifying the scenario he mentioned lies in the distribution of the data and the size of the index(es) on the specific table.
I used to support an application that did a regular bulk import of 1.8 million rows into a table with 90 columns and 4 indexes, one of which covered 11 columns. The import with the indexes in place took over 20 hours to complete. Dropping the indexes, inserting, and re-creating the indexes took only 1 hour and 25 minutes.
So it can be a big help, but a lot of it comes down to your data, the indexes, and the distribution of data values.
Yes, it is true. When there are indexes on the table during an insert, the server will need to be constantly re-ordering/paging the table to keep the indexes up to date. If you drop the indexes, it can just add the rows without worrying about that, and then build the indexes all at once when you re-create them.
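The drop, load, rebuild sequence described above can be sketched as follows. This is only a minimal illustration, assuming Python's built-in `sqlite3` module as a stand-in for whatever database server is actually in use; the `orders` table, the `idx_orders_customer` index, and the `bulk_load_without_indexes` helper are all hypothetical names made up for the example.

```python
import sqlite3


def bulk_load_without_indexes(conn, table, rows, index_defs):
    """Drop the given indexes, insert all rows, then rebuild the indexes.

    index_defs maps index name -> CREATE INDEX statement (a hypothetical
    shape chosen for this sketch, not any library's API).
    """
    if not rows:
        return
    cur = conn.cursor()
    # 1. Drop the indexes so inserts do not have to maintain them.
    for name in index_defs:
        cur.execute(f'DROP INDEX IF EXISTS "{name}"')
    # 2. Insert every row in one batch.
    placeholders = ", ".join("?" for _ in rows[0])
    cur.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    # 3. Rebuild each index once, over the fully loaded table.
    for ddl in index_defs.values():
        cur.execute(ddl)
    conn.commit()


# Hypothetical table and index, in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
index_defs = {
    "idx_orders_customer": "CREATE INDEX idx_orders_customer ON orders (customer)",
}
conn.execute(index_defs["idx_orders_customer"])

rows = [(i, f"cust{i % 100}", i * 1.5) for i in range(10_000)]
bulk_load_without_indexes(conn, "orders", rows, index_defs)

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
index_names = [
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
```

On a real server the same three steps would normally be written in that server's own SQL dialect and bulk-load tooling; the point here is only the ordering of the steps.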
The exception, of course, is when the import data is already in index order. In fact, I should note that I'm working on a project right now where the opposite effect was observed. We wanted to reduce the run time of a large import (a nightly dump from a mainframe system). We tried removing the indexes, importing the data, and re-creating them. It actually significantly increased the time for the import to complete. This is not typical, though. It just goes to show that you should always test first on your particular system.
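One way to "test first" is a small timing harness that loads the same rows both ways, once in index order and once shuffled. The sketch below is an assumption-laden illustration: it uses an in-memory SQLite database through Python's `sqlite3` module, a hypothetical table `t` with a single index `idx_t_k`, and a made-up `timed_load` helper. Timings from it will not transfer to another engine, schema, or data volume, so it shows the shape of the experiment rather than any expected result.

```python
import random
import sqlite3
import time

INDEX_DDL = "CREATE INDEX idx_t_k ON t (k)"  # hypothetical index


def timed_load(rows, drop_first):
    """Load rows into a fresh indexed table; return (seconds, row count).

    If drop_first is True, the index is dropped before the insert and
    rebuilt afterwards; otherwise it stays in place during the insert.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (k INTEGER, payload TEXT)")
    conn.execute(INDEX_DDL)
    start = time.perf_counter()
    if drop_first:
        conn.execute("DROP INDEX idx_t_k")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    if drop_first:
        conn.execute(INDEX_DDL)
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count


random.seed(0)
keys = list(range(50_000))
sorted_rows = [(k, "x" * 20) for k in keys]      # already in index order
random.shuffle(keys)
shuffled_rows = [(k, "x" * 20) for k in keys]    # random key order

results = {}
for label, rows in (("sorted", sorted_rows), ("shuffled", shuffled_rows)):
    for drop_first in (False, True):
        results[(label, drop_first)] = timed_load(rows, drop_first)
        elapsed, _ = results[(label, drop_first)]
        print(f"{label:8s} drop_first={drop_first!s:5s} {elapsed:.3f}s")
```

Running the comparison against a copy of the real table, with realistic data volumes and the real set of indexes, is what actually answers the question for a given system.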
One thing you should consider when dropping and recreating indexes is that it should only be done in automated processes that run during low-volume periods of database use. While an index is dropped, it can't be used by queries that other users might be running at the same time. If you do this during production hours, your users will probably start complaining about timeouts.