在 SQL 中向表中插入大量记录的最快方法
问题是,我们有大量记录(超过一百万条)需要从 Java 应用程序插入到单个表中。这些记录是由 Java 代码创建的,不是从另一个表移动的,因此 INSERT/SELECT 没有帮助。
目前,我的瓶颈是 INSERT 语句。我正在使用PreparedStatement来加速该过程,但在普通服务器上每秒无法获得超过50条记录。该表并不复杂,也没有定义索引。
这个过程需要太长的时间,时间长了就会出问题。
我该怎么做才能获得最大速度(每秒插入)?
数据库:MS SQL 2008。应用程序:基于Java,使用Microsoft JDBC 驱动程序。
The problem is, we have a huge number of records (more than a million) to be inserted into a single table from a Java application. The records are created by the Java code, it's not a move from another table, so INSERT/SELECT won't help.
Currently, my bottleneck is the INSERT statements. I'm using PreparedStatement to speed-up the process, but I can't get more than 50 recods per second on a normal server. The table is not complicated at all, and there are no indexes defined on it.
The process takes too long, and the time it takes will make problems.
What can I do to get the maximum speed (INSERT per second) possible?
Database: MS SQL 2008. Application: Java-based, using Microsoft JDBC driver.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
对插入件进行批处理。也就是说,一次仅发送 1000 行,而不是一次发送一行,因此可以大大减少往返/服务器调用
对 JDBC 驱动程序执行 MSDN 上的批量操作。这是最简单的方法,无需重新设计即可使用真正的批量方法。
每个插入都必须被解析、编译和执行。一批意味着更少的解析/编译,因为 1000 个(例如)插入将一次性编译
有更好的方法,但如果您仅限于生成的 INSERT,则这有效
Batch the inserts. That is, only send 1000 rows at a time, rather then one row at a time, so you hugely reduce round trips/server calls
Performing Batch Operations on MSDN for the JDBC driver. This is the easiest method without reengineering to use genuine bulk methods.
Each insert must be parsed and compiled and executed. A batch will mean a lot less parsing/compiling because a 1000 (for example) inserts will be compiled in one go
There are better ways, but this works if you are limited to generated INSERTs
使用批量插入 - 它专为您所要求的内容而设计< em>显着提高了插入速度。
另外,(以防万一您确实没有索引)您可能还需要考虑添加索引 - 某些索引(大多数是主键上的索引)可能会提高插入的性能。
您插入记录的实际速率将取决于确切的数据、表结构以及 SQL Server 本身的硬件/配置,因此我无法真正给您任何数字。
Use BULK INSERT - it is designed for exactly what you are asking and significantly increases the speed of inserts.
Also, (just in case you really do have no indexes) you may also want to consider adding an indexes - some indexes (most an index one on the primary key) may improve the performance of inserts.
The actual rate at which you should be able to insert records will depend on the exact data, the table structure and also on the hardware / configuration of the SQL server itself, so I can't really give you any numbers.
您是否研究过批量操作批量操作?
Have you looked into bulk operations bulk operations?
您是否考虑过使用批量更新< /a>?
Have you considered to use batch updates?
表上是否有任何完整性约束或触发器?
如果是这样,在插入之前将其删除会有所帮助,但您必须确保您能够承受后果。
Is there any integrity constraint or trigger on the table ?
If so, droping it before inserts will help, but you have to be sure that you can afford the consequences.
查看 Sql Server 的 bcp 实用程序。
这意味着在您的方法是生成一个分隔文件并使用外部实用程序导入数据。但这是向 Sql Server 数据库插入大量记录的最快方法,并且会将加载时间加快多个数量级。
另外,这是您必须执行的一次性操作还是定期发生的操作?如果这是一次,我建议甚至不要编写此过程的代码,而是使用数据库实用程序的组合执行导出/导入。
Look into Sql Server's bcp utility.
This would mean a big change in your approach in that you'd be generating a delimited file and using an external utility to import the data. But this is the fastest method for inserting a large number of records into a Sql Server db and will speed up your load time by many orders of magnitude.
Also, is this a one-time operation you have to perform or something that will occur on a regular basis? If it's one time I would suggest not even coding this process but performing an export/import with a combination of db utilities.
我建议使用 ETL 引擎。您可以使用 Pentaho。它是免费的。 ETL 引擎针对批量加载数据以及所需的任何形式的转换/验证进行了优化。
I would recommend using an ETL engine for it. You can use Pentaho. It's free. The ETL engines are optimized for doing bulk loading on data and also any forms of transformation/validation that are required.