Bulk insert and bulk update in Java using prepared statements
I am trying to fill a ResultSet in Java with about 50,000 rows of 10 columns and then insert them into another table using the executeBatch method of PreparedStatement.
To make the process faster I did some research and found that the fetch size plays an important role when reading data into the ResultSet.
Too low a fetch size can result in too many trips to the server, and too high a fetch size can tie up network resources, so I experimented a bit and settled on an optimal size that suits my infrastructure.
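For reference, the fetch size is set per statement; a minimal sketch, with a placeholder value and query:

// readStmt is the java.sql.Statement used to read from the source database;
// 1000 stands in for whatever value suits your infrastructure
readStmt.setFetchSize(1000); // hint: rows to fetch per server round trip
ResultSet rs = readStmt.executeQuery("SELECT email, ref_id FROM source_table");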
I am reading this ResultSet and creating insert statements to insert the rows into another table in a different database.
Something like this (just a sample, not real code):
for (int i = 0; i < 50000; i++) {
    statement.setString(1, "user@example.com");
    statement.setLong(2, 1);
    statement.addBatch();
}
statement.executeBatch();
- Will the executeBatch method try to send all the data at once?
- Is there a way to define the batch size?
- Is there any better way to speed up the process of bulk insertion?

While updating in bulk (50,000 rows, 10 columns), is it better to use an updatable ResultSet or a PreparedStatement with batch execution?
Comments (4)
I'll address your questions in turn.
This can vary with each JDBC driver, but the few I've studied will iterate over each batch entry and send the arguments together with the prepared statement handle each time to the database for execution. That is, in your example above, there would be 50,000 executions of the prepared statement with 50,000 pairs of arguments, but these 50,000 steps can be done in a lower-level "inner loop," which is where the time savings come in. As a rather stretched analogy, it's like dropping out of "user mode" down into "kernel mode" and running the entire execution loop there. You save the cost of diving in and out of that lower-level mode for each batch entry.
You've defined it implicitly here by pushing 50,000 argument sets in before executing the batch via Statement#executeBatch(). A batch size of one is just as valid.

Consider opening a transaction explicitly before the batch insertion, and commit it afterward. Don't let either the database or the JDBC driver impose a transaction boundary around each insertion step in the batch. You can control the JDBC layer with the Connection#setAutoCommit(boolean) method. Take the connection out of auto-commit mode first, then populate your batch, start a transaction, execute the batch, and then commit the transaction via Connection#commit().

This advice assumes that your insertions won't be contending with concurrent writers, and assumes that these transaction boundaries will give you sufficiently consistent values read from your source tables for use in the insertions. If that's not the case, favor correctness over speed.
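A minimal sketch of that sequencing, assuming an open java.sql.Connection to the target database; the table and column names are illustrative:

// connection is an open java.sql.Connection to the target database
connection.setAutoCommit(false); // leave auto-commit mode; one transaction spans the batch
try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO target_table (email, ref_id) VALUES (?, ?)")) {
    for (int i = 0; i < 50000; i++) {
        ps.setString(1, "user@example.com");
        ps.setLong(2, 1);
        ps.addBatch();
    }
    ps.executeBatch();   // execute the whole batch inside the one transaction
    connection.commit(); // a single commit for all 50,000 inserts
} catch (SQLException e) {
    connection.rollback(); // discard the partial batch on failure
    throw e;
}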
ResultSet or PreparedStatement with batch execution?

Nothing beats testing with your JDBC driver of choice, but I expect the latter, PreparedStatement with Statement#executeBatch(), will win out here. The statement handle may have an associated list or array of "batch arguments," with each entry being the argument set provided in between calls to Statement#executeBatch() and Statement#addBatch() (or Statement#clearBatch()). The list will grow with each call to addBatch(), and will not be flushed until you call executeBatch(). Hence, the Statement instance is really acting as an argument buffer; you're trading memory for convenience (using the Statement instance in lieu of your own external argument set buffer).

Again, you should consider these answers general and speculative so long as we're not discussing a specific JDBC driver. Each driver varies in sophistication, and each will vary in which optimizations it pursues.
The batch will be done "all at once" - that's what you've asked it to do.
50,000 seems a bit large to be attempting in one call. I would break it up into smaller chunks of 1,000, like this:
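(A sketch, reusing the statement and values from the question; the chunk bookkeeping here is illustrative.)

int batchSize = 1000;
for (int i = 0; i < 50000; i++) {
    statement.setString(1, "user@example.com");
    statement.setLong(2, 1);
    statement.addBatch();
    if ((i + 1) % batchSize == 0) {
        statement.executeBatch(); // flush every 1,000 rows
    }
}
statement.executeBatch(); // flush any final partial chunk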
50,000 rows shouldn't take more than a few seconds.
如果只是将数据库中一个/多个表的数据插入到该表中并且没有干预(更改结果集),则调用statement.executeUpdate(SQL)执行 INSERT-SELECT 语句,这会更快,因为有没有开销。没有数据流出数据库,整个操作都在数据库上而不是在应用程序中。
If it's just data from one or more tables in the DB to be inserted into this table, with no intervention (no alterations to the result set), then call statement.executeUpdate(SQL) to perform an INSERT-SELECT statement. This is quicker since there is no overhead: no data goes outside of the DB, and the entire operation runs on the DB rather than in the application.
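A minimal sketch of such an INSERT-SELECT over JDBC; the table and column names are assumptions:

// stmt is a java.sql.Statement on a connection that can see both tables
int inserted = stmt.executeUpdate(
        "INSERT INTO target_table (email, ref_id) " +
        "SELECT email, ref_id FROM source_table");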
Bulk unlogged update will not give you the improved performance you want the way you are going about it. See this