Bulk insert and bulk update in Java using prepared statements
I am trying to fill a ResultSet in Java with about 50,000 rows of 10 columns and then insert them into another table using the executeBatch method of PreparedStatement.
To make the process faster I did some research and found that the fetch size plays an important role when reading data into the ResultSet.
Too low a fetch size can result in too many trips to the server, and too high a fetch size can tie up network resources, so I experimented a bit and settled on an optimal size that suits my infrastructure.
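For reference, the fetch size is set per statement; a minimal sketch, with a placeholder value and query:

// readStmt is the java.sql.Statement used to read from the source database;
// 1000 stands in for whatever value suits your infrastructure
readStmt.setFetchSize(1000); // hint: rows to fetch per server round trip
ResultSet rs = readStmt.executeQuery("SELECT email, ref_id FROM source_table");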
I am reading this ResultSet and creating insert statements to insert the rows into another table in a different database.
Something like this (just a sample, not real code):
for (int i = 0; i < 50000; i++) {
    statement.setString(1, "user@example.com");
    statement.setLong(2, 1);
    statement.addBatch();
}
statement.executeBatch();
- Will the executeBatch method try to send all the data at once?
- Is there a way to define the batch size?
- Is there any better way to speed up the process of bulk insertion?

While updating in bulk (50,000 rows, 10 columns), is it better to use an updatable ResultSet or a PreparedStatement with batch execution?
Comments (4)
I'll address your questions in turn.
This can vary with each JDBC driver, but the few I've studied will iterate over each batch entry and send the arguments together with the prepared statement handle each time to the database for execution. That is, in your example above, there would be 50,000 executions of the prepared statement with 50,000 pairs of arguments, but these 50,000 steps can be done in a lower-level "inner loop," which is where the time savings come in. As a rather stretched analogy, it's like dropping out of "user mode" down into "kernel mode" and running the entire execution loop there. You save the cost of diving in and out of that lower-level mode for each batch entry.
You've defined it implicitly here by pushing 50,000 argument sets in before executing the batch via Statement#executeBatch(). A batch size of one is just as valid.

Consider opening a transaction explicitly before the batch insertion, and commit it afterward. Don't let either the database or the JDBC driver impose a transaction boundary around each insertion step in the batch. You can control the JDBC layer with the Connection#setAutoCommit(boolean) method. Take the connection out of auto-commit mode first, then populate your batch, start a transaction, execute the batch, and then commit the transaction via Connection#commit().

This advice assumes that your insertions won't be contending with concurrent writers, and assumes that these transaction boundaries will give you sufficiently consistent values read from your source tables for use in the insertions. If that's not the case, favor correctness over speed.
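A minimal sketch of that sequencing, assuming an open java.sql.Connection to the target database; the table and column names are illustrative:

// connection is an open java.sql.Connection to the target database
connection.setAutoCommit(false); // leave auto-commit mode; one transaction spans the batch
try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO target_table (email, ref_id) VALUES (?, ?)")) {
    for (int i = 0; i < 50000; i++) {
        ps.setString(1, "user@example.com");
        ps.setLong(2, 1);
        ps.addBatch();
    }
    ps.executeBatch();   // execute the whole batch inside the one transaction
    connection.commit(); // a single commit for all 50,000 inserts
} catch (SQLException e) {
    connection.rollback(); // discard the partial batch on failure
    throw e;
}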
ResultSet or PreparedStatement with batch execution?

Nothing beats testing with your JDBC driver of choice, but I expect the latter, PreparedStatement with Statement#executeBatch(), will win out here. The statement handle may have an associated list or array of "batch arguments," with each entry being the argument set provided in between calls to Statement#executeBatch() and Statement#addBatch() (or Statement#clearBatch()). The list will grow with each call to addBatch(), and will not be flushed until you call executeBatch(). Hence, the Statement instance is really acting as an argument buffer; you're trading memory for convenience (using the Statement instance in lieu of your own external argument set buffer).

Again, you should consider these answers general and speculative so long as we're not discussing a specific JDBC driver. Each driver varies in sophistication, and each will vary in which optimizations it pursues.
The batch will be done "all at once" - that's what you've asked it to do.
50,000 seems a bit large to be attempting in one call. I would break it up into smaller chunks of 1,000, like this:
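(A sketch, reusing the statement and values from the question; the chunk bookkeeping here is illustrative.)

int batchSize = 1000;
for (int i = 0; i < 50000; i++) {
    statement.setString(1, "user@example.com");
    statement.setLong(2, 1);
    statement.addBatch();
    if ((i + 1) % batchSize == 0) {
        statement.executeBatch(); // flush every 1,000 rows
    }
}
statement.executeBatch(); // flush any final partial chunk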
50,000 rows shouldn't take more than a few seconds.
如果只是将数据库中一个/多个表的数据插入到该表中并且没有干预(更改结果集),则调用statement.executeUpdate(SQL)执行 INSERT-SELECT 语句,这会更快,因为有没有开销。没有数据流出数据库,整个操作都在数据库上而不是在应用程序中。
If it's just data from one or more tables in the DB to be inserted into this table, with no intervention (no alterations to the result set), then call statement.executeUpdate(SQL) to perform an INSERT-SELECT statement. This is quicker since there is no overhead: no data goes outside of the DB, and the entire operation runs on the DB rather than in the application.
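A minimal sketch of such an INSERT-SELECT over JDBC; the table and column names are assumptions:

// stmt is a java.sql.Statement on a connection that can see both tables
int inserted = stmt.executeUpdate(
        "INSERT INTO target_table (email, ref_id) " +
        "SELECT email, ref_id FROM source_table");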
Bulk unlogged update will not give you the improved performance you want the way you are going about it. See this