Spring, Hibernate: batch processing a large amount of data with good performance

Posted on 2024-12-25 15:02:22

Imagine you have a large amount of data in a database, approx. ~100 MB. We need to process all of the data somehow (update it or export it somewhere else). How do we implement this task with good performance? How should transaction propagation be set up?

Example #1 (with bad performance):

@Singleton
public class ServiceBean {

  public void processAllData() {

    // loads every entity into memory at once
    List<Entity> entityList = dao.findAll();

    for (Entity entity : entityList) {
      process(entity);
    }
  }

  private void process(Entity entity) {
    // data processing
    // saves data back (UPDATE operation) or exports it somewhere else (just READs from DB)
  }

}

What could be improved here?

In my opinion:

  1. I would set the Hibernate batch size (see the Hibernate documentation on batch processing).
  2. I would split ServiceBean into two Spring beans with different transaction settings. The method processAllData() should run outside of a transaction, because it operates on a large amount of data and a potential rollback would not be 'quick' (I guess). The method process(Entity entity) would run in its own transaction - rolling back a single entity is no big deal. A sketch of this split follows below.
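
A minimal sketch of that two-bean split, assuming Spring's @Transactional support; BulkProcessingService, EntityProcessor and EntityDao are hypothetical names used only for illustration:

import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class BulkProcessingService {

    private final EntityDao dao;                  // assumed DAO with a findAll() method
    private final EntityProcessor entityProcessor;

    public BulkProcessingService(EntityDao dao, EntityProcessor entityProcessor) {
        this.dao = dao;
        this.entityProcessor = entityProcessor;
    }

    // intentionally not @Transactional: the long-running loop stays outside any transaction
    public void processAllData() {
        List<Entity> entities = dao.findAll();
        for (Entity entity : entities) {
            entityProcessor.process(entity);      // each entity gets its own short transaction
        }
    }
}

@Service
class EntityProcessor {                           // separate bean so the Spring proxy applies the transaction

    // a failure here rolls back only the current entity, not the whole run
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void process(Entity entity) {
        // data processing, then UPDATE or export
    }
}

Keeping process() in a separate bean also matters in Spring, because @Transactional on a method invoked from within the same bean would bypass the proxy and run without a transaction.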

Do you agree? Any tips?

Comments (2)

十年九夏 2025-01-01 15:02:22

Here are 2 basic strategies:

  1. JDBC batching: set the JDBC batch size, usually somewhere between 20 and 50 (hibernate.jdbc.batch_size). If you are mixing and matching object C/U/D operations, make sure you have Hibernate configured to order inserts and updates, otherwise it won't batch (hibernate.order_inserts and hibernate.order_updates). And when doing batching, it is imperative to make sure you clear() your Session so that you don't run into memory issues during a large transaction. A sketch of this setup follows right after this list.
  2. Concatenated SQL statements: implement the Hibernate Work interface and use your implementation class (or an anonymous inner class) to run native SQL against the JDBC connection. Concatenate hand-coded SQL via semicolons (works in most DBs) and then process that SQL via doWork. This strategy allows you to use the Hibernate transaction coordinator while being able to harness the full power of native SQL. A doWork sketch appears at the end of this answer.
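
A rough sketch of the JDBC batching setup from point 1, using the plain Hibernate Configuration API; the property keys are the standard Hibernate settings named above, while the batch size of 30, the Entity class and the saveOrUpdate() usage are illustrative assumptions:

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

public class JdbcBatchingExample {

    public static SessionFactory buildSessionFactory() {
        Configuration cfg = new Configuration().configure();  // reads hibernate.cfg.xml
        cfg.setProperty("hibernate.jdbc.batch_size", "30");    // usually 20-50
        cfg.setProperty("hibernate.order_inserts", "true");    // needed when mixing C/U/D
        cfg.setProperty("hibernate.order_updates", "true");
        return cfg.buildSessionFactory();
    }

    public static void saveAll(SessionFactory sessionFactory, Iterable<Entity> entities) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            int i = 0;
            for (Entity entity : entities) {
                session.saveOrUpdate(entity);
                if (++i % 30 == 0) {   // same value as hibernate.jdbc.batch_size
                    session.flush();   // send the batched statements to the database
                    session.clear();   // detach them so the Session does not grow unbounded
                }
            }
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}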

You will generally find that no matter how fast you can get your OO code, using DB tricks like concatenating SQL statements will be faster.
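
And a hedged sketch of the concatenated-SQL strategy from point 2 via Session.doWork(); the table and column names are made up, and since not every JDBC driver accepts a single semicolon-joined string, the more portable JDBC batch API is shown alongside it:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import org.hibernate.Session;
import org.hibernate.jdbc.Work;

public class NativeSqlExample {

    public static void runBulkSql(Session session) {
        session.doWork(new Work() {
            @Override
            public void execute(Connection connection) throws SQLException {
                try (Statement stmt = connection.createStatement()) {
                    // either concatenate hand-coded SQL with semicolons (driver permitting):
                    // stmt.execute("UPDATE entity SET flag = 1 WHERE id = 1; UPDATE entity SET flag = 1 WHERE id = 2;");
                    // ... or use JDBC statement batching, which is more portable:
                    stmt.addBatch("UPDATE entity SET flag = 1 WHERE id = 1");
                    stmt.addBatch("UPDATE entity SET flag = 1 WHERE id = 2");
                    stmt.executeBatch();
                }
            }
        });
    }
}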

-小熊_ 2025-01-01 15:02:22

There are a few things to keep in mind here:

  1. Loading all entities into memory with a findAll method can lead to OOM exceptions.

  2. You need to avoid attaching all of the entities to a session - since every time Hibernate executes a flush it will need to dirty check every attached entity. This will quickly grind your processing to a halt.

Hibernate provides a stateless session which you can use with a scrollable result set to scroll through entities one by one - docs here. You can then use this session to update the entity without ever attaching it to a session.
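
A sketch of that stateless-session approach, assuming a plain Entity mapping and an HQL query; the calls shown are the standard Hibernate 4/5 API (some of them changed in Hibernate 6):

import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;
import org.hibernate.Transaction;

public class StatelessScrollExample {

    public static void processAll(SessionFactory sessionFactory) {
        StatelessSession session = sessionFactory.openStatelessSession();
        Transaction tx = session.beginTransaction();
        ScrollableResults results = session.createQuery("from Entity")
                                           .scroll(ScrollMode.FORWARD_ONLY);
        try {
            while (results.next()) {
                Entity entity = (Entity) results.get(0);
                // ... process / modify the entity ...
                session.update(entity);   // issues the UPDATE immediately; nothing is cached
            }
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            results.close();
            session.close();
        }
    }
}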

The other alternative is to use a stateful session but clear the session at regular intervals as shown here.
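
For comparison, a sketch of that stateful-session alternative with a periodic clear; Entity, the HQL string and the interval of 50 are again illustrative:

import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class PeriodicClearExample {

    public static void processAll(SessionFactory sessionFactory) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        ScrollableResults results = session.createQuery("from Entity")
                                           .scroll(ScrollMode.FORWARD_ONLY);
        int count = 0;
        while (results.next()) {
            Entity entity = (Entity) results.get(0);
            // ... modify the managed entity; the dirty check on flush issues the UPDATE ...
            if (++count % 50 == 0) {
                session.flush();   // write pending changes
                session.clear();   // evict processed entities so the Session stays small
            }
        }
        tx.commit();               // flushes whatever is left
        session.close();
    }
}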

I hope this is useful advice.
