Imagine you have a large amount of data in a database, approx. 100 MB. We need to process all of it somehow (update it, or export it somewhere else). How can this task be implemented with good performance? How should transaction propagation be set up?
Example 1 (with bad performance):
@Singleton
public class ServiceBean {

    public void processAllData() {
        List<Entity> entityList = dao.findAll();
        for (Entity entity : entityList) {
            process(entity);
        }
    }

    private void process(Entity entity) {
        // data processing
        // saves data back (UPDATE operation) or exports it somewhere else (just READs from DB)
    }
}
What could be improved here?
In my opinion:
- I would set the Hibernate batch size (see the Hibernate documentation on batch processing).
- I would separate ServiceBean into two Spring beans with different transaction settings. The method processAllData() should run outside of a transaction, because it operates on large amounts of data and a potential rollback would not be 'quick' (I guess). The method process(Entity entity) would run in a transaction - rolling back a single entity is no big deal.
Do you agree? Any tips?
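To make the proposed split concrete, here is a minimal sketch of the driver side: a non-transactional method pages the data into fixed-size chunks, and each chunk would then be handed to a separate transactional bean. The class name ChunkedProcessor, the chunk size, and the partition helper are hypothetical illustrations, not from the original post.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedProcessor {

    static final int CHUNK_SIZE = 50; // hypothetical value; tune per workload

    // Pure helper: split the full result list into fixed-size chunks.
    // In the real service, each chunk would be passed to a method on a
    // second Spring bean annotated e.g. @Transactional(propagation =
    // Propagation.REQUIRES_NEW), so a rollback only loses one chunk.
    static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 120; i++) ids.add(i);
        // 120 ids with CHUNK_SIZE 50 -> chunks of 50, 50, 20
        System.out.println(partition(ids, CHUNK_SIZE).size() + " chunks");
    }
}
```

Calling the worker bean through a separate Spring bean (rather than `this.process(...)`) matters, because Spring's transactional proxy is bypassed on self-invocation.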
2 Answers
Here are 2 basic strategies:
1. Bump the JDBC batch size up to a reasonable number (hibernate.jdbc.batch_size). If you are mixing and matching object C/U/D operations, make sure you have Hibernate configured to order inserts and updates, otherwise it won't batch (hibernate.order_inserts and hibernate.order_updates). And when doing batching, it is imperative to make sure you clear() your Session so that you don't run into memory issues during a large transaction.
2. Implement Hibernate's Work interface and use your implementation class (or an anonymous inner class) to run native SQL against the JDBC connection. Concatenate hand-coded SQL via semicolons (works in most DBs) and then process that SQL via doWork. This strategy allows you to use the Hibernate transaction coordinator while being able to harness the full power of native SQL.
You will generally find that no matter how fast you can get your OO code, using DB tricks like concatenating SQL statements will be faster.
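A minimal sketch of strategy 2: build the semicolon-separated SQL batch as a plain String, which would then be executed against the raw JDBC connection inside session.doWork(...). Only the string building is shown runnable here, since doWork needs a live Hibernate Session; the table and column names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SqlBatchBuilder {

    // Concatenate one hand-coded UPDATE per id, separated by semicolons
    // (accepted by most databases when sent as a single statement).
    static String batchUpdateSql(List<Long> ids) {
        return ids.stream()
                  .map(id -> "UPDATE entity SET processed = 1 WHERE id = " + id)
                  .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        String sql = batchUpdateSql(List.of(1L, 2L));
        System.out.println(sql);
        // In real code, inside an open Session (sketch, not runnable here):
        // session.doWork(conn -> {
        //     try (java.sql.Statement st = conn.createStatement()) {
        //         st.execute(sql);
        //     }
        // });
    }
}
```

Note that concatenating ids straight into SQL is only safe for trusted, numeric values; user-supplied input would need parameterization instead.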
There are a few things to keep in mind here:
- Loading all entities into memory with a findAll method can lead to OOM exceptions.
- You need to avoid attaching all of the entities to a session, since every time Hibernate executes a flush it will need to dirty-check every attached entity. This will quickly grind your processing to a halt.
- Hibernate provides a StatelessSession which you can use with a scrollable result set to scroll through entities one by one - docs here. You can then use this session to update an entity without ever attaching it to the session.
- The other alternative is to use a stateful session but clear it at regular intervals, as shown here.
I hope this is useful advice.
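The periodic-clear alternative above can be sketched as a loop that flushes and clears the Session every N entities, so the persistence context never grows unbounded. SessionOps is a hypothetical stand-in for the two org.hibernate.Session methods used, and the real loop would iterate a ScrollableResults rather than a List.

```java
import java.util.Collections;
import java.util.List;

public class PeriodicClearLoop {

    // Stand-in for the two Session methods this pattern relies on.
    interface SessionOps {
        void flush();
        void clear();
    }

    static final int BATCH_SIZE = 50; // hypothetical; match hibernate.jdbc.batch_size

    // Returns how many flush/clear cycles were triggered.
    static int processAll(List<String> entities, SessionOps session) {
        int cycles = 0;
        for (int i = 0; i < entities.size(); i++) {
            // ... process / update entities.get(i) here ...
            if ((i + 1) % BATCH_SIZE == 0) {
                session.flush();  // push pending UPDATEs to the DB
                session.clear();  // detach everything: no dirty-check pile-up
                cycles++;
            }
        }
        return cycles;
    }

    public static void main(String[] args) {
        List<String> fake = Collections.nCopies(120, "entity");
        SessionOps noop = new SessionOps() {
            public void flush() {}
            public void clear() {}
        };
        // 120 rows with BATCH_SIZE 50 -> cycles at row 50 and row 100
        System.out.println(processAll(fake, noop));
    }
}
```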