Bulk inserts with Spring/Hibernate
I'm using a mix of Spring and Hibernate in my application (nothing original). For a given feature, I have to import the content of a CSV file into a table of my Oracle DB.
For now, I just create the objects and call
HibernateTemplate.saveOrUpdate
on each of them (I need to retrieve their newly allocated Ids),
and the transaction is committed at the end of the method, using the Spring transaction API.
Everything works fine except for performance, which is acceptable for around 5000 objects but not for 100 000...
So I'm looking for ideas to speed this up. I've heard of bulk inserts with Hibernate, but could not find any solid reference. Can anybody give me some ideas on how to perform this import with better performance?
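Roughly, the import method currently looks like the sketch below (simplified; the CSV parsing and the real entity type are left out):

    import java.util.List;

    import org.springframework.orm.hibernate3.HibernateTemplate;
    import org.springframework.transaction.annotation.Transactional;

    public class CsvImporter {

        private HibernateTemplate hibernateTemplate;

        public void setHibernateTemplate(HibernateTemplate hibernateTemplate) {
            this.hibernateTemplate = hibernateTemplate;
        }

        // One saveOrUpdate per object, one transaction around the whole method.
        @Transactional
        public void importRows(List<Object> entities) {
            for (Object entity : entities) {
                hibernateTemplate.saveOrUpdate(entity); // the new Id is assigned here
            }
            // the commit happens when the method returns, via Spring's transaction API
        }
    }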
4 Answers
Something simple you might try is to flush and clear the session, say, every 100 objects...
so execute a flush and clear every 100 or 1000 inserts, as in the sketch below.
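A sketch, assuming a plain Hibernate Session and saveOrUpdate in a loop (adapt the session handling to your Spring/HibernateTemplate setup):

    import java.util.List;

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    public class BatchedImport {

        // Flush and clear the first-level cache every batchSize inserts so the
        // session never holds 100 000 managed objects at once.
        public void insertAll(SessionFactory sessionFactory, List<?> entities, int batchSize) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            try {
                int i = 0;
                for (Object entity : entities) {
                    session.saveOrUpdate(entity); // the Id is assigned here
                    if (++i % batchSize == 0) {
                        session.flush();  // execute the pending INSERTs
                        session.clear();  // detach the flushed objects
                    }
                }
                tx.commit();
            } finally {
                session.close();
            }
        }
    }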
That will flush and clear the Hibernate session and stop it from growing too big (possibly why your 100 000 objects are taking so long).
Furthermore, if you're using the identity identifier generator, Hibernate will silently turn JDBC batching off. Batched inserts will improve performance, so you'd also want to set the hibernate.jdbc.batch_size configuration property to match your 100-at-a-time number.
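For example (a sketch; the property normally lives in hibernate.cfg.xml or in the Spring sessionFactory bean's Hibernate properties, shown here programmatically):

    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    public class SessionFactoryBuilder {

        public SessionFactory build() {
            Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
            // Let Hibernate group up to 100 INSERT statements into one JDBC batch.
            cfg.setProperty("hibernate.jdbc.batch_size", "100");
            return cfg.buildSessionFactory();
        }
    }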
Manning's Java Persistence with Hibernate was the source of this (great book - saved my skin numerous times).
You might also consider using a StatelessSession, as it is designed for bulk operations.
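A sketch of what that could look like (the entity list and transaction handling are assumptions):

    import java.util.List;

    import org.hibernate.SessionFactory;
    import org.hibernate.StatelessSession;
    import org.hibernate.Transaction;

    public class StatelessImport {

        // A StatelessSession has no first-level cache and no dirty checking,
        // so inserts go straight through without the session growing.
        public void insertAll(SessionFactory sessionFactory, List<?> entities) {
            StatelessSession session = sessionFactory.openStatelessSession();
            Transaction tx = session.beginTransaction();
            try {
                for (Object entity : entities) {
                    session.insert(entity); // returns the generated identifier
                }
                tx.commit();
            } finally {
                session.close();
            }
        }
    }

Keep in mind that a StatelessSession bypasses cascades, interceptors and the second-level cache, so any associated objects have to be inserted explicitly.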
Sometimes an OR mapper is not the right hammer for the nail. Batch operations in particular are often executed with better performance using plain old JDBC. This of course depends on a variety of conditions, but you should at least consider it as an option and compare the performance of both approaches.
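A sketch using JDBC batching (the table and column names are made up, and retrieving the newly allocated Ids would need extra handling, e.g. reading an Oracle sequence):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class JdbcBatchImport {

        // addBatch() accumulates rows locally; executeBatch() sends them to
        // Oracle in one round trip per batch.
        public void insertAll(String url, String user, String password,
                              List<String[]> rows) throws SQLException {
            try (Connection con = DriverManager.getConnection(url, user, password)) {
                con.setAutoCommit(false);
                String sql = "INSERT INTO import_table (col_a, col_b) VALUES (?, ?)";
                try (PreparedStatement ps = con.prepareStatement(sql)) {
                    int count = 0;
                    for (String[] row : rows) {
                        ps.setString(1, row[0]);
                        ps.setString(2, row[1]);
                        ps.addBatch();
                        if (++count % 1000 == 0) {
                            ps.executeBatch(); // send a batch of 1000 rows
                        }
                    }
                    ps.executeBatch(); // send the remaining rows
                }
                con.commit();
            }
        }
    }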
It's not purely a database insert performance issue; if you are creating tens of thousands of objects and not performing a flush, the Hibernate session will grow until you run out of memory.