What is the best way to insert (update) a large number of entities with Hibernate?

Posted 2025-01-06 08:51:03

I need to execute something like:

    for (int i = 0; i <= moreThanThousand; i++) {
        Entity e = new Entity();
        insertEntity(e);
    }

or

    for (Entity e : moreThanThousandEntities) {
        updateEntity(e);
    }

Is there a batch mechanism in Hibernate? Does it make sense to perform this work in several threads? What is the best practice?
With JDBC I would use the addBatch() and executeBatch() methods of PreparedStatement, but I'm not an expert in Hibernate.
Thanks in advance!


2 Answers

迷途知返 2025-01-13 08:51:03

You can define the JDBC batch size with the hibernate.jdbc.batch_size configuration property:

    hibernate.jdbc.batch_size=20
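
If you build the SessionFactory programmatically rather than via a properties file, the same setting can be applied on the Configuration object. A minimal sketch, assuming the mappings and connection settings are configured elsewhere (e.g. in hibernate.cfg.xml):

    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    // Sketch: enable JDBC batching when building the SessionFactory by hand.
    // configure() reads hibernate.cfg.xml; mappings/connection are assumed there.
    Configuration cfg = new Configuration().configure();
    cfg.setProperty("hibernate.jdbc.batch_size", "20");
    SessionFactory sessionFactory = cfg.buildSessionFactory();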

Inserting or updating in batches is then straightforward:

    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    for (int i = 0; i < 100000; i++) {
        Customer customer = new Customer(.....);
        session.save(customer);
        if (i > 0 && i % 20 == 0) { // 20, same as the JDBC batch size
            // flush a batch of inserts and release memory
            // (the i > 0 guard avoids flushing a batch of one on the first pass):
            session.flush();
            session.clear();
        }
    }
    tx.commit();
    session.close();

For more details, have a look at the batch processing chapter of the Hibernate reference documentation.
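
For the update half of the question, the same flush/clear pattern applies; the reference documentation pairs it with a scrollable cursor so the session doesn't keep every loaded entity in memory. A sketch along those lines, assuming a Customer entity and a hypothetical updateStuff() mutator:

    import org.hibernate.ScrollMode;
    import org.hibernate.ScrollableResults;
    import org.hibernate.Session;
    import org.hibernate.Transaction;

    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    // FORWARD_ONLY cursor: rows are fetched as we scroll, not all at once
    ScrollableResults customers = session
        .createQuery("from Customer")
        .scroll(ScrollMode.FORWARD_ONLY);
    int count = 0;
    while (customers.next()) {
        Customer customer = (Customer) customers.get(0);
        customer.updateStuff(); // hypothetical mutator: any change to the managed entity
        if (++count % 20 == 0) { // 20, same as the JDBC batch size
            // flush a batch of updates and detach the processed entities:
            session.flush();
            session.clear();
        }
    }
    tx.commit();
    session.close();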

夏至、离别 2025-01-13 08:51:03

If you are processing a great volume of data, such as importing data every day within a very small processing window, the best approach unfortunately is to access your DB directly using JDBC. Consider all of the following (a minimal JDBC sketch follows the list):

  • Garbage collector - avoid constructing and freeing millions of objects during critical operations.
  • Keep data importing separate from data processing - try to process the data inside the database using stored procedures; that is where you reach the best performance when relating the data to other business data (which is usually needed).
  • Physical data validation - do parsing and physical validation only in the importing phase, and hand your stored procedures only cleaned data to validate against the other business data inside the DB.
  • Pipeline - consider building a processing pipeline so that several phases run at the same time: while you are importing data, the data already imported is asynchronously processed by stored procedures, and so on.
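
For reference, the raw-JDBC route this answer recommends typically looks like the addBatch()/executeBatch() pattern the question already mentions. A minimal sketch; the table name, column, and connection URL here are made up for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // Minimal sketch of batched inserts over plain JDBC.
    // Table/column names and the connection URL are illustrative only.
    void importNames(List<String> names) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/db");
             PreparedStatement ps = con.prepareStatement(
                     "insert into customer (name) values (?)")) {
            con.setAutoCommit(false);
            int count = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();            // queue the statement
                if (++count % 20 == 0) {
                    ps.executeBatch();    // send a batch of 20 to the DB
                }
            }
            ps.executeBatch();            // flush the final partial batch
            con.commit();
        }
    }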

I can tell you that on a system where we had to process about 8 million records daily (I don't have the volume in bytes, but it is big) within a window of only 2 hours per day, this was the only way to reach the required performance, even on the best hardware allowed.

I hope this gives you a new, useful approach to consider.
