大批量插入:如何从 Grails 获取 INSERT SQL 代码?

发布于 2024-11-16 07:05:17 字数 510 浏览 3 评论 0原文

我在 Grails 上进行大批量插入时遇到了很多麻烦。我尝试了用户提出的所有最常见的解决方法(例如 WithTransaction 等),但插入仍然慢得离谱。

我有一批大约有 500K 对象的批次。在 Gorm 上,数据库中的对象创建/插入每次插入的执行时间约为 200 毫秒,这显然是不可接受的。 如果我从转储文件中的应用程序生成相关的 SQL INSERT 语句,然后直接在 DBMS 上运行转储文件,我会得到 <每次插入 1ms,这更像是这样。 为了使批处理在线,我正在考虑这个解决方案(伪Groovy代码):

FileBuffer sqlFile = ...
largeBatch.each{
    def obj = createMyObject( it )
    String sql = getSQL code for obj insert
    sqlFile.write( sql )
}
execute sqlFile directly on JDBC

如何从瞬态对象中获取插入SQL代码?

I had a lot of troubles with large insert batches on Grails. I tried all the most common workarounds proposed by users (such as the WithTransaction, etc), and the insertions were still ridiculously slow.

I have a batch with about 500K objects. On Gorm, the object creation/insertion in the DB performs at about 200ms per insert, which is obviously not acceptable.
If I generate the relevant SQL INSERT statements from the app in a dump file, and then I run the dump file directly on the DBMS, I get < 1ms per insert, which is more like it.
In order to make the batch online, I'm thinking of this solution (pseudo Groovy code):

FileBuffer sqlFile = ...
largeBatch.each{
    def obj = createMyObject( it )
    String sql = getSQL code for obj insert
    sqlFile.write( sql )
}
execute sqlFile directly on JDBC

How can I get the insert SQL code from a transient object?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

吖咩 2024-11-23 07:05:17

有两种方法可以执行更快的批处理来插入数据:

  1. hibernate 无状态会话 (http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html)
  2. 原生 sql (http:// groovy.codehaus.org/api/groovy/sql/Sql.html)(创建 x 个插入语句并在 x 个数据库事务中触发它们)例如,您可以创建一个 sql 转储并在一个事务中触发此转储。

使用 hibernate 无状态会话,您还可以批量插入,并且可以插入域类实例,例如 session.save(obj)。使用 navite sql,您必须自己构建语句。

there are 2 ways to performing faster batch processing for inserting data:

  1. hibernate stateless session (http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html)
  2. native sql (http://groovy.codehaus.org/api/groovy/sql/Sql.html) (create x insert statements and fire them in x database transactions) e.g. you can create one single sql dump and fire this dump in one single transaction.

with hibernate stateless session you can also batch the inserts and can insert a domain class instance e.g. session.save(obj). with navite sql you have to build the statements by your own.

翻身的咸鱼 2024-11-23 07:05:17

泰德·纳莱德有一个关于提高批量插入/更新性能的好文章。你遇到过吗?他的建议对我的批量更新很有用。

以下是文章下方的摘录:

通常,grails 休眠会话
快速执行某件事并且
返回。在导入过程中,我们做了很多事情
的处理,都具有相同的
休眠会话。所有这些
通常是垃圾的对象
会话关闭时收集的数据是
堆积如山。

解决这个问题最简单的方法是
创建一个简单的方法来清除
定期收集这些集合。

他建议定期刷新和清理会话。

data.eachWithIndex { datum, index ->
    // insert or update datum
    if(index % 100 == 0) {
        def session = sessionFactory.currentSession()
        session.flush()
        session.clear()
        // also clear the Grails domain class property instance map
        // - it may be that this isn't necessary anymore; not sure, but the article is
        //   a bit old, so maybe it's been fixed within Grails.
    }
}

使用此功能,他将 100k 条记录的批量插入时间从约 2.5 小时缩短到 3 分钟。

@Ted - 如果您遇到这个问题/答案并想自己重复这个答案,我非常乐意删除我的答案并为您的答案投票;你付出了所有的努力才弄清楚。谢谢!

Ted Naleid's got a good write-up on improving batch insert/update performance. Have you come across it? His suggestions have worked for me for batch updates.

Here's an excerpt from lower down in the article:

Normally, a grails hibernate session
executes something quickly and
returns. During importing, we do a ton
of processing, all with the same
hibernate session. All of these
objects that would normally be garbage
collected when the session closed are
piling up.

The easiest way to deal with this is
to create a simple method to clear out
these collections periodically.

He suggests periodically flushing and clearing the session.

data.eachWithIndex { datum, index ->
    // insert or update datum
    if(index % 100 == 0) {
        def session = sessionFactory.currentSession()
        session.flush()
        session.clear()
        // also clear the Grails domain class property instance map
        // - it may be that this isn't necessary anymore; not sure, but the article is
        //   a bit old, so maybe it's been fixed within Grails.
    }
}

Using this, he improves a batch insert of 100k records from ~2.5 hours to 3 minutes.

@Ted - If you come across this question/answer and would like to repeat this answer yourself, I'd be more than happy to delete my answer and upvote yours; you did all of the hard work figuring it out. Thanks!

百合的盛世恋 2024-11-23 07:05:17

您可以使用自定义 Hibernate Configuration 子类来捕获启动时生成的 SQL。我不确定稍后访问它的最佳方式是什么 - 也许是类中的公共静态字段。

这是代码(将其放在 src/java 下):

package com.yourcompany.yourapp;

import org.codehaus.groovy.grails.orm.hibernate.cfg.GrailsAnnotationConfiguration;
import org.hibernate.HibernateException;

public class DdlCapturingConfiguration extends GrailsAnnotationConfiguration {

   @Override
   public String[] generateSchemaCreationScript(Dialect d) throws HibernateException {
      String[] sql = super.generateSchemaCreationScript(d);
      // do something with the sql
      return sql;
   }
}

您可以将其注册到 grails-app/conf/DataSource.groovy 中,如下所示:

dataSource {
   pooled = true
   driverClassName = ...
   username = ...
   password = ...
   dialect = ...
   configClass = com.yourcompany.yourapp.DdlCapturingConfiguration
}

You can use a custom Hibernate Configuration subclass to capture the SQL that's generated at startup. I'm not sure what the best way to access it later would be - perhaps a public static field in the class.

Here's the code (put it under src/java):

package com.yourcompany.yourapp;

import org.codehaus.groovy.grails.orm.hibernate.cfg.GrailsAnnotationConfiguration;
import org.hibernate.HibernateException;

public class DdlCapturingConfiguration extends GrailsAnnotationConfiguration {

   @Override
   public String[] generateSchemaCreationScript(Dialect d) throws HibernateException {
      String[] sql = super.generateSchemaCreationScript(d);
      // do something with the sql
      return sql;
   }
}

and you'd register it in grails-app/conf/DataSource.groovy like this:

dataSource {
   pooled = true
   driverClassName = ...
   username = ...
   password = ...
   dialect = ...
   configClass = com.yourcompany.yourapp.DdlCapturingConfiguration
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文