JPA insert with an object graph is slow

Posted 2024-09-06 15:47:00

I'm trying to do a cascading save on a large object graph using JPA. For example (my object graph is a little bigger but close enough):

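// A.java
import java.util.Collection;
import javax.persistence.*;

@Entity
@Table(name="a")
public class A {
  @Id // JPA requires an identifier; the sequence mapping is omitted here
  private long id;
  // persisting an A cascades to every B in this collection
  @OneToMany(cascade = CascadeType.ALL, mappedBy = "a")
  private Collection<B> bs;
}

// B.java
import javax.persistence.*;

@Entity
@Table(name="b")
public class B {
  @Id
  private long id;
  // owning side of the relationship: the b row holds the foreign key to a
  @ManyToOne
  private A a;
}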

So I'm trying to persist an A that has a collection of 100+ Bs. The code is just:

em.persist(a);

Problem is, it's SLOW. My save is taking approximately 1300ms. I looked at the SQL being generated and it's horribly inefficient. Something like this:

select a_seq.nextval from dual;
select b_seq.nextval from dual;
select b_seq.nextval from dual;
select b_seq.nextval from dual;
...
insert into a (id) values (1);
insert into b (id, fk) values (1, 1);
insert into b (id, fk) values (2, 1);
insert into b (id, fk) values (3, 1);
...

I'm currently using TopLink as the persistence provider, but I've also tried EclipseLink and Hibernate. The backend is Oracle 11g. The problem is really how the SQL is put together. Each of these operations is done discretely rather than in bulk, so if there is a network latency of even 5 ms between my app server and DB server, doing 200 discrete operations adds 1 second. I've tried increasing the allocationSize of my sequences, but that only helps a bit. I've also tried direct JDBC as a batch statement:

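statement = connection.prepareStatement(sql); // prepare once, outside the loop
for (...) {
  // bind this row's parameters, e.g. statement.setLong(1, id);
  statement.addBatch(); // queue the row instead of executing it
}
statement.executeBatch(); // send all queued rows together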

For my data model it takes about 33 ms as a direct JDBC batch. Oracle itself takes 5 ms for the 100+ inserts.
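The latency argument above can be checked with a back-of-the-envelope model. This is an illustration, not a measurement: it assumes every round trip costs a fixed 5 ms (the figure from the question), ignores server-side execution time, and uses hypothetical settings (allocationSize 50, batch size 100) to show how preallocated ids plus JDBC batching collapse about 200 round trips into a handful.

```java
public class RoundTripEstimate {
    // Hypothetical cost model: each round trip to the database costs a fixed
    // network latency; time spent executing SQL on the server is ignored.
    public static long roundTrips(int rows, int allocationSize, int batchSize) {
        long sequenceCalls = (rows + allocationSize - 1) / allocationSize; // one nextval per block of ids
        long insertCalls = (rows + batchSize - 1) / batchSize;             // one execute per batch
        return sequenceCalls + insertCalls;
    }

    public static void main(String[] args) {
        int rows = 100;          // e.g. the 100 B rows (the single A row is ignored)
        double latencyMs = 5.0;  // latency figure from the question
        long discrete = roundTrips(rows, 1, 1);  // one nextval and one INSERT per row
        long tuned = roundTrips(rows, 50, 100);  // hypothetical allocationSize and batch size
        System.out.printf("discrete: %d round trips, ~%.0f ms%n", discrete, discrete * latencyMs);
        System.out.printf("tuned:    %d round trips, ~%.0f ms%n", tuned, tuned * latencyMs);
    }
}
```

Running it reports 200 round trips (about 1000 ms) for the discrete case versus 3 (about 15 ms) with both optimizations. The real gap is smaller, because each insert still costs server time, as the 33 ms JDBC batch figure shows.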

Is there any way of making JPA (I'm stuck with 1.0 right now...) go faster without delving into vendor-specific things like Hibernate bulk insert?

Thanks!

Comments (3)

梦幻之岛 2024-09-13 15:47:00

Curious why you find increasing the INCREMENT BY dirty? It is an optimization that reduces the number of calls to the database to retrieve the next sequence value, and it is a common pattern in database clients where the id value is assigned in the client prior to INSERT. I don't see this as a JPA or ORM issue, and it should have the same cost in your JDBC comparison, since that must also retrieve a new sequence number for each new row prior to INSERT. If you take a different approach in your JDBC case, then we should be able to get EclipseLink JPA to follow the same approach.
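The preallocation described above can be sketched as follows. This is a simplified model, not EclipseLink's actual implementation: it assumes the sequence was created with INCREMENT BY equal to allocationSize, and that the value returned by nextval is the first id of a fresh block (providers differ on this detail). The LongSupplier stands in for `select b_seq.nextval from dual`.

```java
import java.util.function.LongSupplier;

public class PooledIdAllocator {
    private final LongSupplier nextval;   // stands in for "select b_seq.nextval from dual"
    private final int allocationSize;     // assumed to match the sequence's INCREMENT BY
    private long next;                    // next id to hand out
    private long limit;                   // exclusive end of the current block

    public PooledIdAllocator(LongSupplier nextval, int allocationSize) {
        this.nextval = nextval;
        this.allocationSize = allocationSize;
    }

    /** Returns the next id, calling the sequence only when the current block is used up. */
    public long nextId() {
        if (next >= limit) {              // no block yet, or block exhausted
            next = nextval.getAsLong();   // one round trip reserves allocationSize ids
            limit = next + allocationSize;
        }
        return next++;
    }

    public static void main(String[] args) {
        // Simulated sequence: START WITH 1 INCREMENT BY 50
        long[] sequence = {1};
        int[] roundTrips = {0};
        LongSupplier fakeNextval = () -> {
            roundTrips[0]++;
            long value = sequence[0];
            sequence[0] += 50;
            return value;
        };
        PooledIdAllocator ids = new PooledIdAllocator(fakeNextval, 50);
        for (int i = 0; i < 100; i++) {
            ids.nextId();
        }
        System.out.println("ids assigned: 100, sequence round trips: " + roundTrips[0]);
    }
}
```

With allocationSize 1 the same class calls the sequence once per id, which is the per-row `nextval` pattern visible in the question's SQL log.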

The cost of JPA is probably most obvious in the isolated INSERT scenario, because you gain no benefit from repeated reads on a transactional or shared cache, and, depending on your cache configuration, you pay a price to put these new entities into the cache during flush/commit.

Please note that there is also a cost to creating the first EntityManager, when all of the metadata processing, class loading, possible weaving, and metamodel initialization take place. Make sure you keep this time out of your comparison. In your real application this occurs once, and all subsequent EntityManagers benefit from the shared metadata.
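To keep that one-time startup cost out of a comparison, time only the steady-state work. The harness below is a minimal sketch using plain System.nanoTime(); the setup supplier is where you would create the EntityManagerFactory (for example with Persistence.createEntityManagerFactory), and the task would be one persist and commit. The placeholders in main are hypothetical stand-ins. For anything beyond a rough check, a dedicated tool such as JMH handles JIT warm-up more rigorously.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

public class WarmBenchmark {
    /**
     * Runs the expensive one-time setup outside the timed region, discards
     * `warmups` runs, then returns the average nanoseconds of `runs` timed runs.
     */
    public static <R> long averageNanos(Supplier<R> setup, Consumer<R> task, int warmups, int runs) {
        R resource = setup.get();            // not timed: e.g. building the EntityManagerFactory
        for (int i = 0; i < warmups; i++) {
            task.accept(resource);           // not timed: class loading, JIT, first-use caches
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.accept(resource);
            total += System.nanoTime() - start;
        }
        return total / runs;
    }

    public static void main(String[] args) {
        long avg = averageNanos(
                () -> new StringBuilder(),                         // placeholder for the one-time setup
                sb -> { sb.setLength(0); sb.append("insert"); },   // placeholder for one persist + commit
                5, 20);
        System.out.println("average per run: " + avg + " ns");
    }
}
```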

If you have other scenarios that need to read these entities, then the cost of putting them in the cache can reduce the cost of retrieving them. In my experience I can make an application overall much faster than a typical hand-written JDBC solution, but it's a balance across the entire set of concurrent users and not an isolated test case.

I hope this helps. Happy to provide any more guidance on EclipseLink JPA and its performance and scalability options.

Doug

梨涡少年 2024-09-13 15:47:00

The solution would be to enable JDBC batching and to flush and clear the EntityManager at regular intervals (every batch-size entities), but I'm not aware of a vendor-neutral way to do this:

  • With Hibernate, you'd have to set the hibernate.jdbc.batch_size configuration option. See Chapter 13. Batch processing

  • With EclipseLink, it looks like there is a batch writing mode. See Jeff Sutherland's post in this thread (it should also be possible to specify the size).

  • According to the comments of this blog post, batch writing is not available in TopLink Essentials :(
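The flush-and-clear part is the same regardless of provider; only the setting that turns on JDBC batching is vendor-specific. In the sketch below, PersistenceContext is a hypothetical interface mirroring the three javax.persistence.EntityManager methods used (persist, flush, clear), so the example compiles without a JPA provider; in real code you would pass the EntityManager and call this inside a transaction. Clearing mainly helps when persisting many top-level entities; for a single A that cascades to its Bs, the batching setting is what matters.

```java
import java.util.Arrays;
import java.util.List;

public class BatchedPersist {
    /** Hypothetical stand-in for the EntityManager methods used below. */
    public interface PersistenceContext {
        void persist(Object entity);
        void flush();
        void clear();
    }

    /**
     * Persists entities, flushing and clearing every batchSize entities so the
     * pending INSERTs can go out together (if the provider has JDBC batching
     * enabled) and the persistence context does not keep growing.
     */
    public static void persistInBatches(PersistenceContext em, List<?> entities, int batchSize) {
        for (int i = 0; i < entities.size(); i++) {
            em.persist(entities.get(i));
            if ((i + 1) % batchSize == 0) {
                em.flush();   // write the pending INSERTs
                em.clear();   // detach flushed entities to free memory
            }
        }
        em.flush();           // write any remainder; commit is left to the caller
        em.clear();
    }

    public static void main(String[] args) {
        PersistenceContext logging = new PersistenceContext() {
            public void persist(Object entity) { System.out.println("persist " + entity); }
            public void flush() { System.out.println("flush"); }
            public void clear() { System.out.println("clear"); }
        };
        persistInBatches(logging, Arrays.asList("b1", "b2", "b3"), 2);
    }
}
```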

梦罢 2024-09-13 15:47:00

Thanks, Pascal, for the response. I've done some tests and was able to improve the performance significantly.

With no optimizations I had an insert taking approximately 1100 ms. Using EclipseLink, I added the following to persistence.xml:

   <property name="eclipselink.jdbc.batch-writing" value="JDBC"/>
   <property name="eclipselink.jdbc.batch-writing.size" value="1000"/>

I tried the other values (Oracle-JDBC, etc.), but JDBC appeared to give the best performance increase. That brought the insert down to approximately 900 ms, a fairly modest improvement of 200 ms. A big saving came from increasing the sequence allocationSize. I'm not a huge fan of doing this; I find it dirty to increase the INCREMENT BY of my sequences just to accommodate JPA. Increasing these brought the time down to approximately 600 ms for each insert, so a total of about 500 ms was shaved off with those enhancements.

All this is fine and dandy, but it's still significantly slower than JDBC batch. JPA is a pretty high price to pay for ease of coding.
