Batch inserts with JPA/Toplink

Posted on 2024-07-04 21:24:49

I have a web application that receives messages through an HTTP interface, e.g.:

http://server/application?source=123&destination=234&text=hello

This request contains the ID of the sender, the ID of the recipient and the text of the message.

The message should be processed as follows:

finding the matching User object for both the source and the destination from the database
creating a tree of objects: a Message that contains a field for the message text and two User objects for the source and the destination
persisting this tree to a database.

The tree will be loaded by other applications that I can't touch.

I use Oracle as the backing database and JPA with Toplink for the database handling tasks. If possible, I'd stay with these.

Without much optimization I can achieve ~30 requests/sec throughput in my environment. That's not much; I need ~300 requests/sec. So I measured where the performance bottleneck is and found that the calls to em.persist() take most of the time. If I simply comment out that line, the throughput goes well over 1000 requests/sec.

I wrote a small test application that used plain JDBC calls to persist 1 million messages to the same database. I used batching, meaning I did 100 inserts, then a commit, and repeated until all the records were in the database. I measured ~500 requests/sec throughput in this scenario, which would meet my needs.
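For reference, the JDBC batching described above (100 inserts, then a commit) might look roughly like the sketch below. The original test application is not shown, so the table name `messages`, its column names, and the `Msg` holder class are assumptions made purely for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class JdbcBatchInsert {

    /** Hypothetical value holder; the real schema is not given in the question. */
    static final class Msg {
        final long sourceId;
        final long destinationId;
        final String text;

        Msg(long sourceId, long destinationId, String text) {
            this.sourceId = sourceId;
            this.destinationId = destinationId;
            this.text = text;
        }
    }

    // Assumed table and column names.
    private static final String SQL =
        "INSERT INTO messages (source_id, destination_id, text) VALUES (?, ?, ?)";

    /** Inserts all messages, sending and committing them in groups of batchSize. */
    static void insertAll(Connection con, List<Msg> msgs, int batchSize) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            int pending = 0;
            for (Msg m : msgs) {
                ps.setLong(1, m.sourceId);
                ps.setLong(2, m.destinationId);
                ps.setString(3, m.text);
                ps.addBatch();                 // queue the row on the client side
                if (++pending == batchSize) {
                    ps.executeBatch();         // send the queued rows together
                    con.commit();
                    pending = 0;
                }
            }
            if (pending > 0) {                 // flush the final, partial batch
                ps.executeBatch();
                con.commit();
            }
        }
    }
}
```

A call such as `JdbcBatchInsert.insertAll(connection, messages, 100)` would reproduce the "100 inserts, then a commit" pattern.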

It is clear that I need to optimize insert performance here. However, as I mentioned earlier, I would like to keep using JPA and Toplink for this rather than pure JDBC.

Do you know a way to create batch inserts with JPA and Toplink? Can you recommend any other technique for improving JPA persist performance?

ADDITIONAL INFO:

"requests/sec" means here: total number of requests / total time from beginning of test to last record written to database.

I tried to make the calls to em.persist() asynchronous by putting an in-memory queue between the servlet and the persister. It helped performance greatly. However, the queue grew really fast, and since the application will receive ~200 requests/second continuously, it is not an acceptable solution for me.

In this decoupled approach I collected requests for 100 ms and called em.persist() on all collected items before committing the transaction. The EntityManagerFactory is cached between transactions.
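The decoupled approach described above could be sketched as follows. This is not the original code: the class name `BatchCollector` and its methods are hypothetical, and the sketch waits up to the window for the first item and then drains whatever else is queued, which is a simplification of "collect for 100 ms". It uses a bounded queue so that the queue cannot grow without limit; that caps memory but does not by itself make persisting faster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchCollector<T> {
    private final BlockingQueue<T> queue;
    private final int maxBatch;

    BatchCollector(int capacity, int maxBatch) {
        if (maxBatch < 1) {
            throw new IllegalArgumentException("maxBatch must be at least 1");
        }
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.maxBatch = maxBatch;
    }

    /** Called by the request thread; returns false when the queue is full. */
    boolean offer(T item) {
        return queue.offer(item);
    }

    /**
     * Called by the persister thread. Waits up to windowMillis for a first
     * item, then takes the items already queued, up to maxBatch in total.
     * Returns an empty list if nothing arrived within the window.
     */
    List<T> nextBatch(long windowMillis) throws InterruptedException {
        List<T> batch = new ArrayList<>();
        T first = queue.poll(windowMillis, TimeUnit.MILLISECONDS);
        if (first != null) {
            batch.add(first);
            queue.drainTo(batch, maxBatch - 1);
        }
        return batch;
    }
}
```

The persister thread would loop on `nextBatch(100)`, call em.persist() for each item in the returned list inside one transaction, and commit once per batch. When `offer` returns false, the servlet has to decide whether to reject the request or block.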

噩梦成真你也成魔 2024-07-11 21:24:49

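How do you measure "requests/sec"? In other words, what happens to the 31st request? Which resource is blocked? If it is the front-end/servlet/web part, can you run em.persist() in another thread and return immediately?

Also, are you creating a transaction each time? Are you creating an EntityManagerFactory object for each request?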

天暗了我发光 2024-07-11 21:24:49

You should decouple from the JPA interface and use the bare TopLink API. You can probably chuck the objects you're persisting into a UnitOfWork and commit the UnitOfWork on your own schedule (sync or async). Note that one of the costs of em.persist() is the implicit cloning of the whole object graph. TopLink will work rather better if you uow.registerObject() your two user objects yourself, which saves it the identity tests it would otherwise have to do. So you'll end up with:

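// sess is a TopLink Session; Job, Thingy and User are placeholder types
UnitOfWork uow = sess.acquireUnitOfWork();
for (Job job : batch) {
    // registerObject() returns the working copy (clone); modify the clone, not the original
    Thingy thingyCl = (Thingy) uow.registerObject(new Thingy());
    // user1 and user2 are the User objects already looked up for this job
    User user1Cl = (User) uow.registerObject(user1);
    User user2Cl = (User) uow.registerObject(user2);
    thingyCl.setUsers(user1Cl, user2Cl);
}
// a single commit writes the whole batch
uow.commit();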

This is very old school TopLink btw ;)

Note that batching will help a lot, because batch writing, and especially batch writing with parameter binding, will kick in. For this simple example that will probably have a very large impact on your performance.
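Batch writing usually has to be switched on explicitly. As a hedged sketch: in EclipseLink, the open-source successor to TopLink, the JPA properties below enable JDBC batch writing with parameter binding. Older TopLink releases may expose equivalent settings under different names, so check the documentation for the version in use. The helper class and the persistence unit name in the comment are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class BatchWritingProps {

    /** Builds EclipseLink persistence properties that turn on batch writing. */
    static Map<String, String> batchWritingProperties(int batchSize) {
        Map<String, String> props = new HashMap<>();
        // Group INSERT statements using the JDBC batch API.
        props.put("eclipselink.jdbc.batch-writing", "JDBC");
        // Upper bound on the number of statements per batch.
        props.put("eclipselink.jdbc.batch-writing.size", Integer.toString(batchSize));
        // Use bound parameters instead of inlining literal values into the SQL.
        props.put("eclipselink.jdbc.bind-parameters", "true");
        return props;
    }

    // Usage (needs JPA on the classpath; "messages-pu" is a hypothetical unit name):
    // EntityManagerFactory emf =
    //     Persistence.createEntityManagerFactory("messages-pu", batchWritingProperties(100));
}
```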

Another thing to look at is your sequencing size. A lot of the time spent writing objects in TopLink is actually spent reading sequencing information from the database, especially with the small defaults (I would probably use several hundred or even more as my sequence size).
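To see why the sequence size matters, here is a small self-contained simulation, not TopLink code. The class `PreallocatingSequence` is hypothetical, and the "database" is just a counter. It counts how many simulated round trips are needed to hand out a given number of IDs for a given preallocation size. In JPA the corresponding setting is `allocationSize` on `@SequenceGenerator`, which generally has to match the increment of the database sequence.

```java
class PreallocatingSequence {
    private final int allocationSize;
    private long dbNextBlockStart = 1; // stands in for the database sequence
    private long next;                 // next ID to hand out
    private long blockEnd;             // first ID beyond the current block
    private int roundTrips;            // simulated database calls so far

    PreallocatingSequence(int allocationSize) {
        if (allocationSize < 1) {
            throw new IllegalArgumentException("allocationSize must be at least 1");
        }
        this.allocationSize = allocationSize;
    }

    /** Returns the next ID, fetching a new block only when the current one is used up. */
    long nextId() {
        if (next == blockEnd) {
            // Simulated round trip: reserve a whole block of IDs at once.
            next = dbNextBlockStart;
            dbNextBlockStart += allocationSize;
            blockEnd = next + allocationSize;
            roundTrips++;
        }
        return next++;
    }

    int roundTrips() {
        return roundTrips;
    }
}
```

With a size of 1, every generated ID costs one simulated round trip. With a size of 500, 1000 IDs cost two.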
