How to persist a lot of entities (JPA)
I need to process a CSV file and for each record (line) persist an entity. Right now, I do it this way:
while ((line = reader.readNext()) != null) {
    Entity entity = createEntityObject(line);
    entityManager.save(entity);
    i++;
}
where the save(Entity) method is basically just an EntityManager.merge() call. There are about 20,000 entities (lines) in the CSV file. Is this an effective way to do it? It seems to be quite slow. Would it be better to use EntityManager.persist()? Is this solution flawed in any way?
EDIT
It's a lengthy process (over 400s) and I tried both solutions, with persist and merge. Both take approximately the same amount of time to complete (459s vs 443s). The question is if saving the entities one by one like this is optimal. As far as I know, Hibernate (which is my JPA provider) does implement some cache/flush functionality so I shouldn't have to worry about this.
Comments (4)
The JPA API doesn't provide you all the options to make this optimal. Depending on how fast you want to do this you are going to have to look for ORM specific options - Hibernate in your case.
Things to check:
So in Ebean ORM this would be:
Oh, and if you do this via raw JDBC you skip the ORM overhead (less object creation / garbage collection etc) - so I wouldn't ignore that option.
So yes, this doesn't answer your question but might help your search for more ORM specific batch insert tweaks.
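For the raw JDBC route mentioned above, a minimal sketch might look like the following. It is not taken from the answer: the class `JdbcBatchInsert`, the method `insertAll`, the table `entity` with columns `name` and `description`, and the two-fields-per-row layout are all hypothetical placeholders. Only the `java.sql` calls (`prepareStatement`, `addBatch`, `executeBatch`, `commit`, `rollback`) are standard JDBC.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class JdbcBatchInsert {
    private JdbcBatchInsert() {}

    // Hypothetical table and column names; replace with the real schema.
    static final String INSERT_SQL =
            "INSERT INTO entity (name, description) VALUES (?, ?)";

    /**
     * Inserts every row in a single transaction, sending the queued
     * statements to the database once per full batch.
     * Assumes each row holds exactly two fields. Returns the row count.
     */
    static int insertAll(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit for the whole file, not one per row
        int queued = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                queued++;
                if (queued % batchSize == 0) {
                    ps.executeBatch(); // send a full batch
                }
            }
            if (queued % batchSize != 0) {
                ps.executeBatch(); // send the final partial batch
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // discard everything if any batch fails
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return queued;
    }
}
```

Whether a batch actually travels as one round trip depends on the JDBC driver, so it is worth measuring with your own database before committing to this approach.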
I think one common way to do this is with transactions. If you begin a new transaction and then persist a large number of objects, they won't actually be inserted into the DB until you commit the transaction. This can gain you some efficiencies if you have a large number of items to commit.
Check out EntityManager.getTransaction
To make it go faster, at least in Hibernate, you would do a flush() and a clear() after a certain number of inserts. I have done this approach for millions of records and it works. It's still slow, but it's much faster than not doing it. The basic structure is like this:
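The answer's own code is not preserved in this copy. As a rough sketch of the flush-and-clear pattern it describes, something like the following could work. To keep the example self-contained, `PersistenceOps` is a hypothetical stand-in for the three `EntityManager` methods of the same names (`persist`, `flush`, `clear`); `BatchPersister`, `persistAll`, and the batch size are likewise illustrative assumptions.

```java
import java.util.List;

/**
 * Hypothetical stand-in for the EntityManager methods used below.
 * With JPA on the classpath, delegate each method to the real EntityManager.
 */
interface PersistenceOps {
    void persist(Object entity);
    void flush();
    void clear();
}

final class BatchPersister {
    private BatchPersister() {}

    /**
     * Persists every entity, flushing and clearing after each full batch
     * so the persistence context holds at most batchSize entities.
     * Returns the number of flush/clear cycles performed.
     */
    static int persistAll(PersistenceOps em, List<?> entities, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int cycles = 0;
        int pending = 0;
        for (Object entity : entities) {
            em.persist(entity);
            pending++;
            if (pending == batchSize) {
                em.flush(); // push the queued INSERTs to the database
                em.clear(); // detach the entities so the context stays small
                pending = 0;
                cycles++;
            }
        }
        if (pending > 0) { // final partial batch
            em.flush();
            em.clear();
            cycles++;
        }
        return cycles;
    }
}
```

In a real application this would run inside a single transaction (begin before the loop, commit after it). With Hibernate it is usually suggested to set the `hibernate.jdbc.batch_size` property to the same value as the batch size used here, so that each flush maps onto JDBC batches.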
You can write them directly into the database with a classic SQL INSERT statement.
@see EntityManager.createNativeQuery