Find or insert based on unique key with Hibernate
I'm trying to write a method that will return a Hibernate object based on a unique but non-primary key. If the entity already exists in the database I want to return it, but if it doesn't I want to create a new instance and save it before returning.
UPDATE: Let me clarify that the application I'm writing this for is basically a batch processor of input files. The system needs to read a file line by line and insert records into the db. The file format is basically a denormalized view of several tables in our schema, so what I have to do is parse out the parent record and either insert it into the db so I can get a new synthetic key or, if it already exists, select it. Then I can add additional associated records in other tables that have foreign keys back to that record.
The reason this gets tricky is that each file needs to be either totally imported or not imported at all, i.e. all inserts and updates done for a given file should be a part of one transaction. This is easy enough if there's only one process that's doing all the imports, but I'd like to break this up across multiple servers if possible. Because of these constraints I need to be able to stay inside one transaction, but handle the exceptions where a record already exists.
The mapped class for the parent records looks like this:
@Entity
public class Foo {
    @Id
    @GeneratedValue(strategy = IDENTITY)
    private int id;

    @Column(unique = true)
    private String name;
    ...
}
My initial attempt at writing this method is as follows:
public Foo findOrCreate(String name) {
    Foo foo = new Foo();
    foo.setName(name);
    try {
        session.save(foo);
    } catch (ConstraintViolationException e) {
        // Fall back to loading the existing row; this query is where the AssertionFailure is thrown.
        foo = (Foo) session.createCriteria(Foo.class).add(eq("name", name)).uniqueResult();
    }
    return foo;
}
The problem is when the name I'm looking for exists, an org.hibernate.AssertionFailure exception is thrown by the call to uniqueResult(). The full stack trace is below:
org.hibernate.AssertionFailure: null id in com.searchdex.linktracer.domain.LinkingPage entry (don't flush the Session after an exception occurs)
at org.hibernate.event.def.DefaultFlushEntityEventListener.checkId(DefaultFlushEntityEventListener.java:82) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.event.def.DefaultFlushEntityEventListener.getValues(DefaultFlushEntityEventListener.java:190) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.event.def.DefaultFlushEntityEventListener.onFlushEntity(DefaultFlushEntityEventListener.java:147) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.event.def.AbstractFlushingEventListener.flushEntities(AbstractFlushingEventListener.java:219) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:99) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:58) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:1185) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1709) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:347) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.impl.CriteriaImpl.uniqueResult(CriteriaImpl.java:369) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
Does anyone know what is causing this exception to be thrown? Does Hibernate support a better way of accomplishing this?
Let me also preemptively explain why I'm inserting first and then selecting if and when that fails. This needs to work in a distributed environment so I can't synchronize across the check to see if the record already exists and the insert. The easiest way to do this is to let the database handle this synchronization by checking for the constraint violation on every insert.
I had a similar batch processing requirement, with processes running on multiple JVMs. The approach I took for this was as follows. It is very much like jtahlborn's suggestion. However, as vbence pointed out, if you use a NESTED transaction, when you get the constraint violation exception, your session is invalidated. Instead, I use REQUIRES_NEW, which suspends the current transaction and creates a new, independent transaction. If the new transaction rolls back it will not affect the original transaction.
I am using Spring's TransactionTemplate but I'm sure you could easily translate it if you do not want a dependency on Spring.
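A minimal sketch of the shape this takes, written in plain Java so that it does not depend on Spring or Hibernate. `TxRunner`, `FooStore`, `DuplicateKeyException`, and `FindOrCreate` are hypothetical stand-ins, not library types. In a Spring setup the runner role could be played by a `TransactionTemplate` whose propagation behavior is set to `PROPAGATION_REQUIRES_NEW`, and the exception by whatever constraint-violation exception your stack raises.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical: runs the work in its own transaction, independent of any outer one. */
interface TxRunner {
    <T> T inNewTransaction(Supplier<T> work);
}

/** Hypothetical signal that an insert hit the unique constraint on name. */
class DuplicateKeyException extends RuntimeException {
    DuplicateKeyException(String message) {
        super(message);
    }
}

/** Hypothetical data access for the Foo table. */
interface FooStore {
    Optional<Integer> findIdByName(String name);

    int insert(String name); // throws DuplicateKeyException if the name is taken
}

class FindOrCreate {
    private final TxRunner tx;
    private final FooStore store;

    FindOrCreate(TxRunner tx, FooStore store) {
        this.tx = tx;
        this.store = store;
    }

    /** Returns the id of the Foo with this name, creating the row if needed. */
    int findOrCreateId(String name) {
        // Common case: the row already exists.
        Optional<Integer> existing = tx.inNewTransaction(() -> store.findIdByName(name));
        if (existing.isPresent()) {
            return existing.get();
        }
        try {
            // The insert commits or rolls back on its own; the outer transaction is untouched.
            return tx.inNewTransaction(() -> store.insert(name));
        } catch (DuplicateKeyException e) {
            // Another process won the race, so its row should now be visible.
            return tx.inNewTransaction(() -> store.findIdByName(name))
                    .orElseThrow(() -> new IllegalStateException("Row vanished: " + name, e));
        }
    }
}
```

Returning only the id is a deliberate choice in this sketch: the caller can load `Foo` by id in the outer session, so nothing attached to the inner transaction leaks out. One trade-off to keep in mind is that a row inserted in an independent transaction stays committed even if the outer file import later rolls back.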
You need to use UPSERT or MERGE to achieve this goal. However, Hibernate does not offer support for this construct, so you need to use jOOQ instead.
Calling this method on PostgreSQL:
Yields the following SQL statements:
On Oracle and SQL Server, jOOQ will use MERGE, while on MySQL it will use ON DUPLICATE KEY.
The concurrency mechanism is ensured by the row-level locking mechanism employed when inserting, updating, or deleting a record, which you can view in the following diagram:
Code available on GitHub.
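As a rough illustration of the upsert idea only, here is a plain JDBC sketch for PostgreSQL 9.5 or later. It is not jOOQ and not the answer's code. The table `foo` with columns `id` and `name` is an assumption based on the `Foo` entity in the question, and the class and method names are made up for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class FooUpsert {
    // Assumed schema: foo(id generated key, name with a unique constraint).
    static final String INSERT_IF_ABSENT =
            "INSERT INTO foo (name) VALUES (?) ON CONFLICT (name) DO NOTHING";
    static final String SELECT_ID =
            "SELECT id FROM foo WHERE name = ?";

    /** Returns the id of the row with the given name, inserting it first if needed. */
    static int findOrCreateId(Connection conn, String name) throws SQLException {
        try (PreparedStatement insert = conn.prepareStatement(INSERT_IF_ABSENT)) {
            insert.setString(1, name);
            // Zero affected rows means the name already existed; no exception is raised.
            insert.executeUpdate();
        }
        try (PreparedStatement select = conn.prepareStatement(SELECT_ID)) {
            select.setString(1, name);
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("No row found after upsert for name: " + name);
                }
                return rs.getInt("id");
            }
        }
    }
}
```

Because the conflicting insert does not raise a constraint violation, the surrounding transaction stays usable. The follow-up select assumes the default READ COMMITTED isolation level, where each statement sees rows committed before it started. From Hibernate, a method like this could be run on the session's own connection through `Session.doWork`.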
Two solutions come to mind:
That's what TABLE LOCKS are for
Hibernate does not support table locks, but this is a situation where they come in handy. Fortunately you can use native SQL through Session.createSQLQuery(). For example (on MySQL):
This way, when a session (client connection) gets the lock, all the other connections are blocked until the operation ends and the locks are released. Read operations are also blocked for other connections, so needless to say, use this only for atomic operations.
What about Hibernate's locks?
Hibernate uses row-level locking. We cannot use it directly, because we cannot lock non-existent rows. But we can create a dummy table with a single record, map it to the ORM, then use SELECT ... FOR UPDATE style locks on that object to synchronize our clients. Basically we only need to be sure that no other clients (running the same software, with the same conventions) will do any conflicting operations while we are working.
Your database has to know the SELECT ... FOR UPDATE syntax (Hibernate is going to use it), and of course this only works if all your clients follow the same convention (they need to lock the same dummy entity).
The Hibernate documentation on transactions and exceptions states that all HibernateExceptions are unrecoverable and that the current transaction must be rolled back as soon as one is encountered. This explains why the code above does not work. Ultimately you should never catch a HibernateException without exiting the transaction and closing the session.
The only real way to accomplish this, it would seem, is to close the old session and open a new one within the method itself. Based on what I have found, implementing a findOrCreate method that can participate in an existing transaction and is safe in a distributed environment seems to be impossible with Hibernate.
The solution is in fact really simple. First perform a select using your name value. If a result is found, return it. If not, create a new one. If the creation fails (with an exception), it is because another client added this very same value between your select and your insert statement, so the exception is expected. Catch it, roll back your transaction, and run the same code again. Because the row now exists, the select statement will find it and you'll return your object.
You can find an explanation of Hibernate's optimistic and pessimistic locking strategies here: http://docs.jboss.org/hibernate/core/3.3/reference/en/html/transactions.html
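The retry loop described in this answer could be sketched as follows. Everything here is illustrative: `UniqueViolation` and `RetryingFinder` are hypothetical names, and the attempt passed in is assumed to run one complete transaction (begin, select, insert if absent, commit) and to roll back before throwing.

```java
import java.util.function.Supplier;

/** Hypothetical marker for "the insert hit the unique constraint". */
class UniqueViolation extends RuntimeException {
    UniqueViolation(String message) {
        super(message);
    }
}

class RetryingFinder {
    /**
     * Runs one complete attempt and repeats it from scratch when the attempt
     * lost the race on the unique key. Gives up after maxAttempts tries.
     */
    static <T> T withRetry(Supplier<T> attempt, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        UniqueViolation last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (UniqueViolation e) {
                // Another client inserted the row first; the next select should find it.
                last = e;
            }
        }
        throw new IllegalStateException("Gave up after " + maxAttempts + " attempts", last);
    }
}
```

In the question's batch setting the attempt would have to be the whole file import, since each file must go in as a single transaction, so a lost race means redoing that file from the start.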
A couple of people have mentioned different parts of the overall strategy. Assuming that you generally expect to find an existing object more often than you create a new one:
Just to clarify, as pointed out in another answer, the "nested" transaction is actually a separate transaction (many databases don't even support true nested transactions).
Well, here's one way to do it, but it's not appropriate for all situations.
Remove the "unique = true" attribute on name. Add a timestamp that gets updated on every insert.
In findOrCreate(), don't bother checking if the entity with the given name already exists; just insert a new one every time.
When looking up Foo instances by name, there may be 0 or more with a given name, so you just select the newest one.
The nice thing about this method is that it doesn't require any locking, so everything should run pretty fast. The downside is that your database will be littered with obsolete records, so you may have to do something somewhere else to deal with them. Also, if other tables refer to Foo by its id, then this will screw up those relations.
Maybe you should change your strategy:
First find the user with the name, and only if the user does not exist, create it.
I would try the following strategy:
A. Start a main transaction (at time 1)
B. Start a sub-transaction (at time 2)
Now, any object created after time 1 will not be visible in the main transaction. So when you do
C. Create new race-condition object, commit sub-transaction
D. Handle conflict by starting a new sub-transaction (at time 3) and getting the object from a query (the sub-transaction from point B is now out-of-scope).
only return the object primary key and then use EntityManager.getReference(..) to obtain the object you will be using in the main transaction. Alternatively, start the main transaction after D; it is not totally clear to me in how many race conditions you will have within your main transaction, but the above should allow for n times B-C-D in a 'large' transaction.
Note that you might want to do multi-threading (one thread per CPU), and then you can probably reduce this issue considerably by using a shared static cache for this kind of conflict. Point 2 can then be kept 'optimistic', i.e. not doing a .find(..) first.
Edit: For a new transaction, you need an EJB interface method call annotated with transaction type REQUIRES_NEW.
Edit: Double check that the getReference(..) works as I think it does.