Preventing race conditions with if-exists-update-else-insert in Entity Framework
I've been reading other questions on how to implement if-exists-update-else-insert semantics in EF, but either I don't understand how the answers work, or they don't actually address the issue. A common solution offered is to wrap the work in a transaction scope (e.g. Implementing if-not-exists-insert using Entity Framework without race conditions):
    using (var scope = new TransactionScope()) // default isolation level is serializable
    using (var context = new MyEntities())
    {
        var user = context.Users.SingleOrDefault(u => u.Id == userId); // *
        if (user != null)
        {
            // update the user
            user.property = newProperty;
            context.SaveChanges();
        }
        else
        {
            user = new User
            {
                // etc
            };
            context.Users.AddObject(user);
            context.SaveChanges();
        }

        scope.Complete(); // commit; without this call the scope rolls back on dispose
    }
But I fail to see how this solves anything. For it to work, the line I have starred above would have to block when a second thread tries to access the same user ID, and unblock only once the first thread has finished its work. Using a transaction does not cause this, however, so an UpdateException is thrown because of the key violation that occurs when the second thread attempts to create the same user a second time.
Instead of catching the exception caused by the race condition, it would be better to prevent the race condition from happening in the first place. One way to do this would be for the starred line to take out an exclusive lock on the database row that matches its condition, meaning that in the context of this block, only one thread at a time could work with a user.
This must be a common problem for EF users, so I'm looking for a clean, generic solution that I can use everywhere.
I'd really like to avoid using a stored procedure to create my user if possible.
Any ideas?
EDIT: I tried executing the above code concurrently on two different threads using the same user ID, and despite taking out serializable transactions, both threads were able to enter the critical section (*) at the same time. This led to an UpdateException being thrown when the second thread attempted to insert the same user ID that the first had just inserted. This is because, as Ladislav points out below, a serializable transaction takes exclusive locks only once it begins modifying data, not when it reads.
Comments (4)
When you use a serializable transaction, SQL Server takes shared locks on the records or tables it reads. Shared locks prevent other transactions from modifying the locked data (those transactions block), but they still allow other transactions to read the data until the transaction that holds the locks starts modifying it. That is why the example doesn't work: concurrent reads are allowed under shared locks until the first transaction starts modifying data.
You want isolation where the select command locks the whole table exclusively for a single client. It must lock the whole table, because otherwise it will not solve concurrency for inserting "the same" record. Granular control over locking records or tables from select commands is possible with hints, but you must write direct SQL queries to use them; EF has no support for that. I described an approach for exclusively locking the table here, but it amounts to sequential access to the table and affects all other clients accessing it.
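As a rough illustration of the hint-based approach, the sketch below issues a direct SQL statement with table hints before running the usual EF query. It assumes the question's MyEntities context and User entity, an int key, a string property, and a table named Users; all of these are assumptions, and the exact statement may differ from the one in the linked answer.

```csharp
using System.Linq;
using System.Transactions;

public static class TableLockUpsert
{
    // MyEntities and User are the generated context and entity from the question.
    // The key type (int), property type (string) and table name "Users" are assumed.
    public static void Upsert(int userId, string newProperty)
    {
        var options = new TransactionOptions { IsolationLevel = IsolationLevel.ReadCommitted };
        using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
        using (var context = new MyEntities())
        {
            // Keep one connection open so every command runs in the same transaction.
            context.Connection.Open();

            // TABLOCKX takes an exclusive table lock; HOLDLOCK keeps it until the
            // transaction ends, so a second caller blocks on this statement.
            context.ExecuteStoreCommand("SELECT TOP 1 Id FROM Users WITH (TABLOCKX, HOLDLOCK)");

            var user = context.Users.SingleOrDefault(u => u.Id == userId);
            if (user == null)
            {
                user = new User { Id = userId };
                context.Users.AddObject(user);
            }

            user.property = newProperty;
            context.SaveChanges();
            scope.Complete();
        }
    }
}
```

Every caller is serialized on the table lock, which is the throughput cost described above.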
If you are really sure that this operation happens only in your single method and no other applications use your database, you can simply place the code in a critical section (.NET synchronization, for example with lock) and ensure on the .NET side that only a single thread can enter it. That is not a very reliable solution, but any playing with locks and transaction levels has a big impact on database performance and throughput. You can combine this approach with optimistic concurrency (unique constraints, timestamps, etc.).
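A minimal sketch of the in-process critical section follows. It is a variation rather than a plain single lock: it keeps one lock object per user ID so unrelated users do not wait on each other. PerKeyLock and its members are hypothetical names, and the approach only protects callers inside one process.

```csharp
using System;
using System.Collections.Concurrent;

public static class PerKeyLock
{
    // One gate object per key. Entries are never removed in this sketch,
    // so the dictionary grows with the number of distinct keys seen.
    private static readonly ConcurrentDictionary<int, object> Gates =
        new ConcurrentDictionary<int, object>();

    public static void Run(int key, Action criticalSection)
    {
        object gate = Gates.GetOrAdd(key, _ => new object());
        lock (gate)
        {
            // Only one thread per key executes this at a time.
            criticalSection();
        }
    }
}
```

The read-then-insert-or-update block from the question would be passed in as the delegate, for example `PerKeyLock.Run(userId, () => { /* query, then insert or update */ });`.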
Just to add my approach. It does not really deal with the annoyance of exceptions being thrown, or with transactions not quite cutting it as a scalable solution, but it does keep race conditions from causing problems where lock-based solutions are not possible or not easily managed, such as in distributed systems.
I simply try the insert first and rely on the exception. I use a modification of your original code as an example:
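The snippet itself is not reproduced here. A minimal sketch of the insert-first idea might look like the following, assuming the question's MyEntities context and User entity, an int key, a string property, and SQL Server as the store (where error numbers 2627 and 2601 report a duplicate key). The class and method names are hypothetical.

```csharp
using System.Data;            // UpdateException in EF 4 (System.Data.Entity.Core in EF 6)
using System.Data.SqlClient;
using System.Linq;

public static class InsertFirstUpsert
{
    public static void Upsert(int userId, string newProperty)
    {
        try
        {
            using (var context = new MyEntities())
            {
                // Optimistically assume the user does not exist yet.
                context.Users.AddObject(new User { Id = userId, property = newProperty });
                context.SaveChanges();
            }
        }
        catch (UpdateException ex)
        {
            // Only a duplicate-key failure means "someone else inserted it first".
            var sql = ex.InnerException as SqlException;
            if (sql == null || (sql.Number != 2627 && sql.Number != 2601))
            {
                throw;
            }

            using (var context = new MyEntities())
            {
                // The row exists now, so fall back to an update on a fresh context.
                var user = context.Users.Single(u => u.Id == userId);
                user.property = newProperty;
                context.SaveChanges();
            }
        }
    }
}
```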
It's not pretty, but it works and might be of use to someone.
Maybe I am missing something, but when I simulate the example above in SQL Server Management Studio, it works as expected.
Both Serializable transactions check if the userId exists, and acquire range locks on the specified selection.
Assuming this userId does not exist, both transactions try to insert a new record with that userId, which is not possible. Because of the Serializable isolation level, neither transaction can insert a new record into the users table, since that would introduce a phantom read for the other transaction.
So this situation results in a deadlock because of the range locks. One transaction is chosen as the victim and the other succeeds.
Does Entity Framework handle this any differently? I suspect you'd end up with an UpdateException with a nested SqlException identifying the deadlock.
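If the deadlock does surface that way, one option is to detect it and rerun the whole unit of work. The sketch below assumes SQL Server (which reports a deadlock victim as error 1205) and assumes the exception arrives as an UpdateException wrapping a SqlException, as suspected above; DeadlockRetry is a hypothetical helper.

```csharp
using System;
using System.Data;            // UpdateException in EF 4 (System.Data.Entity.Core in EF 6)
using System.Data.SqlClient;

public static class DeadlockRetry
{
    private const int SqlDeadlockVictim = 1205;

    // unitOfWork must open its own TransactionScope and context on every call,
    // because the victim's transaction has already been rolled back.
    public static void Run(Action unitOfWork, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                unitOfWork();
                return;
            }
            catch (UpdateException ex)
            {
                var sql = ex.InnerException as SqlException;
                bool isDeadlock = sql != null && sql.Number == SqlDeadlockVictim;
                if (!isDeadlock || attempt >= maxAttempts)
                {
                    throw;
                }
                // Otherwise loop and try again; the retry should find the row
                // the winning transaction inserted and take the update path.
            }
        }
    }
}
```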
You can use TransactionOptions with TransactionScope to change the transaction isolation level to a stricter one (I guess for your case that is RepeatableRead or Serializable), but remember that any locks decrease scalability.
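For reference, setting the level explicitly could be sketched like this. Scopes.Create is a hypothetical helper; TransactionOptions, TransactionScope and IsolationLevel come from System.Transactions.

```csharp
using System;
using System.Transactions;

public static class Scopes
{
    // Creates a TransactionScope that runs at the requested isolation level.
    public static TransactionScope Create(IsolationLevel level)
    {
        var options = new TransactionOptions
        {
            IsolationLevel = level,
            Timeout = TransactionManager.DefaultTimeout
        };
        return new TransactionScope(TransactionScopeOption.Required, options);
    }
}
```

It would be used as `using (var scope = Scopes.Create(IsolationLevel.RepeatableRead)) { /* work */ scope.Complete(); }`.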
Does it really matter to provide this level of concurrency control? Will your application actually hit these cases in production? Here is a good post by Udi Dahan about race conditions.