我有一个系统,它有一个用于与外部系统连接的复杂主键,以及一个供内部使用的快速、小型不透明主键。例如:外部键可能是一个复合值 - 例如(名字 (varchar)、姓氏 (varchar)、邮政编码 (char)),而内部键可能是一个整数(“客户 ID”)。
当我收到带有外部密钥的传入请求时,我需要查找内部密钥 - 这是棘手的部分 - 如果我还没有给定外部密钥,则分配一个新内部密钥ID。
显然,如果我一次只有一个客户端与数据库通信,那就没问题。 从客户中选择 customer_id WHERE给定_name = 'foo' AND ...
,然后插入客户值(...)
(如果我找不到值)。但是,如果可能有许多请求同时从外部系统传入,并且许多请求可能同时到达以前闻所未闻的客户,那么就会出现竞争情况,多个客户端可能会尝试INSERT
新行。
如果我要修改现有行,那就很容易了;只需先SELECT FOR UPDATE
,在执行UPDATE
之前获取适当的行级锁。但在这种情况下,我没有可以锁定的行,因为该行还不存在!
到目前为止,我已经提出了几个解决方案,但每个解决方案都有一些非常重要的问题:
- 捕获
INSERT
上的错误,从顶部重新尝试整个事务。如果交易涉及十几个客户,特别是如果传入数据每次可能以不同的顺序谈论相同的客户,那么这就是一个问题。有可能陷入相互递归的死锁循环,其中冲突每次都发生在不同的客户身上。您可以通过重试尝试之间的指数等待时间来缓解这种情况,但这是一种处理冲突的缓慢且昂贵的方法。此外,这使应用程序代码变得相当复杂,因为所有内容都需要可重新启动。
- 使用保存点。在
SELECT
之前启动一个保存点,捕获 INSERT
上的错误,然后回滚到该保存点并再次 SELECT
。保存点并不完全可移植,并且它们的语义和功能在数据库之间略有不同;我注意到的最大区别是,有时它们似乎筑巢,有时则不然,所以如果我能避开它们就好了。但这只是一个模糊的印象——难道不准确吗?保存点是否标准化,或者至少实际上一致?此外,保存点使得在同一个事务上并行执行操作变得困难,因为您可能无法确切地知道将回滚多少工作,尽管我意识到我可能只需要忍受这一点。
- 获取一些全局锁,例如使用 LOCK 语句的表级锁 (oracle mysql postgres)。这显然会减慢这些操作的速度并导致大量的锁争用,所以我宁愿避免它。
- 获取更细粒度但特定于数据库的锁。我只熟悉 Postgres 的做法,这在其他数据库中绝对不支持(这些函数甚至以“
pg_
”开头),所以这又是一个可移植性问题。另外,postgres 的做法需要我以某种方式将密钥转换为一对整数,这可能不太适合。有没有更好的方法来获取假设对象的锁?
在我看来,这肯定是数据库的一个常见并发问题,但我还没有找到很多关于它的资源;可能只是因为我不知道规范的措辞。是否可以在任何标记的数据库中使用一些简单的额外语法来做到这一点?
I have a system which has a complex primary key for interfacing with external systems, and a fast, small opaque primary key for internal use. For example: the external key might be a compound value - something like (given name (varchar), family name (varchar), zip code (char)) and the internal key would be an integer ("customer ID").
When I receive an incoming request with the external key, I need to look up the internal key - and here's the tricky part - allocate a new internal key if I don't already have one for the given external ID.
Obviously if I have only one client talking to the database at a time, this is fine. SELECT customer_id FROM customers WHERE given_name = 'foo' AND ...
, then INSERT INTO customers VALUES (...)
if I don't find a value. But, if there are potentially many requests coming in from external systems concurrently, and many may arrive for a previously unheard-of customer all at once, there is a race condition where multiple clients may try to INSERT
the new row.
If I were modifying an existing row, that would be easy; simply SELECT FOR UPDATE
first, to acquire the appropriate row-level lock, before doing an UPDATE
. But in this case, I don't have a row that I can lock, because the row doesn't exist yet!
I've come up with several solutions so far, but each of them has some pretty significant issues:
- Catch the error on
INSERT
, re-try the entire transaction from the top. This is a problem if the transaction involves a dozen customers, especially if the incoming data is potentially talking about the same customers in a different order each time. It's possible to get stuck in mutually recursive deadlock loops, where the conflict occurs on a different customer each time. You can mitigate this with an exponential wait time between re-try attempts, but this is a slow and expensive way to deal with conflicts. Also, this complicates the application code quite a bit as everything needs to be restartable.
- Use savepoints. Start a savepoint before the
SELECT
, catch the error on INSERT
, and then roll back to the savepoint and SELECT
again. Savepoints aren't completely portable, and their semantics and capabilities differ slightly and subtly between databases; the biggest difference I've noticed is that, sometimes they seem to nest and sometimes they don't, so it would be nice if I could avoid them. This is only a vague impression though - is it inaccurate? Are savepoints standardized, or at least practically consistent? Also, savepoints make it difficult to do things in parallel on the same transaction, because you might not be able to tell exactly how much work you'll be rolling back, although I realize I might just need to live with that.
- Acquire some global lock, like a table-level lock using a LOCK statement (oracle mysql postgres). This obviously slows down these operations and results in a lot of lock contention, so I'd prefer to avoid it.
- Acquire a more fine-grained, but database-specific lock. I'm only familiar with Postgres's way of doing this, which is very definitely not supported in other databases (the functions even start with "
pg_
") so again it's a portability issue. Also, postgres's way of doing this would require me to convert the key into a pair of integers somehow, which it may not neatly fit into. Is there a nicer way to acquire locks for hypothetical objects?
It seems to me that this has got to be a common concurrency problem with databases but I haven't managed to find a lot of resources on it; possibly just because I don't know the canonical phrasing. Is it possible to do this with some simple extra bit of syntax, in any of the tagged databases?
发布评论
评论(3)
我不清楚为什么不能使用 INSERT IGNORE,它将运行而不会出错,并且您可以检查是否发生插入(修改的记录)。如果插入“失败”,那么您就知道该键已经存在并且可以执行 SELECT。您可以先执行 INSERT,然后执行 SELECT。
或者,如果您使用 MySQL,请使用支持事务的 InnoDB。这将使回滚变得更容易。
I'm not clear on why you can't use INSERT IGNORE, which will run without error and you can check if an insert occurred (modified records). If the insert "fails", then you know the key already exists and you can do a SELECT. You could do the INSERT first, then the SELECT.
Alternatively, if you are using MySQL, use InnoDB which supports transactions. That would make it easier to rollback.
在主要的多客户事务之前和之外,以自动提交模式执行每个客户的“查找或创建”操作。
Perform each customer's "lookup or maybe create" operations in autocommit mode, prior to and outside of the main, multi-customer transaction.
WRT 生成不透明主键,有许多选项,例如,使用 guid 或(至少对于 Oracle)序列表。 WRT 确保外部键是唯一的,对列应用唯一约束。如果由于键存在而导致插入失败,请重新尝试提取。您可以使用 where not exit 或 where not in 的插入。使用存储过程可以减少往返次数并提高性能。
WRT generating an opaque primary key, there are a number of options, eg., use a guid or (at least, with Oracle) a sequence table. WRT insuring the external key is unique, apply unique constraint on the column. If the insert fails because the key exists, reattempt the fetch. You can use an insert with where not exist or where not in. Use a stored procedure to reduce the round trips and improve performance.