select ... for update 引起的mysql死锁

发布于 2022-09-11 19:00:18 字数 7553 浏览 18 评论 0

描述

表:

idabc
1a001bbb0
2b002bbb0

id 列主键, a,b 列非主键索引. 业务目标是要保证能安全的并发修改 c 列.

主要代码:

@Transactional
public void foo(){
    // 加锁查询
    SELECT * FROM 表 WHERE a=? AND b=? FOR UPDATE // <1>
    // 更新数据
    UPDATE 表 SET c=1 WHERE a=? AND b=?;    // <2>
}

环境:
阿里云的rds, mysql 5.6.16.
隔离级别: READ-COMMITTED

这条线, 之前一直没有爆出问题. 但在昨天, 我离线测试数据时, 一次批量请求几万服务, 同时, 有一个其他用户也在请求我的服务. 这样就爆出了Deadlock问题.

错误信息:

clipboard.png

我的自查

先假设 甲 和 乙 并发.
甲的目标时修改a="a001" and b="bbb"行数据, 乙的目标时修改a="a002" and b="bbb"的数据.
当初, 按照我天真的设想, 甲 操作时, 只会对 1 行加锁, 乙操作时只会对 2 行加锁.
然后两者互不干扰. 而甲或乙自己多线程操作的话, 线程1加锁之后, 会阻塞其他线程, 直到线程1 update 完成数据提交之后, 其他线程才能继续读写. 所以, 综合看, 预期foo()并发时安全的, 也不会发生死锁.

出问题后, 我explain了下, 发现<1>和<2>的SQL, 走的是 b 列索引, 并不是 a 列索引(可能是真实数据中 b 列更短的原因). 这样的话, 甲或乙在加锁查时, 会对1,2两行数据都加锁(因为 b 列值一样). 这样甲操作时会阻塞乙. 但是,, 这样依然不会产生死锁呀?!

本地测试时, 这个bug并没有重现.
本地测试环境: mysql Ver 14.14 Distrib 5.7.22

====
更新
====
本地测试环境的mysql, 将隔离级别, 调成read commited(和线上一致了), bug重现了.

看来我的代码逻辑发生死锁和隔离级别关系很大, 但不知道是什么原理.

show engine innodb status(和上面表无关, 这是实际测试时的表, 上面表例子时简化模型):

| InnoDB |      |
=====================================
2019-04-03 11:52:39 2a94 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1049 srv_active, 0 srv_shutdown, 64204 srv_idle
srv_master_thread log flush and writes: 65253
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 149
OS WAIT ARRAY INFO: signal count 148
Mutex spin waits 5772, rounds 13466, OS waits 86
RW-shared spins 57, rounds 1669, OS waits 54
RW-excl spins 0, rounds 240, OS waits 8
Spin rounds per wait: 2.33 mutex, 29.28 RW-shared, 240.00 RW-excl
------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-04-03 11:40:44 1fe8
*** (1) TRANSACTION:
TRANSACTION 975024, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1184, 3 row lock(s)
MySQL thread id 135, OS thread handle 0x2f0c, query id 224407 10.209.5.7 root Sending data
SELECT * FROM anti_fraud_service.afs_appkey_api WHERE app_key='AK527165764218671104' AND api_id='xxx' FOR UPDATE  // 甲的SQL
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 282 page no 5 n bits 88 index `ix_apiid` of table `anti_fraud_service`.`afs_appkey_api` trx id 975024 lock_mode X locks rec but not gap waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 3; hex 787878; asc xxx;;
 1: len 16; hex 36333963393938333064393533616435; asc 639c99830d953ad5;;  // 乙 目标操作行的id

*** (2) TRANSACTION:
TRANSACTION 975006, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1184, 3 row lock(s)
MySQL thread id 141, OS thread handle 0x1fe8, query id 224416 10.209.5.7 root updating
UPDATE anti_fraud_service.afs_appkey_api SET used_num=2981 WHERE app_key='AK531894660444999680' AND api_id='xxx'  // 乙的SQL
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 282 page no 5 n bits 88 index `ix_apiid` of table `anti_fraud_service`.`afs_appkey_api` trx id 975006 lock_mode X locks rec but not gap
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 3; hex 787878; asc xxx;;
 1: len 16; hex 36333963393938333064393533616435; asc 639c99830d953ad5;;  // 甲 目标操作行的id

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 282 page no 5 n bits 88 index `ix_apiid` of table `anti_fraud_service`.`afs_appkey_api` trx id 975006 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 3; hex 787878; asc xxx;;
 1: len 16; hex 35626531633535636236336534356365; asc 5be1c55cb63e45ce;;

*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 975035
Purge done for trx's n:o < 975032 undo n:o < 0 state: running but idle
History list length 3253
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 965882, not started
MySQL thread id 121, OS thread handle 0x219c, query id 188600 10.209.5.7 root
---TRANSACTION 965881, not started
MySQL thread id 120, OS thread handle 0x2a94, query id 224438 10.209.5.7 root init
show engine innodb status
---TRANSACTION 975034, not started
MySQL thread id 108, OS thread handle 0x1620, query id 224437 10.209.5.7 root
--------
FILE I/O
--------
I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: wait Windows aio (read thread)
I/O thread 3 state: wait Windows aio (read thread)
I/O thread 4 state: wait Windows aio (read thread)
I/O thread 5 state: wait Windows aio (read thread)
I/O thread 6 state: wait Windows aio (write thread)
I/O thread 7 state: wait Windows aio (write thread)
I/O thread 8 state: wait Windows aio (write thread)
I/O thread 9 state: wait Windows aio (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
486 OS file reads, 31622 OS file writes, 29092 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 1970, seg size 1972, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 276707, node heap has 2 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 2433246968
Log flushed up to   2433246968
Pages flushed up to 2433246968
Last checkpoint at  2433246968
0 pending log writes, 0 pending chkp writes
28933 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 137428992; in additional pool allocated 0
Dictionary memory allocated 158568
Buffer pool size   8192
Free buffers       7615
Database pages     575
Old database pages 212
Modified db pages  0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 458, created 117, written 2611
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 575, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Main thread id 6316, state: sleeping
Number of rows inserted 1, updated 28523, deleted 0, read 172589
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

另外, 发现一个奇怪的现象: navicat里开启两个命令行窗口, for update 会阻塞另一个有交叉数据行的for update. 但是在java代码里测试, 没有成功阻塞. 这就导致后面的update语句会并行, 而不是预期的串行.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

陈年往事 2022-09-18 19:00:18

虽然不是很清楚你的问题的原因, 但有一个建议: 建议先按条件查到主键 id, 再用主键 id SELECT...FOR UPDATE, UPDATE ...; 虽然前面多一次查询,但因为只查 id 走索引无需回表, 后两次都是走主键, 出现问题也更好查一些.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文