了解InnodB僵局日志

发布于 2025-02-02 14:58:47 字数 5023 浏览 3 评论 0原文

我有一个正在执行多个插入查询的事务。同时,可能发生了一个正在执行由另一个用户运行的数据库执行一致性检查的作业,该数据库正在通过在数据库上发出 select 查询来进行复制检查计算。

问题在于,有时我的应用程序级交易僵局与正在执行这些检查的工具的交易有关:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-05-19 18:25:20 0x7f6eb63e3700
*** (1) TRANSACTION:
TRANSACTION 421588457956552, ACTIVE 0 sec fetching rows
mysql tables in use 2, locked 2
LOCK WAIT 1012 lock struct(s), heap size 123088, 85725 row lock(s)
MySQL thread id 15333390, OS thread handle 140086616594176, query id 2884722462 10.96.7.108 replication_checksum_user Sending data
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'bar', 'bars', '17', 'foo+idx', '688438b1-63b7-4cdd-ba14-5ac2811fce08,688438b1-63b7-4cdd-ba14-5ac2811fce08,6149061644177471', '6f2aecfe-44b8-4b04-8913-5086c5402c02,6f2aecfe-44b8-4b04-8913-5086c5402c02,4356115808993199', COUNT(*) AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(CRC32(CONCAT_WS('#', convert(`field1` using utf8mb4), convert(`field2` using utf8mb4), convert(`field3` using utf8mb4), convert(`field4` using utf8mb4), `field5`, `field6`, `created_time`, `finished_time`, CONCAT(ISNULL(`notification_id`), ISNULL(`attempt_type`), ISNULL(`successful`), ISNULL(`finished_time`)))), 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0')))
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 109 page no 16087 n bits 192 index foo_idx of table `bar`.`bars` trx id 421588457956552 lock mode S waiting
Record lock, heap no 120 PHYSICAL RECORD: n_fields 10; compact format; info bits 0
 0: len 30; hex 36626361393436342d663735642d343134312d623231652d306130623961; asc 6bca9464-f75d-4141-b21e-0a0b9a; (total 36 bytes);
 1: len 16; hex 34373236323438353934373634393231; asc 4726248594764921;;
 2: len 6; hex 000028626bdf; asc   (bk ;;
 3: len 7; hex ba0004c0290d2e; asc     ) .;;
 4: SQL NULL;
 5: SQL NULL;
 6: SQL NULL;
 7: len 8; hex 8000000000000a07; asc         ;;
 8: len 5; hex 99ace72654; asc    &T;;
 9: SQL NULL;

*** (2) TRANSACTION:
TRANSACTION 677538783, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 3
MySQL thread id 15333007, OS thread handle 140113480660736, query id 2884722486 10.96.7.206 app_user Update
INSERT into bars(field1, field2, field3, created_time) VALUES('6bca9464-f75d-4141-b21e-0a0b9a7d06ca', '1656221208545861', 272.97000, CURRENT_TIMESTAMP())
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 109 page no 16087 n bits 192 index foo_idx of table `bar`.`bars` trx id 677538783 lock_mode X locks rec but not gap
Record lock, heap no 120 PHYSICAL RECORD: n_fields 10; compact format; info bits 0
 0: len 30; hex 36626361393436342d663735642d343134312d623231652d306130623961; asc 6bca9464-f75d-4141-b21e-0a0b9a; (total 36 bytes);
 1: len 16; hex 34373236323438353934373634393231; asc 4726248594764921;;
 2: len 6; hex 000028626bdf; asc   (bk ;;
 3: len 7; hex ba0004c0290d2e; asc     ) .;;
 4: SQL NULL;
 5: SQL NULL;
 6: SQL NULL;
 7: len 8; hex 8000000000000a07; asc         ;;
 8: len 5; hex 99ace72654; asc    &T;;
 9: SQL NULL;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 109 page no 16087 n bits 192 index foo_idx of table `bar`.`bars` trx id 677538783 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 120 PHYSICAL RECORD: n_fields 10; compact format; info bits 0
 0: len 30; hex 36626361393436342d663735642d343134312d623231652d306130623961; asc 6bca9464-f75d-4141-b21e-0a0b9a; (total 36 bytes);
 1: len 16; hex 34373236323438353934373634393231; asc 4726248594764921;;
 2: len 6; hex 000028626bdf; asc   (bk ;;
 3: len 7; hex ba0004c0290d2e; asc     ) .;;
 4: SQL NULL;
 5: SQL NULL;
 6: SQL NULL;
 7: len 8; hex 8000000000000a07; asc         ;;
 8: len 5; hex 99ace72654; asc    &T;;
 9: SQL NULL;

*** WE ROLL BACK TRANSACTION (2)

我很难理解该日志。两项交易正在奋斗的共同资源是什么?是插入该特定行的索引记录(ID = 6BCA9464-F75D-4141-B21E-0A0B9A7D06CA)?

据我了解和阅读,第一笔交易是试图获取S锁定(用于阅读),第二笔交易可容纳插入物的X锁(独家锁)。但是,第二笔交易正在等待第一次交易所持的是什么呢?对我来说,第二笔交易似乎正在等待它所保持自己的锁 - 这是错误的吗?

另外,两项交易是否有可能在他们没有持有的锁上等待?

差距和差距部分意味着什么?

解决此类问题的首选方法是什么?

编辑:这是架构

CREATE TABLE `bars` (
    `field1` VARCHAR(50) NOT NULL,
    `field2` VARCHAR(50) NOT NULL,
    `field3` VARCHAR(50) NULL DEFAULT NULL,
    `field4` ENUM('WHOLE','PARTIAL') NOT NULL,
    `field5` TINYINT(4) NULL DEFAULT NULL,
    `field6` DECIMAL(16,2) NOT NULL,
    `created_time` DATETIME NOT NULL,
    `finished_time` DATETIME NULL DEFAULT NULL,
    UNIQUE INDEX `foo_idx` (`field1`, `field2`),
    CONSTRAINT `f1_idx` FOREIGN KEY (`field1`) REFERENCES `bays` (`field1`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;

,然后是两个迁移

ALTER TABLE `bars`
ADD FOREIGN KEY `ff_fk` (`field1`, `field2`)
REFERENCES `fields_references` (`field1`, `field2`);
ALTER TABLE bars
DROP CONSTRAINT ff_fk;

I have a transaction which is performing multiple INSERT queries. At the same time, there might occur a job that is performing consistency check on the database run by another user, which is doing replication checksum calculations by issuing SELECT queries over the database.

The problem is that sometimes my app-level transaction deadlocks with the transaction from the tool which is performing these checks:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-05-19 18:25:20 0x7f6eb63e3700
*** (1) TRANSACTION:
TRANSACTION 421588457956552, ACTIVE 0 sec fetching rows
mysql tables in use 2, locked 2
LOCK WAIT 1012 lock struct(s), heap size 123088, 85725 row lock(s)
MySQL thread id 15333390, OS thread handle 140086616594176, query id 2884722462 10.96.7.108 replication_checksum_user Sending data
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'bar', 'bars', '17', 'foo+idx', '688438b1-63b7-4cdd-ba14-5ac2811fce08,688438b1-63b7-4cdd-ba14-5ac2811fce08,6149061644177471', '6f2aecfe-44b8-4b04-8913-5086c5402c02,6f2aecfe-44b8-4b04-8913-5086c5402c02,4356115808993199', COUNT(*) AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(CRC32(CONCAT_WS('#', convert(`field1` using utf8mb4), convert(`field2` using utf8mb4), convert(`field3` using utf8mb4), convert(`field4` using utf8mb4), `field5`, `field6`, `created_time`, `finished_time`, CONCAT(ISNULL(`notification_id`), ISNULL(`attempt_type`), ISNULL(`successful`), ISNULL(`finished_time`)))), 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0')))
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 109 page no 16087 n bits 192 index foo_idx of table `bar`.`bars` trx id 421588457956552 lock mode S waiting
Record lock, heap no 120 PHYSICAL RECORD: n_fields 10; compact format; info bits 0
 0: len 30; hex 36626361393436342d663735642d343134312d623231652d306130623961; asc 6bca9464-f75d-4141-b21e-0a0b9a; (total 36 bytes);
 1: len 16; hex 34373236323438353934373634393231; asc 4726248594764921;;
 2: len 6; hex 000028626bdf; asc   (bk ;;
 3: len 7; hex ba0004c0290d2e; asc     ) .;;
 4: SQL NULL;
 5: SQL NULL;
 6: SQL NULL;
 7: len 8; hex 8000000000000a07; asc         ;;
 8: len 5; hex 99ace72654; asc    &T;;
 9: SQL NULL;

*** (2) TRANSACTION:
TRANSACTION 677538783, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 3
MySQL thread id 15333007, OS thread handle 140113480660736, query id 2884722486 10.96.7.206 app_user Update
INSERT into bars(field1, field2, field3, created_time) VALUES('6bca9464-f75d-4141-b21e-0a0b9a7d06ca', '1656221208545861', 272.97000, CURRENT_TIMESTAMP())
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 109 page no 16087 n bits 192 index foo_idx of table `bar`.`bars` trx id 677538783 lock_mode X locks rec but not gap
Record lock, heap no 120 PHYSICAL RECORD: n_fields 10; compact format; info bits 0
 0: len 30; hex 36626361393436342d663735642d343134312d623231652d306130623961; asc 6bca9464-f75d-4141-b21e-0a0b9a; (total 36 bytes);
 1: len 16; hex 34373236323438353934373634393231; asc 4726248594764921;;
 2: len 6; hex 000028626bdf; asc   (bk ;;
 3: len 7; hex ba0004c0290d2e; asc     ) .;;
 4: SQL NULL;
 5: SQL NULL;
 6: SQL NULL;
 7: len 8; hex 8000000000000a07; asc         ;;
 8: len 5; hex 99ace72654; asc    &T;;
 9: SQL NULL;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 109 page no 16087 n bits 192 index foo_idx of table `bar`.`bars` trx id 677538783 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 120 PHYSICAL RECORD: n_fields 10; compact format; info bits 0
 0: len 30; hex 36626361393436342d663735642d343134312d623231652d306130623961; asc 6bca9464-f75d-4141-b21e-0a0b9a; (total 36 bytes);
 1: len 16; hex 34373236323438353934373634393231; asc 4726248594764921;;
 2: len 6; hex 000028626bdf; asc   (bk ;;
 3: len 7; hex ba0004c0290d2e; asc     ) .;;
 4: SQL NULL;
 5: SQL NULL;
 6: SQL NULL;
 7: len 8; hex 8000000000000a07; asc         ;;
 8: len 5; hex 99ace72654; asc    &T;;
 9: SQL NULL;

*** WE ROLL BACK TRANSACTION (2)

I'm having difficulties in understanding that log. What is the common resource that the two transactions are fighting over? Is it the index record for that particular row that is being inserted (with id = 6bca9464-f75d-4141-b21e-0a0b9a7d06ca)?

From what I understand and read about, the first transaction is trying to acquire S lock (which is for reading), the second transaction holds an X lock (exclusive lock) for the insert. But what is that the second transaction is waiting for that the first transaction is holding? To me it seems that the second transaction is waiting on a lock that it is holding itself - is this wrong?

Also, is it possible that the two transactions are waiting on a lock that is not held by them?

What does the gap and no gap part means?

What is the preferred way of solving this type of issue?

Edit: This is the schema

CREATE TABLE `bars` (
    `field1` VARCHAR(50) NOT NULL,
    `field2` VARCHAR(50) NOT NULL,
    `field3` VARCHAR(50) NULL DEFAULT NULL,
    `field4` ENUM('WHOLE','PARTIAL') NOT NULL,
    `field5` TINYINT(4) NULL DEFAULT NULL,
    `field6` DECIMAL(16,2) NOT NULL,
    `created_time` DATETIME NOT NULL,
    `finished_time` DATETIME NULL DEFAULT NULL,
    UNIQUE INDEX `foo_idx` (`field1`, `field2`),
    CONSTRAINT `f1_idx` FOREIGN KEY (`field1`) REFERENCES `bays` (`field1`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;

and then two migrations afterwards

ALTER TABLE `bars`
ADD FOREIGN KEY `ff_fk` (`field1`, `field2`)
REFERENCES `fields_references` (`field1`, `field2`);
ALTER TABLE bars
DROP CONSTRAINT ff_fk;

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

树深时见影 2025-02-09 14:58:47

该争论是在PT-table-Checksum和您的应用程序运行的插入物之间进行的替换。

您问锁定选择。默认情况下,普通选择是无锁的,但是如果选择是锁定语句的一部分,则隐含地成为锁定选择。也就是说,由于PT-Table-Checksum运行替换为校验和校验和bars选择,因此它必须在bars中的某些行上获取s锁定。

在将数据写入行或变量的语句中使用的任何选择都是如此。

  • 创建表...选择...
  • 插入... select ...
  • set ... set @variable =(select ...)
  • select ... select ... in @variable
  • 在更新或删除语句中使用
  • select in trigger中的

sepqueries中的选择将应用程序的交易隔离级别更改为读取可以帮助避免差距锁。但是,也许PT-table-Checksum具有自己的交易隔离级别,无论如何都会创建差距锁。

最重要的是发生了僵局。您不能全部消除它们,当并发会议完成锁定时,它们是自然的结果。因此,您应该设计代码以捕获异常并根据需要重试。


回复您的评论:

交易1不一定会持有另一个锁。以下操作顺序可能导致死锁:

  1. 事务2(您的应用程序)在bars中的一行中获取了一些锁。

  2. 事务1(PT-table-Checksum)需要在一批多行上进行锁。交易2至少已经锁定了一个,所以ptable-checksum等待。

  3. 事务2想要在另一行上锁定它以前没有锁,但是该行是PT-table-checksum试图锁定的批次的一部分。

The contention is between the REPLACE run by pt-table-checksum and the INSERT run by your app.

You asked about locking SELECT. An ordinary SELECTs is lock-free by default, but if a SELECT is part of a locking statement, it implicitly becomes a locking SELECT. That is, because pt-table-checksum runs REPLACE INTO checksums ... SELECT FROM bars, it must acquire an S lock on some rows in bars.

The same is true of any SELECT that is used in a statement that writes data to a row or a variable.

  • CREATE TABLE ... SELECT ...
  • INSERT INTO ... SELECT ...
  • SET @variable = (SELECT ...)
  • SELECT ... INTO @variable
  • Using SELECT in subqueries in an UPDATE or DELETE statement
  • Using SELECT in a trigger

Changing your app's transaction isolation level to READ-COMMITTED could help avoid the gap locks. But maybe pt-table-checksum has its own transaction isolation level that will create gap locks anyway.

The bottom line is that deadlocks happen. You can't eliminate them all, they're a natural outcome when locking is done by concurrent sessions. So you should design code to catch exceptions and retry as needed.


Re your comment:

Transaction 1 doesn't necessarily hold another lock. The following sequence of actions could result in a deadlock:

  1. Transaction 2 (your app) acquires some lock on a single row in bars.

  2. Transaction 1 (pt-table-checksum) needs locks on a batch of multiple rows. At least one is already locked by transaction 2, so pt-table-checksum waits.

  3. Transaction 2 wants a lock on another row it didn't have a lock on before, but the row is part of the batch that pt-table-checksum is trying to lock.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文