如何调试 MySQL 上的锁等待超时?
在我的生产错误日志中,我偶尔会看到:
SQLSTATE[HY000]:一般错误:1205 超过锁等待超时;尝试 重新启动交易
我知道哪个查询当时正在尝试访问数据库,但是有没有办法找出哪个查询在那个精确时刻拥有锁定?
In my production error logs I occasionally see:
SQLSTATE[HY000]: General error: 1205
Lock wait timeout exceeded; try
restarting transaction
I know which query is trying to access the database at that moment but is there a way to find out which query had the lock at that precise moment?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(12)
暴露这一点的是交易这个词。从该语句可以明显看出,该查询试图更改一个或多个 InnoDB 表中的至少一行。
由于您知道该查询,因此所有正在访问的表都是罪魁祸首的候选表。
从那里,您应该能够运行 SHOW ENGINE INNODB STATUS\G
您应该能够看到受影响的表
您可以获得各种附加锁定和互斥信息。
这是我的一位客户的示例:
您应该考虑通过设置 innodb_lock_wait_timeout,默认为 50 秒
您可以将其设置为更高的值
/etc/my.cnf
永久保留此行并重新启动 mysql。如果此时无法重新启动 mysql,请运行以下命令:
您也可以将其设置为会话持续时间,
然后执行查询
What gives this away is the word transaction. It is evident by the statement that the query was attempting to change at least one row in one or more InnoDB tables.
Since you know the query, all the tables being accessed are candidates for being the culprit.
From there, you should be able to run
SHOW ENGINE INNODB STATUS\G
You should be able to see the affected table(s)
You get all kinds of additional Locking and Mutex Information.
Here is a sample from one of my clients:
You should consider increasing the lock wait timeout value for InnoDB by setting the innodb_lock_wait_timeout, default is 50 sec
You can set it to higher value in
/etc/my.cnf
permanently with this lineand restart mysql. If you cannot restart mysql at this time, run this:
You could also just set it for the duration of your session
followed by your query
正如有人在有关此问题的许多 SO 线程之一中提到的:有时锁定表的进程在进程列表中显示为休眠!我一直在抓狂,直到我杀死了相关数据库中打开的所有睡眠线程(当时没有一个是活动的)。最终解锁了表并让更新查询运行。
评论者说了类似于“有时 MySQL 线程会锁定一个表,然后在等待与 MySQL 无关的事情发生时休眠”。
重新查看
show engine innodb status
日志(一旦我追踪到负责锁定的客户端),我注意到有问题的卡住线程列在事务的最底部列表,位于由于冻结锁而即将出错的活动查询下方:(不确定“Trx read view”消息是否与冻结锁相关,但与其他活动事务不同,此事务不会随发出的查询并声称该交易是“清理”,但具有多个行锁)
这个故事的寓意是,即使线程正在休眠,事务也可以处于活动状态。
As someone mentioned in one of the many SO threads concerning this problem: Sometimes the process that has locked the table shows up as sleeping in the processlist! I was tearing my hair out until I killed all the sleeping threads that were open in the database in question (none were active at the time). That finally unlocked the table and let the update query run.
The commenter said something akin to "Sometimes a MySQL thread locks a table, then sleeps while it waits for something non-MySQL-related to happen."
After re-re-reviewing the
show engine innodb status
log (once I'd tracked down the client responsible for the lock), I noticed the stuck thread in question was listed at the very bottom of the transaction list, beneath the active queries that were about to error out because of the frozen lock:(unsure if the "Trx read view" message is related to the frozen lock, but unlike the other active transactions, this one does not show up with the query that was issued and instead claims the transaction is "cleaning up," yet has multiple row locks)
The moral of the story is that a transaction can be active even though the thread is sleeping.
争用越多,发生死锁的可能性就越大,数据库引擎将通过使死锁事务之一超时来解决这一问题。
此外,修改(例如
UPDATE
或DELETE
)大量条目的长时间运行的事务更有可能与其他事务产生冲突。尽管是 InnoDB MVCC,您仍然可以使用
FOR UPDATE
子句请求显式锁定。然而,与其他流行的数据库(Oracle、MSSQL、PostgreSQL、DB2)不同,MySQL 使用 REPEATABLE_READ 作为默认隔离级别。现在,您获取的锁(通过修改行或使用显式锁定)将在当前运行的事务期间保留。如果您想很好地解释
REPEATABLE_READ
和READ COMMITTED
在锁定方面的区别,请阅读此 Percona文章。因此:隔离级别(
REPEATABLE_READ
、SERIALIZABLE
)限制越多,发生死锁的可能性就越大。这本身不是一个问题,而是一个权衡。使用 READ_COMMITTED 可以获得非常好的结果,因为在使用跨越多个 HTTP 请求的逻辑事务时,您需要应用程序级丢失更新预防。乐观锁定方法的目标是即使您使用
SERIALIZABLE
隔离级别也可能发生丢失的更新,同时通过允许您使用READ_COMMITTED
来减少锁争用。The more contention you have, the greater the chance of deadlocks, which a DB engine will resolve by time-outing one of the deadlocked transactions.
Also, long-running transactions that have modified (e.g.
UPDATE
orDELETE
) a large number of entries are more likely to generate conflicts with other transactions.Although InnoDB MVCC, you can still request explicit locks using the
FOR UPDATE
clause. However, unlike other popular DBs (Oracle, MSSQL, PostgreSQL, DB2), MySQL usesREPEATABLE_READ
as the default isolation level.Now, the locks that you acquired (either by modifying rows or using explicit locking) are held for the duration of the currently running transaction. If you want a good explanation of the difference between
REPEATABLE_READ
andREAD COMMITTED
in regards to locking, please read this Percona article.Therefore: The more restrictive the isolation level (
REPEATABLE_READ
,SERIALIZABLE
) the greater the chance of deadlock. This is not an issue "per se", it's a trade-off.You can get very good results with
READ_COMMITTED
, as you need application-level lost update prevention when using logical transactions that span over multiple HTTP requests. The optimistic locking approach targets lost updates that might happen even if you use theSERIALIZABLE
isolation level while reducing the lock contention by allowing you to useREAD_COMMITTED
.根据记录,当存在死锁并且 MySQL 无法检测到它时,也会发生锁等待超时异常,因此它只是超时。另一个原因可能是运行时间极长的查询,但是它更容易解决/修复,我不会在这里描述这种情况。
如果死锁在两个事务中“正确”构造,MySQL 通常能够处理死锁。然后,MySQL 将终止/回滚拥有较少锁的一个事务(不太重要,因为它将影响较少的行)并让另一个事务完成。
现在,假设有两个进程 A 和 B 以及 3 个事务:
这是一个非常不幸的设置,因为 MySQL 无法发现存在死锁(跨越 3 个事务)。所以 MySQL 所做的就是……什么都没有!它只是等待,因为它不知道该怎么做。它会等待,直到第一个获取的锁超过超时(处理 A 事务 1:锁 X),然后这将解锁锁 X,从而解锁事务 2 等。
技巧是找出导致第一个锁的原因(哪个查询)(锁定X)。您将能够轻松地看到(
show engine innodb status
)事务 3 正在等待事务 2,但您不会看到事务 2 正在等待哪个事务(事务 1)。 MySQL 不会打印与事务 1 相关的任何锁或查询。唯一的提示是在事务列表的最底部(在show engine innodb status
打印输出中),您将明显看到事务 1什么都不做(但实际上是在等待事务 3 完成)。此处描述了如何查找哪个 SQL 查询导致为正在等待的给定事务授予锁 (Lock X) 的技术
跟踪长时间运行事务中的 MySQL 查询历史
如果您想知道进程和事务是什么正是在示例中。该进程是一个PHP进程。事务是innodb-trx-table定义的事务。就我而言,我有两个 PHP 进程,在每个进程中我手动启动一个事务。有趣的是,即使我在一个进程中启动了一个事务,MySQL 实际上在内部使用了两个单独的事务(我不知道为什么,也许一些 MySQL 开发人员可以解释)。
MySQL 在内部管理自己的事务,并决定(在我的例子中)使用两个事务来处理来自 PHP 进程(进程 A)的所有 SQL 请求。事务 1 正在等待事务 3 完成的语句是 MySQL 内部的事情。 MySQL“知道”事务 1 和事务 3 实际上是作为一个“事务”请求(来自进程 A)的一部分进行实例化的。现在整个“交易”被阻止,因为交易 3(“交易”的子部分)被阻止。因为“交易”无法完成交易 1(也是“交易”的子部分)也被标记为未完成。这就是我所说的“事务 1 等待事务 3 完成”的意思。
For the record, the lock wait timeout exception happens also when there is a deadlock and MySQL cannot detect it, so it just times out. Another reason might be an extremely long running query, which is easier to solve/repair, however, and I will not describe this case here.
MySQL is usually able to deal with deadlocks if they are constructed "properly" within two transactions. MySQL then just kills/rollback the one transaction that owns fewer locks (is less important as it will impact less rows) and lets the other one finish.
Now, let's suppose there are two processes A and B and 3 transactions:
This is a very unfortunate setup because MySQL cannot see there is a deadlock (spanned within 3 transactions). So what MySQL does is ... nothing! It just waits, since it does not know what to do. It waits until the first acquired lock exceeds the timeout (Process A Transaction 1: Locks X), then this will unblock the Lock X, which unlocks Transaction 2 etc.
The art is to find out what (which query) causes the first lock (Lock X). You will be able to see easily (
show engine innodb status
) that Transaction 3 waits for Transaction 2, but you will not see which transaction Transaction 2 is waiting for (Transaction 1). MySQL will not print any locks or query associated with Transaction 1. The only hint will be that at the very bottom of the transaction list (of theshow engine innodb status
printout), you will see Transaction 1 apparently doing nothing (but in fact waiting for Transaction 3 to finish).The technique for how to find which SQL query causes the lock (Lock X) to be granted for a given transaction that is waiting is described here
Tracking MySQL query history in long running transactions
If you are wondering what the process and the transaction is exactly in the example. The process is a PHP process. Transaction is a transaction as defined by innodb-trx-table. In my case, I had two PHP processes, in each I started a transaction manually. The interesting part was that even though I started one transaction in a process, MySQL used internally in fact two separate transactions (I don't have a clue why, maybe some MySQL dev can explain).
MySQL is managing its own transactions internally and decided (in my case) to use two transactions to handle all the SQL requests coming from the PHP process (Process A). The statement that Transaction 1 is waiting for Transaction 3 to finish is an internal MySQL thing. MySQL "knew" the Transaction 1 and Transaction 3 were actually instantiated as part of one "transaction" request (from Process A). Now the whole "transaction" was blocked because Transaction 3 (a subpart of "transaction") was blocked. Because "transaction" was not able to finish the Transaction 1 (also a subpart of the "transaction") was marked as not finished as well. This is what I meant by "Transaction 1 waits for Transaction 3 to finish".
这个异常的一个大问题是,它通常无法在测试环境中重现,并且当它发生在生产环境中时,我们无法运行 innodb 引擎状态。因此,在其中一个项目中,我将以下代码放入此异常的 catch 块中。这帮助我在异常发生时捕获引擎状态。这很有帮助。
The big problem with this exception is that its usually not reproducible in a test environment and we are not around to run innodb engine status when it happens on prod. So in one of the projects I put the below code into a catch block for this exception. That helped me catch the engine status when the exception happened. That helped a lot.
这是我最终必须做的事情,以找出导致锁定超时问题的“其他查询”。在应用程序代码中,我们在专用于该任务的单独线程上跟踪所有挂起的数据库调用。如果任何数据库调用花费的时间超过 N 秒(对我们来说是 30 秒),我们会记录:
通过上述操作,我们能够查明锁定导致死锁的行的并发查询。就我而言,它们是像
INSERT ... SELECT
这样的语句,与普通的 SELECT 不同,它会锁定底层行。然后,您可以重新组织代码或使用不同的事务隔离,例如未提交读。祝你好运!
Here is what I ultimately had to do to figure out what "other query" caused the lock timeout problem. In the application code, we track all pending database calls on a separate thread dedicated to this task. If any DB call takes longer than N-seconds (for us it's 30 seconds) we log:
With above, we were able to pinpoint concurrent queries that locked the rows causing the deadlock. In my case, they were statements like
INSERT ... SELECT
which unlike plain SELECTs lock the underlying rows. You can then reorganize the code or use a different transaction isolation like read uncommitted.Good luck!
查看
pt 的手册页-deadlock-logger
实用程序:它从上面提到的
engine innodb status
中提取信息,并且还它可用于创建每 30 秒运行一次的
守护进程
。Take a look at the man page of the
pt-deadlock-logger
utility:It extracts information from the
engine innodb status
mentioned above and alsoit can be used to create a
daemon
which runs every 30 seconds.您可以使用:
它将列出 MySQL 中的所有连接以及连接的当前状态以及正在执行的查询。还有一个较短的变体
show processlist;
,它显示截断的查询以及连接统计信息。You can use:
which will list all the connections in MySQL and the current state of connection as well as the query being executed. There's also a shorter variant
show processlist;
which displays the truncated query as well as the connection stats.从 Rolando 上面的答案推断,正是这些阻塞了您的查询:
如果您需要执行查询并且不能等待其他查询运行,请使用 MySQL 线程 id 杀死它们:
(来自 mysql 内部,而不是显然是 shell)
您必须从以下
命令中找到线程 ID,并找出哪个线程 ID 阻塞了数据库。
Extrapolating from Rolando's answer above, it is these that are blocking your query:
If you need to execute your query and can not wait for the others to run, kill them off using the MySQL thread id:
(from within mysql, not the shell, obviously)
You have to find the thread IDs from the:
command, and figure out which one is the one that is blocking the database.
如果您使用 JDBC,那么您可以选择
includeInnodbStatusInDeadlockExceptions=true
https:/ /dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html
If you're using JDBC, then you have the option
includeInnodbStatusInDeadlockExceptions=true
https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html
就我而言,我只能使用 SELECT 命令查询数据库,而不能使用 UPDATE 和 DELETE 命令查询数据库。
该问题是由于线程卡住造成的。它没有响应,状态为“睡眠”。所以,我必须杀了它。
1-查找线程ID:
选择:SELECT * FROM information_schema.processlist ORDER BY id
2-对我来说,它返回了两个线程。当前的一个和卡住的一个。我使用“kill THREAD_ID”来杀死它。
请参阅此处了解更多信息
https://oracle-base.com/articles/mysql/mysql-killing-线程
In my case, I was able only to query the database with the SELECT command, but not the UPDATE neither DELETE command.
The problem was due to a stuck thread. It was not responding, with the status "sleep". So, I had to kill it.
1- Find the thread ID:
Do select to: SELECT * FROM information_schema.processlist ORDER BY id
2- For me it returned two threads. The current one and the stuck one. I used "kill THREAD_ID" to kill it.
See here for more info
https://oracle-base.com/articles/mysql/mysql-killing-threads
例如,激活 MySQL General.log(磁盘密集型)并使用 mysql_analysis_general_log.pl 提取长时间运行的事务和 :
之后禁用general.log。
Activate MySQL general.log (disk intensive) and use mysql_analyse_general_log.pl to extract long running transactions, for example with :
Disable general.log after that.