In MySQL, why is it safe to turn off innodb_support_xa for single-threaded updates?

Posted on 2024-11-09 02:36:28

The MySQL documentation says the following about the innodb_support_xa option:

Enables InnoDB support for two-phase commit in XA transactions, causing an extra disk flush for transaction preparation. This setting is the default. The XA mechanism is used internally and is essential for any server that has its binary log turned on and is accepting changes to its data from more than one thread. If you turn it off, transactions can be written to the binary log in a different order from the one in which the live database is committing them. This can produce different data when the binary log is replayed in disaster recovery or on a replication slave. Do not turn it off on a replication master server unless you have an unusual setup where only one thread is able to change data.
For a server that is accepting data changes from only one thread, it is safe and recommended to turn off this option to improve performance for InnoDB tables. For example, you can turn it off on replication slaves where only the replication SQL thread is changing data.
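To make the ordering concern in the quoted passage concrete, here is a minimal Python sketch. It is a toy model, not MySQL code: the two transactions, their SQL equivalents in the comments, and the starting value are all made up for illustration.

```python
# Toy model: two hypothetical transactions that do not commute,
# applied to a single value.
def t1(x):
    return x + 10   # stands in for: UPDATE t SET x = x + 10

def t2(x):
    return x * 2    # stands in for: UPDATE t SET x = x * 2

def replay(initial, transactions):
    """Apply transactions in the given order and return the final value."""
    value = initial
    for txn in transactions:
        value = txn(value)
    return value

engine_order = [t1, t2]   # order in which the live database committed
binlog_order = [t2, t1]   # order in which the binary log recorded them

master_value = replay(5, engine_order)    # (5 + 10) * 2
replica_value = replay(5, binlog_order)   # (5 * 2) + 10
```

Replaying the binlog gives 20 while the live database holds 30. With a single writer thread only one order is possible, which is the case the documentation describes as safe.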

However, from what I have read about the infamous group commit bug, my understanding is that 2PC guarantees that the transaction log and the binlog contain the same set of transactions, while prepare_commit_mutex ensures that they appear in the same order.

With prepare_commit_mutex, writes to the transaction log and the binlog are already serialized, so what is the difference between multi-threaded and single-threaded updates?

On the other hand, even if only one thread can change data, without 2PC, if a crash occurs after a transaction is written to the binlog but before it is written to the transaction log, how is InnoDB going to deal with this situation during recovery? In theory it could simply execute the outstanding transaction from the binlog, just as the slaves do, but I doubt InnoDB actually does that, because otherwise why would we need 2PC at all?

I'm not familiar at all with the internals of MySQL, so please forgive me if I'm terribly wrong. Thanks!

甜中书 2024-11-16 02:36:28

To begin with...

http://yoshinorimatsunobu.blogspot.com/2009/08/great-performance-effect-of-fixing.html

Before InnoDB Plugin 1.0.4, the commit sequence was:

obtain mutex
  write innodb log and fsync, for prepare-phase (skip if innodb_support_xa=0)
  write binlog (fsync as appropriate if sync_binlog > 0)
  write innodb log and fsync, for commit-phase
release mutex

As of InnoDB Plugin 1.0.4 (and MySQL 5.5), it is:

write innodb log and fsync, for prepare-phase (skip if innodb_support_xa=0)
obtain mutex
  write binlog (fsync as appropriate if sync_binlog > 0)
  write innodb log, for commit-phase
release mutex
fsync innodb log, for commit-phase

As you can see, in the new version nothing is fsync'd inside the critical section (except the binlog when sync_binlog > 0). That way, group commit works again and gives far better concurrent throughput.

For instance, with the previous "broken" version, if you had 100 threads committing concurrently, all fsyncs were serialized: you would get 100 fsyncs for prepare and another 100 fsyncs for commit. Group commit was therefore completely broken.

With the newer implementation, fsyncs are grouped according to how many transactions commit concurrently, while the ordering of operations between the InnoDB log and the binlog is still enforced. It also means that if there's only one thread, there's no performance gain.
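As a rough illustration of the arithmetic, here is an idealized Python model. It assumes sync_binlog = 0 (so binlog fsyncs are not counted) and that transactions fall into evenly sized groups; the function names and the group_size parameter are hypothetical, not anything from MySQL.

```python
import math

def fsyncs_serialized(n_commits):
    """Old behaviour: the prepare and commit fsyncs both happen under the
    mutex, so every transaction pays for its own two fsyncs."""
    return 2 * n_commits

def fsyncs_grouped(n_commits, group_size):
    """New behaviour (idealized): transactions that arrive together share
    one prepare fsync and one commit fsync per group."""
    groups = math.ceil(n_commits / group_size)
    return 2 * groups

old = fsyncs_serialized(100)          # 100 prepare + 100 commit fsyncs
new_best = fsyncs_grouped(100, 100)   # all 100 commits land in one group
single = fsyncs_grouped(100, 1)       # one thread: every group has size 1
```

In this model the old scheme costs 200 fsyncs, the best case for the new scheme costs 2, and a single committing thread still costs 200, which is why one thread sees no gain.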

As for your question about a crash that occurs after a transaction is written to the binlog but before it is written to the transaction log: I'm on the same page as you.

If the server crashes before the final step, there's a slight chance of a discrepancy between the InnoDB log and the binlog (either one could be ahead of the other), but you are guaranteed to have all the information you need to examine in the InnoDB log, because it was recorded in the prepare phase.
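As far as I understand it, when the prepare phase is recorded (innodb_support_xa enabled), recovery can settle each prepared transaction by checking whether it reached the binlog. The Python sketch below is a simplified illustration of that decision, not MySQL's actual code; the function name, argument names, and transaction ids are hypothetical.

```python
def recover(prepared_xids, binlog_xids):
    """Decide the fate of each transaction left in the prepared state.

    prepared_xids: transaction ids found prepared in the InnoDB log
    binlog_xids:   transaction ids whose events reached the binary log
    """
    decisions = {}
    for xid in prepared_xids:
        if xid in binlog_xids:
            decisions[xid] = "commit"     # binlog has it, so finish the commit
        else:
            decisions[xid] = "rollback"   # never reached the binlog, so undo it
    return decisions

# Hypothetical crash: 101 was written to the binlog, 102 was not.
decisions = recover(prepared_xids=[101, 102], binlog_xids={101})
```

Without the prepare record (innodb_support_xa=0) there is nothing in the InnoDB log to match against the binlog, so this check is not available.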

However, what to do with the uncommitted transactions is still nondeterministic. For instance, unless sync_binlog = 1, there's a chance that a slave received data that had not yet been fsync'd to the binlog on the master. You can't just redo the failed transaction, as it may have already run on one of the slaves.

It also means that the binlog could be shorter than the InnoDB log, producing the error "The binary log [file_name] is shorter than its expected size." described in the official doc, and you then have to rebuild the slave from scratch. Not very human friendly.

http://dev.mysql.com/doc/refman/5.1/en/binary-log.html

Consistency of operation ordering is guaranteed independent of the innodb_support_xa setting. (This contradicts what the official doc says about innodb_support_xa, maybe because it was written about the stock InnoDB in 5.0.3, long before the concurrency fix.) Consistency between the InnoDB log on the master and the relay log on the slave is not strictly guaranteed even with innodb_support_xa. So I don't see any point in using innodb_support_xa. It's scary not to follow the official recommendation, but it seems stale and wrong on many points.

I'm wondering if there's any correlation between the innodb_flush_log_at_trx_commit setting and the innodb_support_xa behavior when the former is set to 2 or 0.

One practical way of thinking about it: failing over to the slave is safe (after all, the failed transaction was something that you wanted to get done), but never fail back to the master, as there might be some discrepancy in the data. You need to fully copy the data from the slave before you make the old master a new slave. In other words, once the master has crashed, trust the slave from then on; that way, you don't need to mess with the InnoDB log for crash recovery.

Also note that MySQL 5.5 supports semi-synchronous replication, which is along the same lines as "trust the slave". I thought you might be interested.

http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html
