MySQL主主复制&自动增量列问题

发布于 2024-12-17 04:03:34 字数 1351 浏览 6 评论 0原文

我正在使用主主复制进行一些测试，并且遇到了一些奇怪的问题，我将尝试描述我遵循的过程，以便有人可以重现该问题。

我在 2 个虚拟机上设置了复制，并在每个虚拟机的配置文件中设置了复制：

-- Master1 -- 
auto_increment_increment = 2
auto_increment_offset = 1

-- Master2 -- 
auto_increment_increment = 2
auto_increment_offset = 2

这些设置应该会导致自动增量列的算术级数：

- Master1: 1,3,5,7,9,11,13  ...
- Master2: 2,4,6,8,10,12,14 ...

Master1 获取奇数，Master2 获取偶数。然后，我创建一个测试数据库，并添加一个具有以下定义的表：

CREATE TABLE `t1` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `c1` varchar(50) DEFAULT NULL,
 `d1` date DEFAULT '1970-01-01',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;

当然，数据库是在两台服务器上创建的。然后，我在两台服务器上执行

START SLAVE;

以便开始复制。为了生成数据，我使用以下过程：

必须插入一条记录才能启动进程
INSERT INTO t1(c1,d1) SELECT LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) ;
然后，您从同一个表中使用 INSERT - SELECT，它将开始以 2ⁿ 的速率插入，n 是您执行查询的时间：
INSERT INTO t1(c1,d1) SELECT LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) FROM t1;

提示：我发现此处描述的这种方法对于为表生成随机数据也非常方便。

因此，当我开始在两台服务器上同时执行这些查询时，这总是会导致自动增量列的复制重复键错误。如果有人有任何想法，我将不胜感激！

PS：当然这种查询在生产应用中很少发生，但我相信这仍然证明了一点。

原文

I am doing some testing with master-master replication and I ran on some weird problem, I will try and describe the procedure I followed so that someone can perhaps reproduce the issue.

I set up replication on 2 VMs and in the configuration file for each one:

-- Master1 -- 
auto_increment_increment = 2
auto_increment_offset = 1

-- Master2 -- 
auto_increment_increment = 2
auto_increment_offset = 2

These settings should result in an arithmetic progression for the auto-increment columns:

- Master1: 1,3,5,7,9,11,13  ...
- Master2: 2,4,6,8,10,12,14 ...

Master1 gets the odd numbers and Master2 gets the even ones.
I then create a test database and I add a table with the following definition:

CREATE TABLE `t1` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `c1` varchar(50) DEFAULT NULL,
 `d1` date DEFAULT '1970-01-01',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;

Of course the database is created on both servers. Afterwards, I execute

START SLAVE;

on both servers so that replication begins.
In order to generate data, I use the following procedure:

A single record must be inserted for the process to take off
INSERT INTO t1(c1,d1) SELECT LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) ;
Then you use INSERT - SELECT from the same table which will start inserting at a rate of 2ⁿ, n being the times you execute the query:
INSERT INTO t1(c1,d1) SELECT LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) FROM t1;

Tip: I find this method described here also very convenient for generating random data for your tables.

So, when I start executing these queries simultaneously on both servers and this always results in replication duplicate key error for the auto-increment column.
If anyone has any ideas, I would be thankful!

PS: Of course this kind of queries seldom happen in production applications, but I believe it still proves a point.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

对不⑦ 2024-12-24 04:03:34

注意：我确实找到了答案，并将其放在顶部。答案下面是一些其他的咆哮（我最初的答案），它们仍然可以解释这一点。

由于您的查询使语句的行数加倍 INSERT INTO t1(c1,d1) SELECT LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) FROM t1; 可以在服务器1和服务器2上插入不同数量的行。所有语句使用自动增量列的实例将其 INSERT_ID 与复制一起发送，并且如果 a 语句也已在服务器 2 上运行，则该值在服务器 2 上将不为 true。

让我们看一个例子。我将执行 stop Slave 来模拟长时间运行的查询或不良网络。

创建两个数据库并设置主主复制
创建表并插入初始行
在服务器 2 上停止复制
在服务器 1 上运行几次将行数加倍的语句。2 就足够了，但我做了 3。
检查show binlog events（警告，不要在旧数据库上执行此操作，这将花费很长时间）。这就是我所看到的。
查询|开始
因特瓦 | INSERT_ID=3
查询 |使用测试；插入 t1(c1,d1) 选择 ...
查询 |提交
查询 |开始
因特瓦 | INSERT_ID=5
查询 |使用测试；插入 t1(c1,d1) 选择 ...
查询 |提交
查询 |开始
因特瓦 | INSERT_ID=9
查询 |使用测试；插入 t1(c1,d1) 选择 ...
查询 | COMMIT
请注意，每次运行复制时 INSERT_ID 都会相应更改。在第二次插入时，它是 5，这意味着第一次插入插入了 1 行（请记住，增量是 2）。在第三次插入中，INSERT_ID 为 9，这意味着第二次插入插入了 2 行。这都是有道理的。让我们继续
在服务器 2 上复制一次，暂时不要开始复制。现在，执行 select * from t1 可以正确显示两行，id 为 1 和 2。
现在再次启动从属服务器并运行 SHOW SLAVE STATUS \G。它已停止，重复 ID 为 5。再次选择 t1 中的所有值会显示四行。第一个是最初的一个。第二个是我们在服务器 2 上所做的，最后一次 id 3 和 5 的操作来自服务器 1 上的第一个语句，即仅添加 1 行的语句。
复制的下一部分是这个
查询|开始
因特瓦 | INSERT_ID=5
查询 |使用测试；插入 t1(c1,d1) 选择 ...
查询 | COMMIT
发生这种情况时，服务器 1 上的 INSERT_ID 为 5，这就是复制将要使用的内容，但是，在服务器 2 上我们已经有 id 5，因为我们在获得此信息之前额外复制了行一次。所以复制会中断。

底线是这样的。进行主主复制时，每个语句都需要以相同的方式影响数据库。添加或删除相同数量的行等。

也就是说，如果您需要做这样的事情，那么对于这种特殊情况有一个简单的解决方法。

将 server_id 添加到数据中并创建一个像这样的表
创建表t1 (
id int(11) NOT NULL AUTO_INCRMENT,
server_id int(1) 默认为 NULL，
c1 varchar(50) 默认为空，
d1 日期默认“1970-01-01”，
主键（id）
) ENGINE=MyISAM AUTO_INCRMENT=4 DEFAULT CHARSET=latin1;
准备两行，每个服务器 ID 一行
INSERT INTO t1(server_id, c1,d1) SELECT 1, LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) ;
INSERT INTO t1(server_id, c1,d1) SELECT 2, LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) ;
对于每个重复项，只需考虑在您的服务器上创建的行。
INSERT INTO t1(server_id, c1,d1) SELECT server_id, LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) FROM t1其中 server_id = 1;

下面是原始答案

首先，当您假设您将有两组 id 范围为 1, 3, 5, .. 和 2, 4, 6 ... 时，您就错了，无论什么服务器如果 Auto_increment 始终为 max(id)+1，则对该值运行该语句。因此，如果您在服务器 1 上执行两次插入，它将获得 1 和 3 的奇数值。如果您随后在服务器 2 上执行一次插入，它将获得偶数值 4（4 是下一个大于 3 且满足 的数字。 >auto_increment_offset + N × auto_increment_increment）。

您可以通过运行 show table status; 查看 Auto_increment 值

其次，第一次插入后的每次插入都会使表中的行数加倍，很快就会使其成为一个非常慢的操作，我不会如果这与每个查询如此慢有关，请感到惊讶。

也就是说，这就是我测试的方法（并得到了同样令人惊讶的结果）。

我创建了一个新的空设置，其中包含两台服务器和 master master make_replication_sandbox --master_master mysql-5.5.17-osx10.6-x86_64.tar.gz。他们都是开始的，所以也有奴隶。它们会在您进行设置时自动配置。
然后我创建了表格并根据您的问题插入了第一行。现在两台服务器上的 Auto_increment 均为 2，然后
我正在运行 while (true) do ./n1 test -e "INSERT INTO t1(c1,d1) SELECT LPAD('', 50 , MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY )从 t1;";完成； 同时针对两台服务器（./n2 在另一台服务器上）。

我有一个理论。

假设表中有 1000 行，并且您同时在两台服务器上启动相同的复制。简而言之，您将在两台服务器上获得 4000 行，并且它们都是相同的。

但会发生的情况是，您复制每个数据库上的行，以便服务器 1 看到 2000 行，服务器看到 2000 行，但只有前 1000 行是相同的，其他 1000 行在两台服务器上生成不同。

然后复制开始。这是基于语句的复制，因此运行相同的语句意味着在两台服务器上，行再次复制到 4000，这是正确的计数，但仍然只有 1000 个相同，其他 3000 个会不同。

只要每台服务器运行相同数量的查询，这可能会起作用（没有重复，但数据不同），但如果一台服务器在复制缓存之前设法运行两个查询，那么您会在复制中得到一条在服务器 2 上添加了 1000 行的语句（如果之前有 1000 行），但在服务器 1 上添加了 4000 行（因为服务器 1 已经将 1000 增加了一倍）。如果下一条语句在服务器 2 上添加了另外 2000 行，并且二进制日志包含类似“服务器上使用的第一个自动增量”之类的内容，那么您将遇到冲突。

我知道这是抽象和奇怪的，甚至比思考它更难写出来:)

我希望这会有所帮助，我希望这就是问题所在......大师大师很难，这绝对是一个我不会在大师大师中做的事情。

NOTE: I did find the answer and I've put it at top. Below the answer is some other rantings (my initial answer) that still could hold some value to explain this.

Since your query doubles the number of rows your statement INSERT INTO t1(c1,d1) SELECT LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) FROM t1; can insert a different number of rows on server 1 and the server 2. All statements that uses a auto increment column sends its INSERT_ID along with the replication and that value will not be true on server 2 if the a statement has been run there as well.

Lets look at an example. I will do stop slave to simulate a long running query or a bad network.

Create two databases and set up master-master replication
Create the table and insert the inital row
Stop replication on server 2
Run the statement that doubles the number of rows a couple of time on server 1. 2 is enough but I did 3.
Check show binlog events (warning, don't do this on an old database, it will take forever). This is what I see.
Query | BEGIN
Intvar | INSERT_ID=3
Query | use test; INSERT INTO t1(c1,d1) SELECT ...
Query | COMMIT
Query | BEGIN
Intvar | INSERT_ID=5
Query | use test; INSERT INTO t1(c1,d1) SELECT ...
Query | COMMIT
Query | BEGIN
Intvar | INSERT_ID=9
Query | use test; INSERT INTO t1(c1,d1) SELECT ...
Query | COMMIT
Note that for each time I run the duplication INSERT_ID changes accordingly. On the second insert it is 5 meaning that the first insert inserted 1 row (remember, the increment is 2). On the third insert INSERT_ID is 9 meaning that the second insert inserted 2 rows. This all makes sense. Let's continue
On server 2 do the duplication once, don't start replication yet. Doing a select * from t1 now correctly shows two rows, with id 1 and 2.
Now start the slave again and run a SHOW SLAVE STATUS \G. It has stopped with a duplicate id of 5. Selecting all values from t1 again shows four rows. The first was the inital one. The second is what we did on server 2 and the two last once with id 3 and 5 was from that first statement on server 1, the one that added just 1 row.
The next part of replication is this
Query | BEGIN
Intvar | INSERT_ID=5
Query | use test; INSERT INTO t1(c1,d1) SELECT ...
Query | COMMIT
On server 1 INSERT_ID was 5 when this happend and that is what replication is going to use, however, on server 2 we already have id 5, cause we duplicated the rows one extra time before getting this. So replication breaks.

Bottom line is this. When doing master-master replication each statement needs to affect the database in the same way. Adding or deleting the same number of rows among other things.

That said, there is an easy fix for this particular case if you need to do things like this.

Add a server_id to the data and create a table like this
CREATE TABLE t1 (
id int(11) NOT NULL AUTO_INCREMENT,
server_id int(1) DEFAULT NULL,
c1 varchar(50) DEFAULT NULL,
d1 date DEFAULT '1970-01-01',
PRIMARY KEY (id)
) ENGINE=MyISAM AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;
Prepare two rows, one for each server id
INSERT INTO t1(server_id, c1,d1) SELECT 1, LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) ;
INSERT INTO t1(server_id, c1,d1) SELECT 2, LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) ;
For each duplication just take the rows created on your server into account.
INSERT INTO t1(server_id, c1,d1) SELECT server_id, LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) FROM t1 where server_id = 1;

BELOW IS THE ORIGNAL ANSWER

First of all your wrong when you assume that you will have two sets of ids ranged 1, 3, 5, .. and 2, 4, 6 ... Regardless of what server the statement is run on the value if the Auto_increment is always max(id)+1. So if you do two inserts on server 1 it will get the odd values of 1 and 3. If you then do one insert on server 2 it will get the even value of 4 (4 is the next number higher than 3 that satisfies auto_increment_offset + N × auto_increment_increment).

You can see the Auto_increment value by running show table status;

Second, each of your inserts after the first doubles the number of rows in the table, quickly making it a very slow operation and I wouldn't be surprised if this has something to do with that each query is so slow.

That said, this is how I tested this (and got the same surprising results).

I created a new empty setup with two servers and master master make_replication_sandbox --master_master mysql-5.5.17-osx10.6-x86_64.tar.gz. They are both started and so are there slaves. They are configured automatically as you did your setup.
I then created the table and inserted the first row as per your question. Auto_increment is now 2 on both servers and there is one row in the table
I'm then running while (true) do ./n1 test -e "INSERT INTO t1(c1,d1) SELECT LPAD('', 50, MD5( RAND() ) ), DATE_ADD( CURDATE(), INTERVAL FLOOR( RAND() * 365 ) DAY ) FROM t1;"; done; against both servers at the same time (./n2 on the other one).

And I have a theory.

Say you have 1000 rows in the table and you initiate this same duplication on both servers at the same time. In a perfect word you would get 4000 rows afterwards on both servers and they would all be the same.

But what happens is that you duplicate the rows on each database so that server 1 sees 2000 rows and server 2000 rows, but only the first 1000 is the same, the other 1000 has been generated differently on the two servers.

Then replication kicks in. This is statement based replication so the same statement runs meaning that on both servers the rows get duplicated again to 4000 and that is the correct count but still only 1000 of them are the same, the other 3000 will differ.

As long as each server runs the same number of queries this might work (no duplicates, but data differs) but if one server manages to run two queries before replication caches up then you get a statement in replication that on server 2 added 1000 rows (if there was 1000 rows before) but on server 1 adds 4000 rows (cause server 1 has already doubled 1000 twice). If the next statement added another 2000 rows on server 2 and the binary log contains something like "first auto increment used on server" then you would get a collision.

I know this is abstract and weird, and even harder to put into writing than it was to think about it :)

I hope this helps, and I hope this is the problem... Master-master is hard, and this is definitely one of the things I wouldn't do in master-master.

回复收藏 0 原文

~没有更多了~