Description:
Hi,
I found a case where START SLAVE UNTIL does not stop at the requested position but a little further. In my case, it is not a big problem (because it is happening on the primary of a master/master setup where only the primary receives writes), but this is very scary (I do not know what would happen if writes were sent to both masters). More details in How to repeat.
Many thanks for looking into that,
JFG
How to repeat:
I have a master/master (cyclic) replication setup like this (details in the file setup.txt in the comments):
- Writes happen on “master” (M),
- “slave1” (S1) replicates from M,
- M replicates from S1,
- and I have “slave2” (S2) replicating from master.
I want to add S2 to the cycle so that M replicates from S2, S1 replicates from M, and S2 replicates from S1.
Note: I am not using GTID or Parallel Replication. If I were, the commands below would be different.
To achieve the above, I do the following (details in the file repointing.txt in the comments):
1) On M: STOP SLAVE; SHOW SLAVE STATUS\G
2) On S2:
2.1) Making sure it is ahead of M: SELECT MASTER_POS_WAIT(<the position of SHOW SLAVE STATUS from M in #1>);
2.2) STOP SLAVE; SHOW SLAVE STATUS\G SHOW MASTER STATUS\G START SLAVE;
3) Back on M:
3.1) START SLAVE UNTIL <the position of SHOW SLAVE STATUS from S2 in #2.2>;
3.2) SELECT MASTER_POS_WAIT(<the position of START SLAVE UNTIL in #3.1>);
3.3) STOP SLAVE; CHANGE MASTER TO <the position of SHOW MASTER STATUS from S2 in #2.2>; START SLAVE;
But being paranoid, I did a SHOW SLAVE STATUS\G between #3.2 and #3.3 and I did not like what I saw (all details in repointing.txt):
master [localhost] {msandbox} (test) > START SLAVE UNTIL MASTER_LOG_FILE = 'mysql-bin.000002', MASTER_LOG_POS = 143517;
Query OK, 0 rows affected, 1 warning (0.01 sec)
master [localhost] {msandbox} (test) > SELECT MASTER_POS_WAIT('mysql-bin.000002', 143517);
+---------------------------------------------+
| MASTER_POS_WAIT('mysql-bin.000002', 143517) |
+---------------------------------------------+
| NULL |
+---------------------------------------------+
1 row in set (0.00 sec)
master [localhost] {msandbox} (test) > SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 16746
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 233734
Relay_Log_File: mysql-relay.000007
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
[...]
Exec_Master_Log_Pos: 175953 <<<<<=====----- This does not match Until_Log_Pos below or the START SLAVE UNTIL above.
Relay_Log_Space: 689
Until_Condition: Master
Until_Log_File: mysql-bin.000002
Until_Log_Pos: 143517
[...]
Replicate_Ignore_Server_Ids:
Master_Server_Id: 200
Master_UUID: 00016746-2222-2222-2222-222222222222
Master_Info_File: /home/jgagne/sandboxes/rsandbox_5_7_22/master/data/master.info
[...]
1 row in set (0.00 sec)
See how Exec_Master_Log_Pos (175953) is larger than the position requested in START SLAVE UNTIL (143517). I would expect Exec_Master_Log_Pos to be exactly the value given to START SLAVE UNTIL.
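For illustration only, the manual comparison of Exec_Master_Log_Pos against Until_Log_Pos could be automated before running the CHANGE MASTER TO in #3.3. The sketch below assumes the SHOW SLAVE STATUS\G output has been captured as plain text. The names parse_slave_status, until_overshoot and SAMPLE are hypothetical helpers, not part of MySQL or any client library, and the sample only contains fields copied from the output above.

```python
# Fields copied from the SHOW SLAVE STATUS\G output in this report
# (the inline annotation on Exec_Master_Log_Pos has been dropped).
SAMPLE = """\
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Exec_Master_Log_Pos: 175953
Until_Condition: Master
Until_Log_File: mysql-bin.000002
Until_Log_Pos: 143517
"""


def parse_slave_status(text):
    # Turn "Key: value" lines of the vertical output into a dict.
    # Lines without a colon (row separator, "[...]") are skipped.
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        fields[key.strip()] = value.strip()
    return fields


def until_overshoot(fields):
    # Number of bytes executed past the UNTIL position (0 if none).
    # Returns None when the comparison is not meaningful: the UNTIL
    # condition is not on master coordinates, or the executed file
    # differs from the UNTIL file.
    if fields.get("Until_Condition") != "Master":
        return None
    if fields.get("Relay_Master_Log_File") != fields.get("Until_Log_File"):
        return None
    executed = int(fields["Exec_Master_Log_Pos"])
    requested = int(fields["Until_Log_Pos"])
    return max(0, executed - requested)


print(until_overshoot(parse_slave_status(SAMPLE)))  # 32436
```

A non-zero result, as with the values above, would be a signal not to proceed with the repointing in #3.3.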