Bug #69943 | Transactions skipped on slave after "stop/start slave" using GTID replication | ||
---|---|---|---|
Submitted: | 7 Aug 2013 0:46 | Modified: | 30 Oct 2013 8:35 |
Reporter: | Yoshinori Matsunobu (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.6.13 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[7 Aug 2013 0:46]
Yoshinori Matsunobu
[7 Aug 2013 0:47]
Yoshinori Matsunobu
test script to repeat the bug
Attachment: tran_insert.pl (application/octet-stream, text), 943 bytes.
[7 Aug 2013 6:19]
MySQL Verification Team
Hello Yoshinori, Thank you for the bug report and the test case. Verified as described on MySQL version 5.6.13 Thanks, Umesh
[7 Aug 2013 6:20]
MySQL Verification Team
// At the end of all steps.. confirms that the entire large transaction (500000 rows) is lost // On Master mysql> show master status; +------------------+-----------+--------------+------------------+---------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+-----------+--------------+------------------+---------------------------------------------+ | mysql-bin.000003 | 581988168 | | | 4470d4c7-ffe1-11e2-8214-49f3e6a792e3:1-1007 | +------------------+-----------+--------------+------------------+---------------------------------------------+ 1 row in set (0.00 sec) mysql> select count(*) from t1; +----------+ | count(*) | +----------+ | 501000 | +----------+ 1 row in set (0.97 sec) // On slave mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 127.0.0.1 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 581988168 Relay_Log_File: B-relay-bin.000003 Relay_Log_Pos: 1321401 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 581988168 Relay_Log_Space: 156422612 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 4470d4c7-ffe1-11e2-8214-49f3e6a792e3 Master_Info_File: /tmp/B/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 4470d4c7-ffe1-11e2-8214-49f3e6a792e3:1-1007 Executed_Gtid_Set: 4470d4c7-ffe1-11e2-8214-49f3e6a792e3:1:3-1007 Auto_Position: 1 1 row in set (0.00 sec) mysql> select count(*) from t1; +----------+ | count(*) | +----------+ | 1000 | +----------+ 1 row in set (0.00 sec) mysql>
[7 Aug 2013 15:37]
MySQL Verification Team
Tried the workaround relay_log_recovery=1 as suggested by venkat, issue is still reproducible: from step 5 onwards from test case.. // On slave mysql> change master to master_host='127.0.0.1', master_port=3306, master_user='repl',MASTER_PASSWORD='slavepass', master_auto_position=1; Query OK, 0 rows affected, 2 warnings (0.04 sec) mysql> start slave; Query OK, 0 rows affected (0.00 sec) mysql> stop slave; Query OK, 0 rows affected (0.15 sec) mysql> start slave io_thread; Query OK, 0 rows affected (0.00 sec) mysql> select sleep(3); +----------+ | sleep(3) | +----------+ | 0 | +----------+ 1 row in set (3.00 sec) mysql> stop slave io_thread; Query OK, 0 rows affected (0.02 sec) mysql> start slave; Query OK, 0 rows affected (0.01 sec) mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 127.0.0.1 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 581988168 Relay_Log_File: B-relay-bin.000003 Relay_Log_Pos: 202408 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 580869175 Relay_Log_Space: 63704965 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 119 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 0380caa2-002f-11e3-840f-531e9aab8cbb Master_Info_File: /tmp/B/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Reading event from the relay log Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 0380caa2-002f-11e3-840f-531e9aab8cbb:1-1007 Executed_Gtid_Set: 0380caa2-002f-11e3-840f-531e9aab8cbb:1:3-154 Auto_Position: 1 1 row in set (0.04 sec) mysql> select count(*) from t1; ERROR 1046 (3D000): No database selected mysql> use test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> select count(*) from t1; +----------+ | count(*) | +----------+ | 1000 | +----------+ 1 row in set (0.00 sec) mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 127.0.0.1 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 581988168 Relay_Log_File: B-relay-bin.000004 Relay_Log_Pos: 448 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 581988168 Relay_Log_Space: 1321898 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 0380caa2-002f-11e3-840f-531e9aab8cbb Master_Info_File: /tmp/B/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 0380caa2-002f-11e3-840f-531e9aab8cbb:1-1007 Executed_Gtid_Set: 0380caa2-002f-11e3-840f-531e9aab8cbb:1:3-1007 Auto_Position: 1 1 row in set (0.00 sec) mysql> select count(*) from t1; +----------+ | count(*) | +----------+ | 1000 | +----------+ 1 row in set (0.00 sec) mysql> show global variables like 'relay%'; +---------------------------+----------------+ | Variable_name | Value | +---------------------------+----------------+ | relay_log | | | relay_log_basename | | | relay_log_index | | | relay_log_info_file | relay-log.info | | relay_log_info_repository | FILE | | relay_log_purge | ON | | relay_log_recovery | ON | | relay_log_space_limit | 0 | +---------------------------+----------------+ 8 rows in set (0.00 sec) mysql> \! more B.cnf [mysqld] port = 3307 socket = /tmp/B/mysql.sock basedir = /data/ushastry/server/mysql-5.6.13 datadir = /tmp/B/ tmpdir = /tmp/B/ pid-file = /tmp/B/B.pid log-bin = mysql-bin server-id = 2 log-error = /tmp/B/B.err report-host=SlaveB report-port=3307 #replicate-ignore-table=database.table log-slave-updates = 1 gtid_mode=ON enforce-gtid-consistency=ON relay-log-recovery=1
[7 Aug 2013 16:07]
MySQL Verification Team
// with "--relay_log_info_repository=TABLE --master_info_repository=TABLE --sync_master_info=1 --relay-log-recovery=1" on slave. // from step 5 onwards from test case.. mysql> SHOW GLOBAL VARIABLES WHERE Variable_Name LIKE 'relay_log_info_repository' OR Variable_Name LIKE 'master_info_repository' OR Variable_Name LIKE 'sync_master_info' OR Variable_Name LIKE 'relay_log_recovery'; +---------------------------+-------+ | Variable_name | Value | +---------------------------+-------+ | master_info_repository | TABLE | | relay_log_info_repository | TABLE | | relay_log_recovery | ON | | sync_master_info | 1 | +---------------------------+-------+ 4 rows in set (0.00 sec) mysql> mysql> change master to master_host='127.0.0.1', master_port=3306, master_user='repl',MASTER_PASSWORD='slavepass', master_auto_position=1; Query OK, 0 rows affected, 2 warnings (0.06 sec) mysql> start slave; Query OK, 0 rows affected (0.01 sec) mysql> stop slave; Query OK, 0 rows affected (0.07 sec) mysql> start slave io_thread; Query OK, 0 rows affected (0.00 sec) mysql> select sleep(3); +----------+ | sleep(3) | +----------+ | 0 | +----------+ 1 row in set (3.00 sec) mysql> stop slave io_thread; Query OK, 0 rows affected (0.01 sec) mysql> start slave; Query OK, 0 rows affected (0.00 sec) mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: System lock Master_Host: 127.0.0.1 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 581155742 Relay_Log_File: B-relay-bin.000004 Relay_Log_Pos: 448 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 581039455 Relay_Log_Space: 488200 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 135 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 3d2babcf-0033-11e3-842a-5d4541789890 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Reading event from the relay log Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 3d2babcf-0033-11e3-842a-5d4541789890:1-373 Executed_Gtid_Set: 3d2babcf-0033-11e3-842a-5d4541789890:1:3-283 Auto_Position: 1 1 row in set (0.01 sec) mysql> use test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> select count(*) from t1; +----------+ | count(*) | +----------+ | 999 | +----------+ 1 row in set (0.01 sec) mysql> select count(*) from t1; +----------+ | count(*) | +----------+ | 999 | +----------+ 1 row in set (0.00 sec)
[7 Aug 2013 16:07]
MySQL Verification Team
// On master mysql> select count(*) from t1; +----------+ | count(*) | +----------+ | 501000 | +----------+ 1 row in set (0.99 sec) mysql> show master status; +------------------+-----------+--------------+------------------+---------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+-----------+--------------+------------------+---------------------------------------------+ | mysql-bin.000003 | 581988168 | | | 3d2babcf-0033-11e3-842a-5d4541789890:1-1007 | +------------------+-----------+--------------+------------------+---------------------------------------------+ 1 row in set (0.00 sec)
[7 Aug 2013 16:22]
MySQL Verification Team
Conf files used for testing.. [root@cluster-repo mysql-5.6.13]# more A.cnf B.cnf :::::::::::::: A.cnf :::::::::::::: [mysqld] port = 3306 socket = /tmp/A/mysql.sock basedir = /data/ushastry/server/mysql-5.6.13 datadir = /tmp/A/ tmpdir = /tmp/A/ pid-file = /tmp/A/A.pid log-bin=mysql-bin server-id=1 log-error = /tmp/A/A.err report-host=SlaveA report-port=3306 #replicate-ignore-table=database.table log-slave-updates = 1 gtid_mode=ON enforce-gtid-consistency=ON :::::::::::::: B.cnf :::::::::::::: [mysqld] port = 3307 socket = /tmp/B/mysql.sock basedir = /data/ushastry/server/mysql-5.6.13 datadir = /tmp/B/ tmpdir = /tmp/B/ pid-file = /tmp/B/B.pid log-bin = mysql-bin server-id = 2 log-error = /tmp/B/B.err report-host=SlaveB report-port=3307 #replicate-ignore-table=database.table log-slave-updates = 1 gtid_mode=ON enforce-gtid-consistency=ON // This is how I started master/slave server Master: bin/mysqld_safe --defaults-file=./A.cnf --user=mysql & Slave: bin/mysqld_safe --defaults-file=./B.cnf --user=mysql --relay_log_info_repository=TABLE --master_info_repository=TABLE --sync_master_info=1 --relay-log-recovery=1 &
[30 Oct 2013 8:35]
Jon Stephens
Fixed in 5.6+. Documented fix in the 5.6.15 and 5.7.3 changelogs, as follows: When stopping the I/O thread, it was possible with a very large transaction (equivalent to a binary log size greater than 100MB) that the thread did not receive the transaction to the end. When reconnecting with MASTER_AUTO_POSITION=1, it then tried to fetch changes from the next transaction, which could lead to loss of the incomplete transaction and its data. Closed.
[30 Oct 2013 9:04]
Venkatesh Duggirala
Bug #69758 marked as duplicate of this one.
[4 Dec 2013 12:10]
Laurynas Biveinis
5.6$ bzr log -r 5567 ------------------------------------------------------------ revno: 5567 committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com> branch nick: mysql-5.6 timestamp: Wed 2013-10-30 11:04:38 +0530 message: Bug#17280176 TRANSACTIONS SKIPPED ON SLAVE AFTER "STOP/START SLAVE" USING GTID REPLICATION Analysis: Slave updates 'GTID_RETRIEVED' set upon receiving GTID_LOG_EVENT for a particular transaction which is first event in the event group. Say, I/O thread is stopped *after* adding GTID number to 'gtid_trieved' set and *before* it actually retrieves all the events from that GTID event group. Next time when this I/O thread is reconnected, it sends union of GTID_RETRIEVED + GTID_EXECUTED set to master. So Master thinks that slave is having all the events from this GTID set(which includes partially retrieved GTID) and it will not resend them again. Hence slave is missing some events for ever. Fix: It is not easy to find the end of a group of events. So mysql server is unsure whether I/O thread retrieved the last gtid transaction events completely or not (before it is going down because of a crash/normal shutdown/normal stop slave io_thread). It is possible that I/O thread would have retrieved and written only partial transaction events. So Server will request Master to send the last gtid event once again. We do this by removing the last I/O thread retrieved gtid event from "Retrieved_gtid_set". Possible cases: 1) I/O thread would have retrieved full transaction already in the first time itself, but retrieving them again will not cause problem because GTID number is same, Hence SQL thread will not commit it again. 2) I/O thread would have retrieved full transaction already and SQL thread would have already executed it. In that case, We are not going remove last retrieved gtid from "Retrieved_gtid_set" otherwise we will see gaps in "Retrieved set".
[3 Feb 2014 10:12]
Laurynas Biveinis
5.6$ bzr log -r 5586 ------------------------------------------------------------ revno: 5586 committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com> branch nick: mysql-5.6 timestamp: Mon 2013-11-04 21:47:16 +0530 message: Bug#17280176 TRANSACTIONS SKIPPED ON SLAVE AFTER "STOP/START SLAVE" USING GTID REPLICATION Fixing post push solaris compiler warning "last_retrieved_gtid hides Relay_log_info::last_retrieved_gtid."