| Bug #69943 | Transactions skipped on slave after "stop/start slave" using GTID replication | ||
|---|---|---|---|
| Submitted: | 7 Aug 2013 0:46 | Modified: | 30 Oct 2013 8:35 |
| Reporter: | Yoshinori Matsunobu (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
| Version: | 5.6.13 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[7 Aug 2013 0:46]
Yoshinori Matsunobu
[7 Aug 2013 0:47]
Yoshinori Matsunobu
test script to repeat the bug
Attachment: tran_insert.pl (application/octet-stream, text), 943 bytes.
[7 Aug 2013 6:19]
MySQL Verification Team
Hello Yoshinori, Thank you for the bug report and the test case. Verified as described on MySQL version 5.6.13 Thanks, Umesh
[7 Aug 2013 6:20]
MySQL Verification Team
// At the end of all steps.. confirms that the entire large transaction (500000 rows) is lost
// On Master
mysql> show master status;
+------------------+-----------+--------------+------------------+---------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+---------------------------------------------+
| mysql-bin.000003 | 581988168 | | | 4470d4c7-ffe1-11e2-8214-49f3e6a792e3:1-1007 |
+------------------+-----------+--------------+------------------+---------------------------------------------+
1 row in set (0.00 sec)
mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 501000 |
+----------+
1 row in set (0.97 sec)
// On slave
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 581988168
Relay_Log_File: B-relay-bin.000003
Relay_Log_Pos: 1321401
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 581988168
Relay_Log_Space: 156422612
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 4470d4c7-ffe1-11e2-8214-49f3e6a792e3
Master_Info_File: /tmp/B/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 4470d4c7-ffe1-11e2-8214-49f3e6a792e3:1-1007
Executed_Gtid_Set: 4470d4c7-ffe1-11e2-8214-49f3e6a792e3:1:3-1007
Auto_Position: 1
1 row in set (0.00 sec)
mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.00 sec)
mysql>
[7 Aug 2013 15:37]
MySQL Verification Team
Tried the workaround relay_log_recovery=1 as suggested by venkat, issue is still reproducible:
from step 5 onwards from test case..
// On slave
mysql> change master to master_host='127.0.0.1', master_port=3306, master_user='repl',MASTER_PASSWORD='slavepass', master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.04 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> stop slave;
Query OK, 0 rows affected (0.15 sec)
mysql> start slave io_thread;
Query OK, 0 rows affected (0.00 sec)
mysql> select sleep(3);
+----------+
| sleep(3) |
+----------+
| 0 |
+----------+
1 row in set (3.00 sec)
mysql> stop slave io_thread;
Query OK, 0 rows affected (0.02 sec)
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 581988168
Relay_Log_File: B-relay-bin.000003
Relay_Log_Pos: 202408
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 580869175
Relay_Log_Space: 63704965
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 119
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 0380caa2-002f-11e3-840f-531e9aab8cbb
Master_Info_File: /tmp/B/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Reading event from the relay log
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 0380caa2-002f-11e3-840f-531e9aab8cbb:1-1007
Executed_Gtid_Set: 0380caa2-002f-11e3-840f-531e9aab8cbb:1:3-154
Auto_Position: 1
1 row in set (0.04 sec)
mysql> select count(*) from t1;
ERROR 1046 (3D000): No database selected
mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 581988168
Relay_Log_File: B-relay-bin.000004
Relay_Log_Pos: 448
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 581988168
Relay_Log_Space: 1321898
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 0380caa2-002f-11e3-840f-531e9aab8cbb
Master_Info_File: /tmp/B/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 0380caa2-002f-11e3-840f-531e9aab8cbb:1-1007
Executed_Gtid_Set: 0380caa2-002f-11e3-840f-531e9aab8cbb:1:3-1007
Auto_Position: 1
1 row in set (0.00 sec)
mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.00 sec)
mysql> show global variables like 'relay%';
+---------------------------+----------------+
| Variable_name | Value |
+---------------------------+----------------+
| relay_log | |
| relay_log_basename | |
| relay_log_index | |
| relay_log_info_file | relay-log.info |
| relay_log_info_repository | FILE |
| relay_log_purge | ON |
| relay_log_recovery | ON |
| relay_log_space_limit | 0 |
+---------------------------+----------------+
8 rows in set (0.00 sec)
mysql> \! more B.cnf
[mysqld]
port = 3307
socket = /tmp/B/mysql.sock
basedir = /data/ushastry/server/mysql-5.6.13
datadir = /tmp/B/
tmpdir = /tmp/B/
pid-file = /tmp/B/B.pid
log-bin = mysql-bin
server-id = 2
log-error = /tmp/B/B.err
report-host=SlaveB
report-port=3307
#replicate-ignore-table=database.table
log-slave-updates = 1
gtid_mode=ON
enforce-gtid-consistency=ON
relay-log-recovery=1
[7 Aug 2013 16:07]
MySQL Verification Team
// with "--relay_log_info_repository=TABLE --master_info_repository=TABLE --sync_master_info=1 --relay-log-recovery=1" on slave.
// from step 5 onwards from test case..
mysql> SHOW GLOBAL VARIABLES WHERE Variable_Name LIKE 'relay_log_info_repository' OR Variable_Name LIKE 'master_info_repository' OR Variable_Name LIKE 'sync_master_info' OR Variable_Name LIKE 'relay_log_recovery';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| master_info_repository | TABLE |
| relay_log_info_repository | TABLE |
| relay_log_recovery | ON |
| sync_master_info | 1 |
+---------------------------+-------+
4 rows in set (0.00 sec)
mysql>
mysql> change master to master_host='127.0.0.1', master_port=3306, master_user='repl',MASTER_PASSWORD='slavepass', master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.06 sec)
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
mysql> stop slave;
Query OK, 0 rows affected (0.07 sec)
mysql> start slave io_thread;
Query OK, 0 rows affected (0.00 sec)
mysql> select sleep(3);
+----------+
| sleep(3) |
+----------+
| 0 |
+----------+
1 row in set (3.00 sec)
mysql> stop slave io_thread;
Query OK, 0 rows affected (0.01 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: System lock
Master_Host: 127.0.0.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 581155742
Relay_Log_File: B-relay-bin.000004
Relay_Log_Pos: 448
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 581039455
Relay_Log_Space: 488200
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 135
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 3d2babcf-0033-11e3-842a-5d4541789890
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Reading event from the relay log
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 3d2babcf-0033-11e3-842a-5d4541789890:1-373
Executed_Gtid_Set: 3d2babcf-0033-11e3-842a-5d4541789890:1:3-283
Auto_Position: 1
1 row in set (0.01 sec)
mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 999 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 999 |
+----------+
1 row in set (0.00 sec)
[7 Aug 2013 16:07]
MySQL Verification Team
// On master mysql> select count(*) from t1; +----------+ | count(*) | +----------+ | 501000 | +----------+ 1 row in set (0.99 sec) mysql> show master status; +------------------+-----------+--------------+------------------+---------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+-----------+--------------+------------------+---------------------------------------------+ | mysql-bin.000003 | 581988168 | | | 3d2babcf-0033-11e3-842a-5d4541789890:1-1007 | +------------------+-----------+--------------+------------------+---------------------------------------------+ 1 row in set (0.00 sec)
[7 Aug 2013 16:22]
MySQL Verification Team
Conf files used for testing.. [root@cluster-repo mysql-5.6.13]# more A.cnf B.cnf :::::::::::::: A.cnf :::::::::::::: [mysqld] port = 3306 socket = /tmp/A/mysql.sock basedir = /data/ushastry/server/mysql-5.6.13 datadir = /tmp/A/ tmpdir = /tmp/A/ pid-file = /tmp/A/A.pid log-bin=mysql-bin server-id=1 log-error = /tmp/A/A.err report-host=SlaveA report-port=3306 #replicate-ignore-table=database.table log-slave-updates = 1 gtid_mode=ON enforce-gtid-consistency=ON :::::::::::::: B.cnf :::::::::::::: [mysqld] port = 3307 socket = /tmp/B/mysql.sock basedir = /data/ushastry/server/mysql-5.6.13 datadir = /tmp/B/ tmpdir = /tmp/B/ pid-file = /tmp/B/B.pid log-bin = mysql-bin server-id = 2 log-error = /tmp/B/B.err report-host=SlaveB report-port=3307 #replicate-ignore-table=database.table log-slave-updates = 1 gtid_mode=ON enforce-gtid-consistency=ON // This is how I started master/slave server Master: bin/mysqld_safe --defaults-file=./A.cnf --user=mysql & Slave: bin/mysqld_safe --defaults-file=./B.cnf --user=mysql --relay_log_info_repository=TABLE --master_info_repository=TABLE --sync_master_info=1 --relay-log-recovery=1 &
[30 Oct 2013 8:35]
Jon Stephens
Fixed in 5.6+. Documented fix in the 5.6.15 and 5.7.3 changelogs, as follows:
When stopping the I/O thread, it was possible with a very large
transaction (equivalent to a binary log size greater than 100MB)
that the thread did not receive the transaction to the end. When
reconnecting with MASTER_AUTO_POSITION=1, it then tried to fetch
changes from the next transaction, which could lead to loss of
the incomplete transaction and its data.
Closed.
[30 Oct 2013 9:04]
Venkatesh Duggirala
Bug #69758 marked as duplicate of this one.
[4 Dec 2013 12:10]
Laurynas Biveinis
5.6$ bzr log -r 5567 ------------------------------------------------------------ revno: 5567 committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com> branch nick: mysql-5.6 timestamp: Wed 2013-10-30 11:04:38 +0530 message: Bug#17280176 TRANSACTIONS SKIPPED ON SLAVE AFTER "STOP/START SLAVE" USING GTID REPLICATION Analysis: Slave updates 'GTID_RETRIEVED' set upon receiving GTID_LOG_EVENT for a particular transaction which is first event in the event group. Say, I/O thread is stopped *after* adding GTID number to 'gtid_trieved' set and *before* it actually retrieves all the events from that GTID event group. Next time when this I/O thread is reconnected, it sends union of GTID_RETRIEVED + GTID_EXECUTED set to master. So Master thinks that slave is having all the events from this GTID set(which includes partially retrieved GTID) and it will not resend them again. Hence slave is missing some events for ever. Fix: It is not easy to find the end of a group of events. So mysql server is unsure whether I/O thread retrieved the last gtid transaction events completely or not (before it is going down because of a crash/normal shutdown/normal stop slave io_thread). It is possible that I/O thread would have retrieved and written only partial transaction events. So Server will request Master to send the last gtid event once again. We do this by removing the last I/O thread retrieved gtid event from "Retrieved_gtid_set". Possible cases: 1) I/O thread would have retrieved full transaction already in the first time itself, but retrieving them again will not cause problem because GTID number is same, Hence SQL thread will not commit it again. 2) I/O thread would have retrieved full transaction already and SQL thread would have already executed it. In that case, We are not going remove last retrieved gtid from "Retrieved_gtid_set" otherwise we will see gaps in "Retrieved set".
[3 Feb 2014 10:12]
Laurynas Biveinis
5.6$ bzr log -r 5586 ------------------------------------------------------------ revno: 5586 committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com> branch nick: mysql-5.6 timestamp: Mon 2013-11-04 21:47:16 +0530 message: Bug#17280176 TRANSACTIONS SKIPPED ON SLAVE AFTER "STOP/START SLAVE" USING GTID REPLICATION Fixing post push solaris compiler warning "last_retrieved_gtid hides Relay_log_info::last_retrieved_gtid."
