Bug #78991 WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS hangs after RESET MASTER
Submitted: 28 Oct 2015 2:03 Modified: 6 Nov 2015 20:47
Reporter: monty solomon Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.6.25 OS:CentOS
Assigned to: CPU Architecture:Any

[28 Oct 2015 2:03] monty solomon
Description:
WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS hangs when used after RESET MASTER

# Slave lucky-hola:3306:
# QUERY = SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('17f11668-4ba8-11e5-bade-0ec3c09abde1:1-17504', 300)
# Return Code = -1
Slave lucky-hola:3306 did not catch up to the master.
ERROR: Slave lucky-hola:3306 did not catch up to the master.

How to repeat:
mysql> show master status\G
*************************** 1. row ***************************
             File: bin.000054
         Position: 59383
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:1-17420,
cfca8b9d-4ba3-11e5-bac1-0a3a87866f75:1-11
1 row in set (0.00 sec)

mysql> reset master;
Query OK, 0 rows affected (0.02 sec)

mysql> show master status\G
*************************** 1. row ***************************
             File: bin.000001
         Position: 1465
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:17468-17470
1 row in set (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: white-fog
                  Master_User: SUSR_Repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin.000052
          Read_Master_Log_Pos: 51357
               Relay_Log_File: relay.000034
                Relay_Log_Pos: 51555
        Relay_Master_Log_File: bin.000052
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 51357
              Relay_Log_Space: 51830
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1762385168
                  Master_UUID: 17f11668-4ba8-11e5-bade-0ec3c09abde1
             Master_Info_File: /opt/mysql/dbdata1/monty-q2/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:98-17504
            Executed_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:17468-17504
                Auto_Position: 1
1 row in set (0.00 sec)

mysql> SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('17f11668-4ba8-11e5-bade-0ec3c09abde1:1-17504', 5);
+--------------------------------------------------------------------------------------+
| WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('17f11668-4ba8-11e5-bade-0ec3c09abde1:1-17504', 5) |
+--------------------------------------------------------------------------------------+
|                                                                                   -1 |
+--------------------------------------------------------------------------------------+
1 row in set (5.00 sec)

mysql> select GTID_SUBTRACT(@@global.gtid_executed, @@global.gtid_purged);;
+-------------------------------------------------------------+
| GTID_SUBTRACT(@@global.gtid_executed, @@global.gtid_purged) |
+-------------------------------------------------------------+
| 17f11668-4ba8-11e5-bade-0ec3c09abde1:17468-17504            |
+-------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select GTID_SUBSET('17f11668-4ba8-11e5-bade-0ec3c09abde1:1-17504', @@global.gtid_executed);
+-------------------------------------------------------------------------------------+
| GTID_SUBSET('17f11668-4ba8-11e5-bade-0ec3c09abde1:1-17504', @@global.gtid_executed) |
+-------------------------------------------------------------------------------------+
|                                                                                   0 |
+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
[29 Oct 2015 15:22] Sinisa Milivojevic
Hi,

Thank you for your report. Can you tell us how long does a slave hang up after these commands:

Slave lucky-hola:3306 did not catch up to the master.
ERROR: Slave lucky-hola:3306 did not catch up to the master.

Also, can you send us the commands that SQL thread was executing after the problematic SELECT was sent. Please, also upload your my.cnf for the slave server.
[29 Oct 2015 23:17] monty solomon
The WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS command hangs for the full duration of the specified timeout waiting for the slave to execute the specified GTID set.

# Slave lucky-hola:3306:
# QUERY = SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('17f11668-4ba8-11e5-bade-0ec3c09abde1:1-179697', 300)

mysql> show master status\G
*************************** 1. row ***************************
             File: bin.000144
         Position: 174221
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:1-179697
1 row in set (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: white-fog
                  Master_User: SUSR_Repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin.000144
          Read_Master_Log_Pos: 174221
               Relay_Log_File: relay.000310
                Relay_Log_Pos: 103406
        Relay_Master_Log_File: bin.000144
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 174221
              Relay_Log_Space: 174769
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1762385168
                  Master_UUID: 17f11668-4ba8-11e5-bade-0ec3c09abde1
             Master_Info_File: /opt/mysql/dbdata1/monty-q2/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:98-179697
            Executed_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:17468-179697
                Auto_Position: 1
1 row in set (0.00 sec)
[30 Oct 2015 16:36] Sinisa Milivojevic
This is expected behavior, unless waiting gets much, much longer then the value specified for the  "timeout" parameter.

How much time it takes to executed GTIDs if a timeout is set to 0 ????

Some commands may take a very , very long time, particularly if  wait for mutex locks occurs.
[6 Nov 2015 20:39] monty solomon
The slave was not executing any statements.
[6 Nov 2015 20:47] monty solomon
The slave was caught up to the master but it appears that the WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS() was waiting for the slave to execute previous GTIDs.

SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('17f11668-4ba8-11e5-bade-0ec3c09abde1:1-17504', 5);

Excerpt from SHOW SLAVE STATUS

           Retrieved_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:98-17504
            Executed_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:17468-17504

I just noticed that the RESET MASTER on the slave changed the Executed_Gtid_Set from

Executed_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:1-17420,
cfca8b9d-4ba3-11e5-bac1-0a3a87866f75:1-11

to

Executed_Gtid_Set: 17f11668-4ba8-11e5-bade-0ec3c09abde1:17468-17470