Bug #77239 SQL thread status information is lost if multi-threaded slave is enabled
Submitted: 4 Jun 2015 2:24 Modified: 4 Jun 2015 7:20
Reporter: Davi Arnaut (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.6, 5.6.26 OS:Any
Assigned to: CPU Architecture:Any
Tags: Last_SQL_Errno, Last_SQL_Error, MTS, show slave status, sql thread

[4 Jun 2015 2:24] Davi Arnaut
Description:
When multi-threaded slave is enabled, an error that causes the SQL worker threads to stop is not preserved in the slave status information (SHOW SLAVE STATUS).

How to repeat:
Setup a slave with slave_parallel_workers greater than 0.

master> CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
Query OK, 0 rows affected (0.03 sec)

master> INSERT INTO t1 VALUES (1);
Query OK, 1 row affected (0.01 sec)

slave> SET GLOBAL innodb_lock_wait_timeout = 1;
Query OK, 0 rows affected (0.00 sec)

slave> BEGIN;
Query OK, 0 rows affected (0.00 sec)

slave> SELECT * FROM t1 WHERE a = 1 FOR UPDATE;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

slave> SELECT SLEEP(1);
+----------+
| SLEEP(1) |
+----------+
|        0 |
+----------+
1 row in set (1.00 sec)

slave> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 5555
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 393
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 361
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1756
                   Last_Error: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details).
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 151
              Relay_Log_Space: 808
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1756
               Last_SQL_Error: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details).
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 10
                  Master_UUID: 157c7cfa-0a32-11e5-9002-e83935312055
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 150603 19:21:16
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 157c7cfa-0a32-11e5-9002-e83935312055:1
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

Last_SQL_Errno is 1756 (ER_MTS_INCONSISTENT_DATA), the timeout error information is not available.

Error log:

2015-06-03 19:21:16 13646 [Warning] Slave SQL: Worker 49 failed executing transaction '157c7cfa-0a32-11e5-9002-e83935312055:1' at master log mysql-bin.000001, end_log_pos 362; Could not execute Update_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log FIRST, end_log_pos 362, Error_code: 1205
2015-06-03 19:21:16 13646 [ERROR] Slave SQL: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2015-06-03 19:21:16 13646 [Note] Error reading relay log event: slave SQL thread was killed
2015-06-03 19:21:16 13646 [Warning] Slave SQL: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2015-06-03 19:21:16 13646 [Warning] Slave SQL: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
[4 Jun 2015 7:20] Umesh Shastry
Hello Davi,

Thank you for report and test case.
Observed this with latest 5.6.26 build.

Thanks,
Umesh
[4 Jun 2015 7:20] Umesh Shastry
// Slave

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: localhost
                  Master_User: repl
                  Master_Port: 15000
                Connect_Retry: 60
              Master_Log_File: master-bin.000003
          Read_Master_Log_Pos: 1839
               Relay_Log_File: slave-relay-bin.000003
                Relay_Log_Pos: 1095
        Relay_Master_Log_File: master-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1756
                   Last_Error: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details).
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1624
              Relay_Log_Space: 1647
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1756
               Last_SQL_Error: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details).
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: e447b07d-0a83-11e5-9218-0010e05f3e06
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 150604 09:16:00
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
1 row in set (0.00 sec)

##

2015-06-04 09:16:00 7429 [Warning] Slave SQL: Worker 4 failed executing transaction '' at master log master-bin.000003, end_log_pos 1808; Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'update t1 set a = 3 where a = 2', Error_code: 1205
2015-06-04 09:16:00 7429 [ERROR] Slave SQL: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2015-06-04 09:16:00 7429 [Note] Error reading relay log event: slave SQL thread was killed
2015-06-04 09:16:00 7429 [Warning] Slave SQL: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2015-06-04 09:16:00 7429 [Warning] Slave SQL: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
[1 Feb 2016 6:41] Umesh Shastry
Bug #80180 marked as duplicate of this