MySQL Bugs: #72131: Slave running state: System lock

Bug #72131	Slave running state: System lock
Submitted:	26 Mar 2014 5:52	Modified:	24 Apr 2020 19:41
Reporter:	Shlomi Noach (OCA)	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.6.16-log	OS:	Linux (CentOS 5.9)
Assigned to:	MySQL Verification Team	CPU Architecture:	Any
Tags:	replication, slave lag, System lock

Description:
After upgrading a slave server from 5.5 to 5.6.16-log (and following the standard upgrade procedure), the slave finds it very hard to catch up.
Synopsis is increased disk utilization (from 20%-30% on 5.5 to 70%-80% on 5.6) and a consistent "System lock" in SHOW SLAVE STATUS:

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: zzzzzzzzzzz
                  Master_User: zzzzzzzzzzz
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysqld-bin.000060
          Read_Master_Log_Pos: 772040979
               Relay_Log_File: mysqld-relay-bin.000026
                Relay_Log_Pos: 771722680
        Relay_Master_Log_File: mysqld-bin.000060
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 771722520
              Relay_Log_Space: 772041353
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 16
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 10005
                  Master_UUID:
             Master_Info_File: /path/to/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 0
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0

Furthermore, on performance_schema.threads there seems to be another "system lock" thread, which says "INTERNAL DDL LOG RECOVER IN PROGRESS".

mysql> select * from performance_schema.threads where processlist_state='system lock';

+-----------+----------------------+------------+----------------+------------------+------------------+----------------+---------------------+------------------+-------------------+--------------------------------------+------------------+------+--------------+
| THREAD_ID | NAME                 | TYPE       | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB | PROCESSLIST_COMMAND | PROCESSLIST_TIME | PROCESSLIST_STATE | PROCESSLIST_INFO                     | PARENT_THREAD_ID | ROLE | INSTRUMENTED |
+-----------+----------------------+------------+----------------+------------------+------------------+----------------+---------------------+------------------+-------------------+--------------------------------------+------------------+------+--------------+
|         1 | thread/sql/main      | BACKGROUND |           NULL | NULL             | NULL             | NULL           | NULL                |            47142 | System lock       | INTERNAL DDL LOG RECOVER IN PROGRESS |             NULL | NULL | YES          |
|    412044 | thread/sql/slave_sql | BACKGROUND |           NULL | shlomi           | localhost        | NULL           | NULL                |              111 | System lock       | NULL                                 |           409251 | NULL | YES          |
+-----------+----------------------+------------+----------------+------------------+------------------+----------------+---------------------+------------------+-------------------+--------------------------------------+------------------+------+--------------+

mysql> show global variables like '%locking%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| skip_external_locking | ON    |
+-----------------------+-------+

all tables except system tables are InnoDB. InnoDB configuration attached.

How to repeat:
This problem happens on two different servers, upon upgrading from 5.5 to 5.6
Otherwise there is no change to the underlying server.

innodb configuration

Attachment: innodb-configuration.txt (text/plain), 8.45 KiB.

Last, I should add this happens on two machines with local SAS spindle disks; on another two SSD hosts the problem is not visible, which, again, correlates with the fact the disk utilization is much higher on 5.6; the SSD are apparently able to contain this.

Same happens to me.

Slave cannot catch up due to:
-- Connection Id: 0
-- User: None
-- Host: None
-- DB: None
-- Command: None
-- Time: 584
-- State: System lock
INTERNAL DDL LOG RECOVER IN PROGRESS

OS: Windows Server 2008 R2

MySQL Server 5.6.17

Log:
2014-04-07 08:58:35 3632 [Note] Plugin 'FEDERATED' is disabled.
2014-04-07 08:58:35 3632 [Note] InnoDB: Using atomics to ref count buffer pool pages
2014-04-07 08:58:35 3632 [Note] InnoDB: The InnoDB memory heap is disabled
2014-04-07 08:58:35 3632 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
2014-04-07 08:58:35 3632 [Note] InnoDB: Compressed tables use zlib 1.2.3
2014-04-07 08:58:35 3632 [Note] InnoDB: Not using CPU crc32 instructions
2014-04-07 08:58:35 3632 [Note] InnoDB: Initializing buffer pool, size = 5.0G
2014-04-07 08:58:36 3632 [Note] InnoDB: Completed initialization of buffer pool
2014-04-07 08:58:36 3632 [Note] InnoDB: Highest supported file format is Barracuda.
2014-04-07 08:58:37 3632 [Note] InnoDB: 128 rollback segment(s) are active.
2014-04-07 08:58:37 3632 [Note] InnoDB: Waiting for purge to start
2014-04-07 08:58:37 3632 [Note] InnoDB: 5.6.17 started; log sequence number 517347371961
2014-04-07 08:58:37 3632 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
2014-04-07 08:58:37 3632 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
2014-04-07 08:58:37 3632 [Note] Server socket created on IP: '0.0.0.0'.
2014-04-07 08:58:38 3632 [Note] Event Scheduler: Loaded 0 events
2014-04-07 08:58:38 3632 [Note] C:\MySQL\MySQL5617\bin\mysqld: ready for connections.
Version: '5.6.17'  socket: ''  port: 3306  MySQL Community Server (GPL)
2014-04-07 09:13:19 3632 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.006191' at position 740899989, relay log '.\xxxxxxx-relay-bin.000896' position: 514519766
2014-04-07 09:13:19 3632 [Note] Slave I/O thread: connected to master 'xxxxx@xxxxxxxx:3306',replication started in log 'mysql-bin.006193' at position 917735472

Some more data on my comment:

CPU load is close to 50% while the slave is running, transfering master binlog to relay log. IO is heavy on disk. Stopping the slave takes 5 minutes. After stopping the slave CPU is idle, no IO, but still a system log "INTERNAL DDL LOG RECOVER IN PROGRESS"
There are no other connections to the server apart from replication threads and workbench monitoring.

Update:
No error anymore on my side. It was just the slave sql thread taking a long time to catch up. The DDL message and the System Lock state did not block the execution of the slave thread.

Same happens here. We are running 5.6.20-log on CentOS 6.5 with kernel 2.6.32-431.23.3.el6.x86_64

well, the system lock part is probably bogus, i had filed an internal report long ago for another case: Bug 16189134 - "SYSTEM LOCK" THREAD STATE IS NONSENSE. 

Not sure about the lagging though.

Happens also to me with 5.6.20 and 5.7.4 - the sql thread sits most time in INTERNAL DDL LOG RECOVER IN PROGRESS and the slave gets delayed more and more (can't catch up to master).

The only workarround I have found (was suggested in similar issue forum thread) is to use MariaDB as a slave - at least 10.0.11 which is based on 5.6 has no problems.

We experience also the same problem with 5.6.15.
The delay/lag keeps growing and is now 10-11 hours...
Is there any forecast for a solution ? Any workaround ?

Best regards

We've tried to implement the Multi-Thread slave replication, separating the main DB into three DBs and setting the value slave_parallel_workers to 3.
This didn't help also, and the lag is still being growing slowly...
Any sugestions, please ?

It looks like the category for this bug is wrong as 'Server: ClusterRep' is for replication between NDB Cluster isn't it?

we noticed the same issue. our slave is 5.6.14(percona server).

but for us we noticed that server-uuid (auto.cnf) was the same on both slaves as we had rebuilt a slave from the other.

once both were different - the lag reduced.

Exactly the same problem with our master !

-- Connection Id: 0
-- User: None
-- Host: None
-- DB: None
-- Command: None
-- Time: 5657
-- State: System lock
INTERNAL DDL LOG RECOVER IN PROGRESS

The master is very slow sinds we have this line in our PROCESSLIST.

We restarted Mysql Server but this line is still there.

All is OK with mysqlcheck.

Nothing in MYSQL Start:

2015-01-20 08:31:38 31996 [Note] Plugin 'FEDERATED' is disabled.
2015-01-20 08:31:38 49ac InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
2015-01-20 08:31:38 31996 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-01-20 08:31:38 31996 [Note] InnoDB: The InnoDB memory heap is disabled
2015-01-20 08:31:38 31996 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
2015-01-20 08:31:38 31996 [Note] InnoDB: Memory barrier is not used
2015-01-20 08:31:38 31996 [Note] InnoDB: Compressed tables use zlib 1.2.3
2015-01-20 08:31:38 31996 [Note] InnoDB: Not using CPU crc32 instructions
2015-01-20 08:31:38 31996 [Note] InnoDB: Initializing buffer pool, size = 780.0M
2015-01-20 08:31:38 31996 [Note] InnoDB: Completed initialization of buffer pool
2015-01-20 08:31:38 31996 [Note] InnoDB: Highest supported file format is Barracuda.
2015-01-20 08:31:39 31996 [Note] InnoDB: 128 rollback segment(s) are active.
2015-01-20 08:31:39 31996 [Note] InnoDB: Waiting for purge to start
2015-01-20 08:31:39 31996 [Note] InnoDB: 5.6.20 started; log sequence number 65854550223
2015-01-20 08:31:40 31996 [Note] Server hostname (bind-address): '*'; port: 3406
2015-01-20 08:31:40 31996 [Note] IPv6 is available.
2015-01-20 08:31:40 31996 [Note]   - '::' resolves to '::';
2015-01-20 08:31:40 31996 [Note] Server socket created on IP: '::'.
2015-01-20 08:31:40 31996 [Note] Event Scheduler: Loaded 0 events
2015-01-20 08:31:40 31996 [Note] C:\Program Files\MySQL\MySQL Server 5.6\bin\mysqld: ready for connections.
Version: '5.6.20-log'  socket: ''  port: 3406  MySQL Community Server (GPL)

How can we resolve this matter ? 

Michael

For those reporting the main thread in DDL recovery, this is another bug, where that state is not getting updated once done. See:

http://bugs.mysql.com/bug.php?id=74517

Others that report a lot of time spent in System lock state here, you may also be affected by this bug, if you are using replication filtering rules:

http://bugs.mysql.com/bug.php?id=76049

Otherwise, keeping this bug open until we can gather some more data about where the SQL thread is spending its time. 

For others reporting issues here, if you can gather some performance schema data for affected slaves for a little while, that would be great. Something like this:

/* Enable all instrumentation, but only for the slave SQL thread */
SET @sql_thread = (SELECT thread_id FROM performance_schema.threads WHERE name = 'thread/sql/slave_sql');
UPDATE performance_schema.threads SET instrumented = 'NO' WHERE thread_id != @sql_thread;
UPDATE performance_schema.setup_instruments SET enabled = 'YES', timed = 'YES';
UPDATE performance_schema.setup_consumers SET enabled = 'YES' WHERE name LIKE '%history_long' OR name LIKE '%current';

/* Let the SQL thread run for a minute or so */

/* Get stage and statement raw details */
SELECT  event_id, event_name, timer_wait, lock_time FROM performance_schema.events_statements_history_long WHERE thread_id =  @sql_thread;

SELECT event_name, timer_wait, nesting_event_id FROM performance_schema.events_stages_history_long WHERE thread_id =  @sql_thread;

/* Get some thread wait summary data */
SELECT event_name, count_star, sum_timer_wait FROM performance_schema.events_waits_summary_by_thread_by_event_name WHERE thread_id =  @sql_thread AND sum_timer_wait > 0 ORDER BY sum_timer_wait DESC;

All data should be anonymous. Please upload outputs as a file to this bug.

HI Mark,

O have collected the required information.  partial output is as following

| stage/sql/Opening tables                   |      2756000 |          1801101 |
| stage/sql/System lock                      |    560448000 |          1801101 |
| stage/sql/Reading event from the relay log |      4631000 |          1801101 |
| stage/sql/Reading event from the relay log |      2564000 |          1801101 |
| stage/sql/Opening tables                   |      3582000 |          1801101 |
| stage/sql/System lock                      |    527945000 |          1801101 |
| stage/sql/Reading event from the relay log |      4575000 |          1801101 |
| stage/sql/Reading event from the relay log |      2666000 |          1801101 |
| stage/sql/Opening tables                   |      3678000 |          1801101 |
| stage/sql/System lock                      |    532084000 |          1801101 |
| stage/sql/Reading event from the relay log |      4393000 |          1801101 |
| stage/sql/Reading event from the relay log |      2739000 |          1801101 |
| stage/sql/Opening tables                   |      3586000 |          1801101 |
+--------------------------------------------+--------------+------------------+
10000 rows in set (0.03 sec)

mysql> SELECT  event_id, event_name, timer_wait, lock_time FROM performance_schema.events_statements_history_long WHERE thread_id =  @sql_thread;
+----------+---------------------+-------------+-----------+
| event_id | event_name          | timer_wait  | lock_time |
+----------+---------------------+-------------+-----------+
|  2369848 | statement/sql/begin | 13841030000 |         0 |
|  2509715 | statement/sql/begin |    37236000 |         0 |
|  2648539 | statement/sql/begin |    35485000 |         0 |
+----------+---------------------+-------------+-----------+
3 rows in set (0.66 sec)

mysql> SELECT event_name, count_star, sum_timer_wait FROM performance_schema.events_waits_summary_by_thread_by_event_name WHERE thread_id =  @sql_thread AND sum_timer_wait > 0
ORDER BY sum_timer_wait DESC;
+--------------------------------------+------------+--------------------+
| event_name                           | count_star | sum_timer_wait     |
+--------------------------------------+------------+--------------------+
| wait/io/table/sql/handler            |  244200923 | 698374214490644692 |
| wait/io/file/innodb/innodb_data_file |   96223806 | 677203805528335034 |
| wait/io/file/sql/binlog              |    4900931 |    190905810578112 |
| wait/lock/table/sql/handler          |  436707764 |    178480329175026 |
| wait/io/file/innodb/innodb_log_file  |    1805808 |    138900438268436 |
| wait/io/file/sql/binlog_index        |       3749 |      1044853993580 |
| wait/io/file/sql/FRM                 |       3018 |       873853852266 |
| wait/io/file/myisam/dfile            |         13 |        42253565120 |
| wait/io/file/sql/dbopt               |         46 |        35141396508 |
| wait/io/file/sql/relaylog            |        219 |         1559657756 |
| wait/io/file/sql/misc                |          3 |          725895560 |
| wait/io/file/myisam/kfile            |          4 |           69398520 |
+--------------------------------------+------------+--------------------+

At a customer running MySQL 5.6.26, I have just encountered the very same symptom.
I have executed the commands listed by Mark, collected the results in a file, and uploaded it.
The file name is "mysql-bug-72131-fromdual.out"

Experiencing the same issue on 5.6.26-log MySQL Community Server (GPL) - CentOS release 6.7 (Final):

mysql> select * from performance_schema.threads where processlist_state='System lock';
+-----------+----------------------+------------+----------------+------------------+------------------+----------------+---------------------+------------------+-------------------+------------------+------------------+------+--------------+
| THREAD_ID | NAME                 | TYPE       | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB | PROCESSLIST_COMMAND | PROCESSLIST_TIME | PROCESSLIST_STATE | PROCESSLIST_INFO | PARENT_THREAD_ID | ROLE | INSTRUMENTED |
+-----------+----------------------+------------+----------------+------------------+------------------+----------------+---------------------+------------------+-------------------+------------------+------------------+------+--------------+
|        49 | thread/sql/slave_sql | BACKGROUND |           NULL | root             | localhost        | NULL           | NULL                |           344745 | System lock       | NULL             |               47 | NULL | YES          |
+-----------+----------------------+------------+----------------+------------------+------------------+----------------+---------------------+------------------+-------------------+------------------+------------------+------+--------------+
1 row in set (0.00 sec)

mysql> SELECT  event_id, event_name, timer_wait, lock_time FROM performance_schema.events_statements_history_long WHERE thread_id =  @sql_thread;
+----------+---------------------+------------+-----------+
| event_id | event_name          | timer_wait | lock_time |
+----------+---------------------+------------+-----------+
| 13718647 | statement/sql/begin |   52288000 |         0 |
+----------+---------------------+------------+-----------+
1 row in set (0.00 sec)

mysql> SELECT event_name, timer_wait, nesting_event_id FROM performance_schema.events_stages_history_long WHERE thread_id =  @sql_thread;
+--------------------------------------------+----------------+------------------+
| event_name                                 | timer_wait     | nesting_event_id |
+--------------------------------------------+----------------+------------------+
| stage/sql/Reading event from the relay log |    47165745000 |         11605404 |
| stage/sql/Reading event from the relay log |       71528000 |         13718647 |
| stage/sql/init                             |       37965000 |         13718647 |
| stage/sql/query end                        |        2941000 |         13718647 |
| stage/sql/closing tables                   |        1139000 |         13718647 |
| stage/sql/freeing items                    |        8966000 |         13718647 |
| stage/sql/Reading event from the relay log |       33755000 |         13718647 |
| stage/sql/Reading event from the relay log |       57454000 |         13718647 |
| stage/sql/Opening tables                   |       37059000 |         13718647 |
| stage/sql/System lock                      | 69171334087000 |         13718647 |
+--------------------------------------------+----------------+------------------+
10 rows in set (0.00 sec)

mysql> SELECT event_name, count_star, sum_timer_wait FROM performance_schema.events_waits_summary_by_thread_by_event_name WHERE thread_id =  @sql_thread AND sum_timer_wait > 0 ORDER BY sum_timer_wait DESC;
+--------------------------------------+--------------+---------------------+
| event_name                           | count_star   | sum_timer_wait      |
+--------------------------------------+--------------+---------------------+
| wait/io/table/sql/handler            | 144402516504 | 1250404072951669598 |
| wait/io/file/innodb/innodb_log_file  |      3604843 |   15937497211137986 |
| wait/io/file/myisam/dfile            |  15002210054 |    8803393128725168 |
| wait/io/file/innodb/innodb_data_file |       255599 |     214974113336394 |
| wait/io/file/sql/relaylog            |      2065921 |      25718241091952 |
| wait/lock/table/sql/handler          |     16147358 |      11866836325958 |
| wait/io/file/sql/binlog_index        |        13791 |      11127636839838 |
| wait/io/file/myisam/kfile            |      3156873 |      10025299622808 |
| wait/io/file/sql/binlog              |         2654 |       5698083408876 |
| wait/io/file/sql/FRM                 |        68006 |       2327044962592 |
| wait/io/file/innodb/innodb_temp_file |         1842 |       1571795898132 |
| wait/io/file/archive/data            |          172 |           825258106 |
| wait/io/file/sql/dbopt               |           25 |           743358366 |
| wait/io/file/sql/misc                |            3 |           230464136 |
+--------------------------------------+--------------+---------------------+
14 rows in set (0.01 sec)

performance schema data for affected slave

Attachment: benge.txt (text/plain), 989.87 KiB.

performance schema data for affected slaveperformance schema data for affected slave

Attachment: sparkling-bonus.txt (text/plain), 1.36 MiB.

performance schema data for affected slave

Attachment: dark-haze.txt (text/plain), 1.33 MiB.

performance schema data for affected slave

Attachment: old-atom.txt (text/plain), 1.32 MiB.

We are seeing a similar issue. I uploaded the requested performance schema data for some affected slaves.

dark-haze and benge have the same master. The other slaves have different masters. dark-haze is running 5.6.23 and the other slaves are running 5.6.25.

Linux benge 3.18.10-57.el6.x86_64 #1 SMP Mon Apr 6 15:58:44 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 6.6 (Final)

Linux dark-haze 3.18.19-61.el6.x86_64 #1 SMP Mon Jul 27 11:04:37 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 6.6 (Final)

Linux old-atom 3.18.18-60.el6.x86_64 #1 SMP Mon Jul 13 12:27:34 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 6.6 (Final)

Linux sparkling-bonus 3.18.19-61.el6.x86_64 #1 SMP Mon Jul 27 11:04:37 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 6.6 (Final)

could this system lock caused by auto-inc like in http://bugs.mysql.com/bug.php?id=79324

In my case the such system lock has been caused ROW-based replication with huge tables without primary key (but with indexes present). Every replicated UPDATE to such tables takes a long time on the slave with system lock.

I have a cluster running version 5.7.18 with multiple slaves that can't keep up. Newly created slaves just continue to fall further behind.

I created various slaves using different machine configurations and the servers are mostly idle.

Here is a sample excerpt from show slave status

               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 369898343
              Relay_Log_Space: 31147353955
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 9946
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 0
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 

           Id: 488
         User: system user
         Host: 
           db: NULL
      Command: Connect
         Time: 10148
        State: System lock
         Info: NULL
    Rows_sent: 0
Rows_examined: 0

I have a cluster running version 5.7.18 with multiple slaves that can't keep up. Newly created slaves just continue to fall further behind.

I created various slaves using different machine configurations and the servers are mostly idle.

Here is a sample excerpt from show slave status

               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 369898343
              Relay_Log_Space: 31147353955
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 9946
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 0
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 

           Id: 488
         User: system user
         Host: 
           db: NULL
      Command: Connect
         Time: 10148
        State: System lock
         Info: NULL
    Rows_sent: 0
Rows_examined: 0

Lag on one Read Replica

Attachment: Screen Shot 2017-10-02 at 7.53.48 PM.png (image/png, text), 168.60 KiB.

We observed same problem on AWS RDS ( MYSQL 5.7.16 ) one of our slave ( read replica ) show System Lock on SQL Thread and Second Behind Master start increasing , where all other salves(read replica ) remain sync with Master, where all has same machine type (class) , configuration and storage .

But when we set sync_binlog=0, the lag drop suddenly and within few minutes it become zero.

Hey Khurram Naseem,

As you're running 5.7 on RDS, were you able to check on CloudWatch or anywhere else if you have I/O pressure or something else on the hardware side?

Hey Wagner,

Thank you.

Yes we are running RDS with MySQL 5.7(5.7.16) and We Checked all RDS metrics and didn't find load any where I means IOPS and Latency seems OK .

Do you know any specific Cloudwatch Metrics which I should check ?

Thank you

You can monitor the I/O for read, write, and/or metadata operations, client connections, and burst disk/cpu bound related would reveal an issue on the underlying hardware.

If I run transaction and block applier threads on the slave, system lock is occurred every time.  I think it is normal.

However, even I don't run transaction on the slave, system lock is occurred on the slave under heavy load on the master.  Transaction on the master is very small.

Please check below.  I hope it can solve under replication environment.
It will be cause of large time lags, Seconds_Behind_Master.

[ON MASTER]

root@localhost [REPLI]> show create table T_SLAP01\G
*************************** 1. row ***************************
       Table: T_SLAP01
Create Table: CREATE TABLE `T_SLAP01` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `n_time` varchar(30) DEFAULT NULL,
  `s_time` varchar(30) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

root@localhost [REPLI]> 

[root@misc01 admin]# cat mysqlslap_exsisting_table.sh 
/usr/local/mysql/bin/mysqlslap -u root -p -S /tmp/mysql.sock --create-schema=REPLI --no-drop -c 10 -i 1000 -q 'INSERT INTO REPLI.T_SLAP01(n_time,s_time) values(now(3),sysdate(6))' 
[root@misc01 admin]# 

[root@misc01 admin]# ./mysqlslap_exsisting_table.sh 
Enter password: 
Benchmark
        Average number of seconds to run all queries: 0.046 seconds
        Minimum number of seconds to run all queries: 0.018 seconds
        Maximum number of seconds to run all queries: 0.135 seconds
        Number of clients running queries: 10
        Average number of queries per client: 1

[root@misc01 admin]# 

[ON SLAVE]

root@localhost [REPLI]> show processlist; show engine innodb status\G show slave status\G
+-----+-----------------+--------------+-------+---------+------+----------------------------------+------------------+
| Id  | User            | Host         | db    | Command | Time | State                            | Info             |
+-----+-----------------+--------------+-------+---------+------+----------------------------------+------------------+
|   1 | system user     |              | NULL  | Connect |  888 | Waiting for master to send event | NULL             |
|   2 | system user     |              | NULL  | Connect |   -1 | System lock                      | NULL             |
|   3 | event_scheduler | localhost    | NULL  | Daemon  |  887 | Waiting on empty queue           | NULL             |
|  13 | root            | localhost    | REPLI | Query   |    0 | starting                         | show processlist |
|  29 | admin           | Labs01:44555 | mysql | Sleep   |    3 |                                  | NULL             |
| 763 | admin           | Labs01:47755 | mysql | Sleep   |   55 |                                  | NULL             |
+-----+-----------------+--------------+-------+---------+------+----------------------------------+------------------+
6 rows in set (0.00 sec)

*************************** 1. row ***************************
  Type: InnoDB
  Name: 
Status: 
=====================================
2018-03-23 15:32:58 0x7fcdb65f0700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 10 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 162 srv_active, 0 srv_shutdown, 728 srv_idle
srv_master_thread log flush and writes: 890
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 111
OS WAIT ARRAY INFO: signal count 107
RW-shared spins 0, rounds 65, OS waits 31
RW-excl spins 0, rounds 1110, OS waits 23
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 65.00 RW-shared, 1110.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 756832
Purge done for trx's n:o < 756831 undo n:o < 0 state: running but idle
History list length 11
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421997346109728, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421997346107904, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421997346106992, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421997346106080, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 756831, ACTIVE (PREPARED) 0 sec
3 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 2
MySQL thread id 2, OS thread handle 140521468139264, query id 69415 System lock

<SNIP>

--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=8340, Main thread ID=140522077779712, state: sleeping
Number of rows inserted 31953, updated 30756, deleted 0, read 32044
72.99 inserts/s, 72.99 updates/s, 0.00 deletes/s, 72.99 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

1 row in set (0.01 sec)

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.56.113
                  Master_User: repl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.002848
          Read_Master_Log_Pos: 59249068
               Relay_Log_File: misc02-relay-bin.000766
                Relay_Log_Pos: 12937379
        Relay_Master_Log_File: mysql-bin.002848
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: REPLI
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 59248226
              Relay_Log_Space: 12938485
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: 2dde009f-d4dc-11e4-b437-0800279cea3c
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 2dde009f-d4dc-11e4-b437-0800279cea3c:216437-234189:266583-810356
            Executed_Gtid_Set: 2d0d3b32-4269-11e6-aca7-0800275fa837:1-40,
2dde009f-d4dc-11e4-b437-0800279cea3c:1-810354,
78c3d305-5716-11e7-98fa-0800275fa837:1
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

root@localhost [REPLI]> 

Is it possible to solve?

I confirm "System lock" Status with following query.
It seems "System lock" means preparing transaction on the slave in my case.

[Query]
select THREAD_ID,SOURCE,EVENT_NAME,TIMER_WAIT,LOCK_TIME,CURRENT_SCHEMA,
SQL_TEXT,(timer_wait)/1000000000.0 "t (ms)",ROWS_EXAMINED 
from performance_schema.events_statements_history order by TIMER_START desc;

[Output]
'76', 'log_event.cc:4698', 'statement/sql/begin', '67906000', '0', 'REPLI', 'BEGIN', '0.0679', '0'
'76', 'log_event.cc:4698', 'statement/sql/begin', '43836000', '0', 'REPLI', 'BEGIN', '0.0438', '0'
'76', 'log_event.cc:4698', 'statement/sql/begin', '40090000', '0', 'REPLI', 'BEGIN', '0.0401', '0'
'76', 'log_event.cc:4698', 'statement/sql/begin', '38171000', '0', 'REPLI', 'BEGIN', '0.0382', '0'
'76', 'log_event.cc:4698', 'statement/sql/begin', '38896000', '0', 'REPLI', 'BEGIN', '0.0389', '0'
'76', 'log_event.cc:4698', 'statement/sql/begin', '37802000', '0', 'REPLI', 'BEGIN', '0.0378', '0'
<SNIP>

[Source]
https://github.com/mysql/mysql-server/blob/5.7/sql/log_event.cc#L4694

Regards
Shinya

View for confirm system lock

Attachment: sys.system_lock_check.sql (application/octet-stream, text), 868 bytes.

Hi Shlomi,

Revisiting this after some years with 5.7.28 and I can't get this to happen again. Redid all you reiterated through the issue from the start and it looks like with 5.7.28 this issue does not exist any more.

I hope you too are not experiencing this any more.

Kind regards
Bogdan