Bug #84415 slave don't report Seconds_Behind_Master when running slave_parallel_workers > 0
Submitted: 4 Jan 2017 21:58 Modified: 8 Feb 2018 13:51
Reporter: Marcelo Altmann (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.15, 5.7.17 OS:Linux
Assigned to: CPU Architecture:Any
Tags: replication

[4 Jan 2017 21:58] Marcelo Altmann
Description:
When running mysql with slave_parallel_workers > 0 , Seconds_Behind_Master from SHOW SLAVE STATUS doesn't report replication lag.

From SHOW PROCESSLIST, Id 3 is executing an event from 76 seconds ago

slave1 [localhost] {msandbox} ((none)) > SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+---------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time | State                                       | Info             |
+----+-------------+-----------+------+---------+------+---------------------------------------------+------------------+
|  1 | system user |           | NULL | Connect |  274 | Waiting for master to send event            | NULL             |
|  2 | system user |           | NULL | Connect |   11 | Waiting for dependent transaction to commit | NULL             |
|  4 | system user |           | NULL | Connect |   76 | Executing event                             | NULL             |
|  5 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
|  6 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
|  7 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
|  8 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
|  9 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 10 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 11 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 12 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 13 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 14 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 15 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 16 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 17 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 18 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 19 | system user |           | NULL | Connect |  274 | Waiting for an event from Coordinator       | NULL             |
| 20 | msandbox    | localhost | NULL | Query   |    0 | starting                                    | SHOW PROCESSLIST |
+----+-------------+-----------+------+---------+------+---------------------------------------------+------------------+
19 rows in set (0.00 sec)

But SHOW SLAVE STATUS doesn't report any lag:
slave1 [localhost] {msandbox} ((none)) > SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 28890
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 569563421
               Relay_Log_File: mysql-relay.000004
                Relay_Log_Pos: 284857134
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 284861060
              Relay_Log_Space: 569564180
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: 00028890-1111-1111-1111-111111111111
             Master_Info_File: /home/marcelo.altmann/sandboxes/rsandbox_5_7_15/node1/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Waiting for dependent transaction to commit
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

Same happens with slave_parallel_type=DATABASE .

How to repeat:
1) Start slave with slave_parallel_workers > 0

2) Create a table with some rows on master:
sysbench --test=/usr/share/sysbench/tests/db/oltp.lua         --mysql-host=localhost --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox28890.sock --mysql-db=test --oltp-table-size=1500000 prepare

3) Delete a big amount of rows in your master:
DELETE FROM sbtest1 WHERE id > 100;

4) Compare output of SHOW PROCESSLIST and SHOW SLAVE STATUS\G

Suggested fix:
Fix the reported seconds on SHOW SLAVE STATUS or document why it will report 0.
[17 Jan 2017 11:42] MySQL Verification Team
Hello Marcelo,

Thank you for the report.

Thanks,
Umesh
[17 Jan 2017 11:43] MySQL Verification Team
-- 5.7.17

root@localhost [db1]> select count(*) from db1.sbtest1;
+----------+
| count(*) |
+----------+
|  1500000 |
+----------+
1 row in set (0.46 sec)

root@localhost [db1]> DELETE FROM sbtest1 WHERE id > 100;
Query OK, 1499900 rows affected (26.58 sec)

-- slave

root@localhost [(none)]> SHOW PROCESSLIST;SHOW SLAVE STATUS\G
+----+-------------+-----------+------+---------+------+--------------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time | State                                            | Info             |
+----+-------------+-----------+------+---------+------+--------------------------------------------------+------------------+
|  3 | root        | localhost | NULL | Query   |    0 | starting                                         | SHOW PROCESSLIST |
|  6 | system user |           | NULL | Connect | 3597 | Waiting for master to send event                 | NULL             |
|  7 | system user |           | NULL | Connect |    0 | Waiting for Slave Workers to free pending events | NULL             |
|  8 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
|  9 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 10 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 11 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 12 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 13 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 14 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 15 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 16 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 17 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 18 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 19 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 20 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 21 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 22 | system user |           | NULL | Connect |   58 | Executing event                                  | NULL             |
+----+-------------+-----------+------+---------+------+--------------------------------------------------+------------------+
18 rows in set (0.00 sec)

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: localhost
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 569579212
               Relay_Log_File: hod03-relay-bin.000003
                Relay_Log_Pos: 284875964
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 284877019
              Relay_Log_Space: 569578687
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
[17 Jan 2017 11:45] MySQL Verification Team
-- earlier message truncated somehow
-- Setup replication

rm -rf master
bin/mysqld --initialize-insecure --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/master -v
bin/mysqld --no-defaults --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/master --core-file --socket=/tmp/mysql_master.sock --port=3306 --log-error=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/master/log.err --log-bin=master-bin --server_id=1 2>&1 &

rm -rf slave
bin/mysqld --initialize-insecure --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/slave -v
bin/mysqld --no-defaults --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/slave --core-file --socket=/tmp/mysql_slave.sock --port=3307 --log-error=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/slave/log.err --log-bin=slave-bin --server_id=2 --skip_slave_start 2>&1 &

-- sysbench

[umshastr@hod03]~/bugs/sysbench: sysbench/sysbench --test=/home/umshastr/bugs/sysbench/sysbench/tests/db/oltp.lua --report-interval=1 --oltp-table-size=1500000 --db-driver=mysql --mysql-user=root --mysql-db=db1 --mysql-socket=/tmp/mysql_master.sock prepare
sysbench 0.5:  multi-threaded system evaluation benchmark

Creating table 'sbtest1'...
Inserting 1500000 records into 'sbtest1'
[umshastr@hod03]~/bugs/sysbench:

--master
[25 Jan 2017 22:09] Kenny Gryp
I think this bug should be more a S2 severity. Most monitoring systems rely heavily on seconds_behind_master, this effectively makes monitoring replication lag with show slave status completely useless if you are using parallel replication at all.
[30 Mar 2017 21:39] OCA Admin
Contribution submitted via Github - Bug #84415 slave don't report Seconds_Behind_Master when 
(*) Contribution by Robert Golebiowski (Github robgolebiowski, mysql-server/pull/133#issuecomment-290439173): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_113204019.txt (text/plain), 24.13 KiB.

[30 Mar 2017 21:39] OCA Admin
Contribution submitted via Github - Bug #84415 slave don't report Seconds_Behind_Master when 
(*) Contribution by Robert Golebiowski (Github robgolebiowski, mysql-server/pull/134#issuecomment-290439404): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_113204278.txt (text/plain), 41.83 KiB.

[14 Apr 2017 9:57] Laurynas Biveinis
Bug 84415 fix for 8.0.1

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug84415-8.0.1.patch (application/octet-stream, text), 40.76 KiB.

[15 May 2017 14:49] Robert Golebiowski
The tests that I have submitted sometimes fail. Please find attached a patch which solves the problem.

Attachment: bug84415_test_fix.diff (text/x-patch), 5.62 KiB.

[15 May 2017 14:49] Robert Golebiowski
(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
[4 Aug 2017 19:52] Laurynas Biveinis
Bug 84415 fix for 8.0.2

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug84415-8.0.2.patch (application/octet-stream, text), 39.33 KiB.

[2 Feb 2018 5:23] Laurynas Biveinis
The latest contributed fix applies cleanly and works on 8.0.4 too.
[8 Feb 2018 13:51] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL 8.0.5 / 5.7.22:

On a multi-threaded replication slave (with slave_parallel_workers greater than 0), the slave's lag behind the master was not being reported by the Seconds_Behind_Master field for SHOW SLAVE STATUS. The value is now reported correctly. Thanks to Robert Golebiowski for the patch.
[14 Nov 2018 7:42] Arie Kachler
We are seeing similar behavior in 5.6.34 and 5.6.41 on masters!
Masters show sporadic slave lag when MTR is enabled.
We run Master-Master pairs, though only master is writeable.
Has this fix been back-ported to 5.6?