| Bug #84415 | slave don't report Seconds_Behind_Master when running slave_parallel_workers > 0 | ||
|---|---|---|---|
| Submitted: | 4 Jan 2017 21:58 | Modified: | 8 Feb 2018 13:51 | 
| Reporter: | Marcelo Altmann (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) | 
| Version: | 5.7.15, 5.7.17 | OS: | Linux | 
| Assigned to: | CPU Architecture: | Any | |
| Tags: | replication | ||
   [17 Jan 2017 11:42]
   MySQL Verification Team        
  Hello Marcelo, Thank you for the report. Thanks, Umesh
   [17 Jan 2017 11:43]
   MySQL Verification Team        
  -- 5.7.17
root@localhost [db1]> select count(*) from db1.sbtest1;
+----------+
| count(*) |
+----------+
|  1500000 |
+----------+
1 row in set (0.46 sec)
root@localhost [db1]> DELETE FROM sbtest1 WHERE id > 100;
Query OK, 1499900 rows affected (26.58 sec)
-- slave
root@localhost [(none)]> SHOW PROCESSLIST;SHOW SLAVE STATUS\G
+----+-------------+-----------+------+---------+------+--------------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time | State                                            | Info             |
+----+-------------+-----------+------+---------+------+--------------------------------------------------+------------------+
|  3 | root        | localhost | NULL | Query   |    0 | starting                                         | SHOW PROCESSLIST |
|  6 | system user |           | NULL | Connect | 3597 | Waiting for master to send event                 | NULL             |
|  7 | system user |           | NULL | Connect |    0 | Waiting for Slave Workers to free pending events | NULL             |
|  8 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
|  9 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 10 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 11 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 12 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 13 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 14 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 15 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 16 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 17 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 18 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 19 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 20 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 21 | system user |           | NULL | Connect | 3597 | Waiting for an event from Coordinator            | NULL             |
| 22 | system user |           | NULL | Connect |   58 | Executing event                                  | NULL             |
+----+-------------+-----------+------+---------+------+--------------------------------------------------+------------------+
18 rows in set (0.00 sec)
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: localhost
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 569579212
               Relay_Log_File: hod03-relay-bin.000003
                Relay_Log_Pos: 284875964
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 284877019
              Relay_Log_Space: 569578687
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
 
   [17 Jan 2017 11:45]
   MySQL Verification Team        
  -- earlier message truncated somehow -- Setup replication rm -rf master bin/mysqld --initialize-insecure --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/master -v bin/mysqld --no-defaults --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/master --core-file --socket=/tmp/mysql_master.sock --port=3306 --log-error=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/master/log.err --log-bin=master-bin --server_id=1 2>&1 & rm -rf slave bin/mysqld --initialize-insecure --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/slave -v bin/mysqld --no-defaults --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/slave --core-file --socket=/tmp/mysql_slave.sock --port=3307 --log-error=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/slave/log.err --log-bin=slave-bin --server_id=2 --skip_slave_start 2>&1 & -- sysbench [umshastr@hod03]~/bugs/sysbench: sysbench/sysbench --test=/home/umshastr/bugs/sysbench/sysbench/tests/db/oltp.lua --report-interval=1 --oltp-table-size=1500000 --db-driver=mysql --mysql-user=root --mysql-db=db1 --mysql-socket=/tmp/mysql_master.sock prepare sysbench 0.5: multi-threaded system evaluation benchmark Creating table 'sbtest1'... Inserting 1500000 records into 'sbtest1' [umshastr@hod03]~/bugs/sysbench: --master
   [25 Jan 2017 22:09]
   Kenny Gryp        
  I think this bug should be more a S2 severity. Most monitoring systems rely heavily on seconds_behind_master, this effectively makes monitoring replication lag with show slave status completely useless if you are using parallel replication at all.
   [30 Mar 2017 21:39]
   OCA Admin        
  Contribution submitted via Github - Bug #84415 slave don't report Seconds_Behind_Master when (*) Contribution by Robert Golebiowski (Github robgolebiowski, mysql-server/pull/133#issuecomment-290439173): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: git_patch_113204019.txt (text/plain), 24.13 KiB.
   [30 Mar 2017 21:39]
   OCA Admin        
  Contribution submitted via Github - Bug #84415 slave don't report Seconds_Behind_Master when (*) Contribution by Robert Golebiowski (Github robgolebiowski, mysql-server/pull/134#issuecomment-290439404): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: git_patch_113204278.txt (text/plain), 41.83 KiB.
   [14 Apr 2017 9:57]
   Laurynas Biveinis        
  Bug 84415 fix for 8.0.1 (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug84415-8.0.1.patch (application/octet-stream, text), 40.76 KiB.
   [15 May 2017 14:49]
   Robert Golebiowski        
  The tests that I have submitted sometimes fail. Please find attached a patch which solves the problem.
Attachment: bug84415_test_fix.diff (text/x-patch), 5.62 KiB.
   [15 May 2017 14:49]
   Robert Golebiowski        
  (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
   [4 Aug 2017 19:52]
   Laurynas Biveinis        
  Bug 84415 fix for 8.0.2 (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug84415-8.0.2.patch (application/octet-stream, text), 39.33 KiB.
   [2 Feb 2018 5:23]
   Laurynas Biveinis        
  The latest contributed fix applies cleanly and works on 8.0.4 too.
   [8 Feb 2018 13:51]
   Margaret Fisher        
  Posted by developer: Changelog entry added for MySQL 8.0.5 / 5.7.22: On a multi-threaded replication slave (with slave_parallel_workers greater than 0), the slave's lag behind the master was not being reported by the Seconds_Behind_Master field for SHOW SLAVE STATUS. The value is now reported correctly. Thanks to Robert Golebiowski for the patch.
   [14 Nov 2018 7:42]
   Arie Kachler        
  We are seeing similar behavior in 5.6.34 and 5.6.41 on masters! Masters show sporadic slave lag when MTR is enabled. We run Master-Master pairs, though only master is writeable. Has this fix been back-ported to 5.6?


Description: When running mysql with slave_parallel_workers > 0 , Seconds_Behind_Master from SHOW SLAVE STATUS doesn't report replication lag. From SHOW PROCESSLIST, Id 3 is executing an event from 76 seconds ago slave1 [localhost] {msandbox} ((none)) > SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+---------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-------------+-----------+------+---------+------+---------------------------------------------+------------------+ | 1 | system user | | NULL | Connect | 274 | Waiting for master to send event | NULL | | 2 | system user | | NULL | Connect | 11 | Waiting for dependent transaction to commit | NULL | | 4 | system user | | NULL | Connect | 76 | Executing event | NULL | | 5 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 6 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 7 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 8 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 9 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 10 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 11 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 12 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 13 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 14 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 15 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 16 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 17 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 18 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 19 | system user | | NULL | Connect | 274 | Waiting for an event from Coordinator | NULL | | 20 | msandbox | localhost | NULL | Query | 0 | starting | SHOW PROCESSLIST | +----+-------------+-----------+------+---------+------+---------------------------------------------+------------------+ 19 rows in set (0.00 sec) But SHOW SLAVE STATUS doesn't report any lag: slave1 [localhost] {msandbox} ((none)) > SHOW SLAVE STATUS\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 127.0.0.1 Master_User: rsandbox Master_Port: 28890 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 569563421 Relay_Log_File: mysql-relay.000004 Relay_Log_Pos: 284857134 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 284861060 Relay_Log_Space: 569564180 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 00028890-1111-1111-1111-111111111111 Master_Info_File: /home/marcelo.altmann/sandboxes/rsandbox_5_7_15/node1/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for dependent transaction to commit Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) Same happens with slave_parallel_type=DATABASE . How to repeat: 1) Start slave with slave_parallel_workers > 0 2) Create a table with some rows on master: sysbench --test=/usr/share/sysbench/tests/db/oltp.lua --mysql-host=localhost --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox28890.sock --mysql-db=test --oltp-table-size=1500000 prepare 3) Delete a big amount of rows in your master: DELETE FROM sbtest1 WHERE id > 100; 4) Compare output of SHOW PROCESSLIST and SHOW SLAVE STATUS\G Suggested fix: Fix the reported seconds on SHOW SLAVE STATUS or document why it will report 0.