Description:
MySQL Router currently doesn't check for Secondary replication lag when deployed with an InnoDB ReplicaSet. Router will route requests from the read port to a stale replica if it is available. That's unfortunate, as it means clients should expect to read arbitrarily stale data, sometimes hours old.
Router currently provides a few routing strategies, none of which takes replication lag into account: https://dev.mysql.com/doc/mysql-router/8.0/en/mysql-router-conf-options.html#option_mysqlr...
MySQL Shell's ReplicaSet status() command will show replication lag if it's present, so Shell is aware of the lag.
There also doesn't seem to be an option to manually fence off a Secondary that is available but delayed, short of blocking network access between the node and the Router. Since Router is deployed with a ReplicaSet, the metadata-cache://<cluster> destination provides the list of nodes, so editing the Router configuration by hand is not an option, as it would be for a basic routing scenario.
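For reference, the read-port route generated by bootstrap looks roughly like this (exact section and cluster names depend on the deployment); every available routing_strategy value picks destinations without looking at replication state:
[routing:test_ro]
bind_address=0.0.0.0
bind_port=6447
destinations=metadata-cache://test/?role=SECONDARY
routing_strategy=round-robin-with-fallback
protocol=classic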
How to repeat:
Set up a basic replicaset environment.
dba.deploySandboxInstance(3410)
dba.deploySandboxInstance(3420)
dba.deploySandboxInstance(3430)
\c root@localhost:3410
rs = dba.createReplicaSet('test')
rs.addInstance('root@localhost:3420')
rs.addInstance('root@localhost:3430')
Bootstrap and start a router.
$ mysqlrouter --bootstrap root@localhost:3410 --directory router3410
$ ./router3410/start.sh
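Create the schema and account that the sysbench commands below expect (sbuser / sysbench); for example through the Router RW port (6446 by default):
$ mysql -h127.0.0.1 -P6446 -uroot -p
CREATE DATABASE sysbench;
CREATE USER 'sbuser'@'%' IDENTIFIED BY 'password';
-- depending on how sysbench was built, mysql_native_password may be required instead
GRANT ALL ON sysbench.* TO 'sbuser'@'%';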
Create a heavy enough write workload on the Primary, through the Router RW port (6446 by default).
$ sysbench /usr/share/sysbench/oltp_read_write.lua --db-driver=mysql --auto_inc=off \
--tables=4 --table_size=100000 --threads=4 --report-interval=1 \
--mysql-host=127.0.0.1 --mysql-port=6446 --mysql-user=sbuser \
--mysql-password='password' --mysql-db=sysbench \
prepare
$ sysbench /usr/share/sysbench/oltp_read_write.lua --db-driver=mysql --db-ps-mode=disable \
--skip_trx=on --mysql-user=sbuser --mysql-password='password' --mysql-db=sysbench \
--mysql-host=127.0.0.1 --mysql-port=6446 --mysql-ignore-errors=all --tables=4 \
--table_size=100000 --report-interval=1 --threads=4 --time=0 --events=0 --rate=20 \
run | grep tps
Observe the Secondaries through the Router RO port (6447 by default).
$ cat query-sleep.sh
#!/bin/bash
while (true); do
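# query the RO port (6447) twice per iteration so that round-robin routing samples both Secondaries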
echo -n "$(date +%F\ %H:%M:%S.%N): " && mysql -h127.0.0.1 -P6447 -uroot -BNe "SELECT @@report_port, @@GLOBAL.gtid_executed;"
echo -n "$(date +%F\ %H:%M:%S.%N): " && mysql -h127.0.0.1 -P6447 -uroot -BNe "SELECT @@report_port, @@GLOBAL.gtid_executed;"
sleep 5
done
$ ./query-sleep.sh
...
2020-06-15 09:02:55.051426655: 3430 750eee2e-aef3-11ea-9e94-90b11c553129:1-181778
2020-06-15 09:02:55.084802793: 3410 750eee2e-aef3-11ea-9e94-90b11c553129:1-181778
...
Shut down one of the Secondaries.
Wait for a few minutes.
Start that Secondary back up.
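For sandbox instances the stop/start can be done from MySQL Shell (it may prompt for the root password); in the run below the lagging Secondary is the one on port 3410:
dba.stopSandboxInstance(3410)
// wait a few minutes while the write workload keeps running
dba.startSandboxInstance(3410)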
Observe clients hitting a now-stale Secondary.
$ ./query-sleep.sh
...
2020-06-15 09:14:42.710042718: 3430 750eee2e-aef3-11ea-9e94-90b11c553129:1-226173
2020-06-15 09:14:42.735640228: 3430 750eee2e-aef3-11ea-9e94-90b11c553129:1-226173
2020-06-15 09:14:47.765533176: 3410 750eee2e-aef3-11ea-9e94-90b11c553129:1-183625 <<<
2020-06-15 09:14:47.815480555: 3430 750eee2e-aef3-11ea-9e94-90b11c553129:1-226574
2020-06-15 09:14:52.852841012: 3410 750eee2e-aef3-11ea-9e94-90b11c553129:1-186512 <<<
2020-06-15 09:14:52.884792873: 3430 750eee2e-aef3-11ea-9e94-90b11c553129:1-226955
...
Check ReplicaSet status() output to further confirm the node is lagging.
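For example, from MySQL Shell connected to any ONLINE member (the extended option may be needed to see the applier queue details):
\c root@localhost:3420
rs = dba.getReplicaSet()
rs.status({extended: 1})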
"127.0.0.1:3410": {
"address": "127.0.0.1:3410",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierQueuedTransactionSetSize": 27925,
"applierStatus": "APPLYING",
"applierThreadState": "waiting for handler commit",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": "00:07:27.713819"
},
"status": "ONLINE"
}
Suggested fix:
Add a new routing strategy which would allow treating a node lagging by more than N seconds as unreachable.
and/or
Provide a manual way to prevent Router from routing connections to a specific node when metadata-cache is used as the node list provider. This fence could then be used to prevent reads from stale nodes.
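As a sketch of the first suggestion, the read-port routing section could take a lag threshold; the max_replication_lag option below is hypothetical and does not exist in Router today:
[routing:test_ro]
bind_port=6447
destinations=metadata-cache://test/?role=SECONDARY
routing_strategy=round-robin-with-fallback
# hypothetical option: treat a Secondary lagging by more than 30 seconds as unreachable
max_replication_lag=30
The manual fence from the second suggestion could likewise be a per-instance attribute stored in the metadata and honored by Router's metadata cache, so it would apply to every Router and survive restarts.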