Bug #101690 mysqlrouter should not send write requests to the RECOVERING state node
Submitted: 20 Nov 2020 7:20 Modified: 8 Jan 2021 1:33
Reporter: g g Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Router Severity:S3 (Non-critical)
Version:8.0.20 OS:Red Hat (7.7)
Assigned to: CPU Architecture:x86 (64)

[20 Nov 2020 7:20] g g
Description:
When a node in the group replication is in the RECOVERING state, mysqlrouter will send the write request to the node, and the node is in the readonly state.

Can you help me fix it

Thank you.

How to repeat:
When a node in the group replication is in the RECOVERING state

Suggested fix:
When a node in the group replication is in the RECOVERING state, mysqlrouter will send the write request to the node, and the node is in the readonly state.
[20 Nov 2020 12:55] MySQL Verification Team
Hi,

I have issues reproducing this? Can you please let me know the versions of MySQL router you are using and MySQL server you are using?

Adding a full config file of all nodes would be helpful also as well as client type (what connector you are using to connect to the router) and if you are executing a simple query or doing a prepared statement first.

Thanks
Bogdan
[24 Nov 2020 10:27] g g
mysql server 8.0.20
mysql router 8.0.21

node1/node2/node3
[mysqld]
port=3366
basedir=/home/tmp/package/gmysql
datadir=/home/tmp/mysql/data/mysql
max_connections=10000
max_connect_errors=1000
character-set-server=UTF8MB4
default-storage-engine=INNODB
default_authentication_plugin=mysql_native_password
[mysql]
default-character-set=UTF8MB4
[client]
port=3366
default-character-set=UTF8MB4

then
node1
>mysqlsh

shell.connect('root@localhost:3366')
dba.configureLocalInstance()
var cluster = dba.createCluster('testCluster', {multiPrimary: true, force: true})

cluster.addInstance('root@node2:3366')
cluster.addInstance('root@node3:3366')

When my cluster runs for a while,I shut down one of the nodes and keep inserting  millions of data to my cluster,then I restart the node which one I shut down .

The newly restarted node will be in the RECOVERING state for a period of time. At this time, the application fails to write to the restarted node through mysqlrouter.

The correct logic should be that nodes in the RECOVERING state should not be marked as available by the router

Thank you very much.
[24 Nov 2020 14:14] MySQL Verification Team
Hi,

> mysql server 8.0.20
> mysql router 8.0.21

Can you reproduce the problem by using both .21 or both .22 ?

thanks
Bogdan
[25 Nov 2020 1:58] g g
when I changed to both 8.0.20,I try it again.

use mysql client
SELECT MEMBER_ROLE FROM performance_schema.replication_group_members;
+-------------+
| MEMBER_ROLE |
+-------------+
| PRIMARY     |
| PRIMARY     |
| PRIMARY     |
+-------------+

but when I use mysqlsh,the result is:

"recoveryStatusText": "Distributed recovery in progress", 
                "role": "HA", 
                "status": "RECOVERING", 
                "version": "8.0.20"

and the result when i through mysqlrouter was:
mysql -uroot -proot -P7001 -Dobsdb --protocol=TCP -e"insert into EMPLOYEE2 (AGE) values (123);select @@hostname"

ERROR 1290 (HY000) at line 1: The MySQL server is running with the --super-read-only option so it cannot execute this statement

Do you understand ?

Thank you very much.
[25 Nov 2020 10:40] MySQL Verification Team
Hi,

Thanks for bearing with me. I understand what is happening to you but I'm having a problem reproducing the behavior. I'm trying to reproduce on 8.0.22 (server and router and shell) as not much point testing it with older versions. I cannot reproduce but looking trough code and changelog there is nothing regarding this changed recently. That is why I asked if you can reproduce this by all servers running .22.

I get one node to crash and get it up so it start recovering and router just does not send anything to it.

all best
Bogdan
[25 Nov 2020 12:59] MySQL Verification Team
Hi

I verified the report, thanks for reporting it.

In order to report (with 8.0.22) I had to 

1. pump a lot of data while node3 is down so that recovery last a bit
2. hit router with "a lot" of threads and a lot of queries (I was running 20 threads)

Thanks for reporting the bug
all best
Bogdan
[26 Nov 2020 8:33] g g
Hi,I'm so sorry .... after I changed all of them to .22,The bug still exists.

mysql client:
mysql> SELECT MEMBER_ROLE FROM performance_schema.replication_group_members;
+-------------+
| MEMBER_ROLE |
+-------------+
| PRIMARY     |
| PRIMARY     |
| PRIMARY     |
+-------------+
3 rows in set (0.00 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.22    |
+-----------+
1 row in set (0.00 sec)

mysql shell
 MySQL  localhost:3366 ssl  JS > cluster.status()
....
....

"R540-148:3366": {
                "address": "R540-148:3366", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "recoveryStatusText": "Recovery in progress", 
                "role": "HA", 
                "status": "RECOVERING", 
                "version": "8.0.22"
....
....
            "topologyMode": "Multi-Primary"

myrouter:

mysql: [Warning] Using a password on the command line interface can be insecure.
+------------+
| @@hostname |
+------------+
| R540-147   |
+------------+
ysql -uroot -proot -P7001 -Dobsdb --protocol=TCP -e"insert into EMPLOYEE2 (AGE) values (123);select @@hostname"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------------+
| @@hostname |
+------------+
| r710-82    |
+------------+

mysql -uroot -proot -P7001 -Dobsdb --protocol=TCP -e"insert into EMPLOYEE2 (AGE) values (123);select @@hostname"
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1290 (HY000) at line 1: The MySQL server is running with the --super-read-only option so it cannot execute this statement

I have three node in my cluster,and the cluster was Multi-Primary mode

I use jmeter to keep inserting to my cluster,then I stopped one of them and keep
inserting,Two hours later,There are millions of rows of data to be synchronized, then the bug reappeared.

Hope I made it clear

Thanks ,Happy Thanksgiving
[26 Nov 2020 11:15] MySQL Verification Team
Hi,

Thanks. I reproduced with .22, the problem I had reproducing is that I was running only few connections from the client. When I created a lot of connections from the client I started getting connections to the recovering node

Thanks for the report
Bogdan
[7 Jan 2021 8:07] g g
Hi,
I found the bug in cluster_metadata.cc:535,
the ClusterMetadataGR::fetch_cluster_hosts() function should be modified as follows:

old:
const std::string query =
      "SELECT member_host, member_port "
      "  FROM performance_schema.replication_group_members "

new:

const std::string query =
      "SELECT member_host, member_port "
      "  FROM performance_schema.replication_group_members where  member_state = 'ONLINE'"
[8 Jan 2021 1:33] g g
Hi,
I found the bug in cluster_metadata.cc:535,
the ClusterMetadataGR::fetch_cluster_hosts() function should be modified
as follows:

old:
const std::string query =
      "SELECT member_host, member_port "
      "  FROM performance_schema.replication_group_members "

new:

const std::string query =
      "SELECT member_host, member_port "
      "  FROM performance_schema.replication_group_members where
member_state = 'ONLINE'"

Please tell me how to install only the router module without installing the complete mysql

Thank you very much.