Bug #101690 | mysqlrouter should not send write requests to the RECOVERING state node | ||
---|---|---|---|
Submitted: | 20 Nov 2020 7:20 | Modified: | 8 Jan 2021 1:33 |
Reporter: | g g | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Router | Severity: | S3 (Non-critical) |
Version: | 8.0.20 | OS: | Red Hat (7.7) |
Assigned to: | CPU Architecture: | x86 (64) |
[20 Nov 2020 7:20]
g g
[20 Nov 2020 12:55]
MySQL Verification Team
Hi, I have issues reproducing this? Can you please let me know the versions of MySQL router you are using and MySQL server you are using? Adding a full config file of all nodes would be helpful also as well as client type (what connector you are using to connect to the router) and if you are executing a simple query or doing a prepared statement first. Thanks Bogdan
[24 Nov 2020 10:27]
g g
mysql server 8.0.20 mysql router 8.0.21 node1/node2/node3 [mysqld] port=3366 basedir=/home/tmp/package/gmysql datadir=/home/tmp/mysql/data/mysql max_connections=10000 max_connect_errors=1000 character-set-server=UTF8MB4 default-storage-engine=INNODB default_authentication_plugin=mysql_native_password [mysql] default-character-set=UTF8MB4 [client] port=3366 default-character-set=UTF8MB4 then node1 >mysqlsh shell.connect('root@localhost:3366') dba.configureLocalInstance() var cluster = dba.createCluster('testCluster', {multiPrimary: true, force: true}) cluster.addInstance('root@node2:3366') cluster.addInstance('root@node3:3366') When my cluster runs for a while,I shut down one of the nodes and keep inserting millions of data to my cluster,then I restart the node which one I shut down . The newly restarted node will be in the RECOVERING state for a period of time. At this time, the application fails to write to the restarted node through mysqlrouter. The correct logic should be that nodes in the RECOVERING state should not be marked as available by the router Thank you very much.
[24 Nov 2020 14:14]
MySQL Verification Team
Hi, > mysql server 8.0.20 > mysql router 8.0.21 Can you reproduce the problem by using both .21 or both .22 ? thanks Bogdan
[25 Nov 2020 1:58]
g g
when I changed to both 8.0.20,I try it again. use mysql client SELECT MEMBER_ROLE FROM performance_schema.replication_group_members; +-------------+ | MEMBER_ROLE | +-------------+ | PRIMARY | | PRIMARY | | PRIMARY | +-------------+ but when I use mysqlsh,the result is: "recoveryStatusText": "Distributed recovery in progress", "role": "HA", "status": "RECOVERING", "version": "8.0.20" and the result when i through mysqlrouter was: mysql -uroot -proot -P7001 -Dobsdb --protocol=TCP -e"insert into EMPLOYEE2 (AGE) values (123);select @@hostname" ERROR 1290 (HY000) at line 1: The MySQL server is running with the --super-read-only option so it cannot execute this statement Do you understand ? Thank you very much.
[25 Nov 2020 10:40]
MySQL Verification Team
Hi, Thanks for bearing with me. I understand what is happening to you but I'm having a problem reproducing the behavior. I'm trying to reproduce on 8.0.22 (server and router and shell) as not much point testing it with older versions. I cannot reproduce but looking trough code and changelog there is nothing regarding this changed recently. That is why I asked if you can reproduce this by all servers running .22. I get one node to crash and get it up so it start recovering and router just does not send anything to it. all best Bogdan
[25 Nov 2020 12:59]
MySQL Verification Team
Hi I verified the report, thanks for reporting it. In order to report (with 8.0.22) I had to 1. pump a lot of data while node3 is down so that recovery last a bit 2. hit router with "a lot" of threads and a lot of queries (I was running 20 threads) Thanks for reporting the bug all best Bogdan
[26 Nov 2020 8:33]
g g
Hi,I'm so sorry .... after I changed all of them to .22,The bug still exists. mysql client: mysql> SELECT MEMBER_ROLE FROM performance_schema.replication_group_members; +-------------+ | MEMBER_ROLE | +-------------+ | PRIMARY | | PRIMARY | | PRIMARY | +-------------+ 3 rows in set (0.00 sec) mysql> select version(); +-----------+ | version() | +-----------+ | 8.0.22 | +-----------+ 1 row in set (0.00 sec) mysql shell MySQL localhost:3366 ssl JS > cluster.status() .... .... "R540-148:3366": { "address": "R540-148:3366", "mode": "R/O", "readReplicas": {}, "recoveryStatusText": "Recovery in progress", "role": "HA", "status": "RECOVERING", "version": "8.0.22" .... .... "topologyMode": "Multi-Primary" myrouter: mysql: [Warning] Using a password on the command line interface can be insecure. +------------+ | @@hostname | +------------+ | R540-147 | +------------+ ysql -uroot -proot -P7001 -Dobsdb --protocol=TCP -e"insert into EMPLOYEE2 (AGE) values (123);select @@hostname" mysql: [Warning] Using a password on the command line interface can be insecure. +------------+ | @@hostname | +------------+ | r710-82 | +------------+ mysql -uroot -proot -P7001 -Dobsdb --protocol=TCP -e"insert into EMPLOYEE2 (AGE) values (123);select @@hostname" mysql: [Warning] Using a password on the command line interface can be insecure. ERROR 1290 (HY000) at line 1: The MySQL server is running with the --super-read-only option so it cannot execute this statement I have three node in my cluster,and the cluster was Multi-Primary mode I use jmeter to keep inserting to my cluster,then I stopped one of them and keep inserting,Two hours later,There are millions of rows of data to be synchronized, then the bug reappeared. Hope I made it clear Thanks ,Happy Thanksgiving
[26 Nov 2020 11:15]
MySQL Verification Team
Hi, Thanks. I reproduced with .22, the problem I had reproducing is that I was running only few connections from the client. When I created a lot of connections from the client I started getting connections to the recovering node Thanks for the report Bogdan
[7 Jan 2021 8:07]
g g
Hi, I found the bug in cluster_metadata.cc:535, the ClusterMetadataGR::fetch_cluster_hosts() function should be modified as follows: old: const std::string query = "SELECT member_host, member_port " " FROM performance_schema.replication_group_members " new: const std::string query = "SELECT member_host, member_port " " FROM performance_schema.replication_group_members where member_state = 'ONLINE'"
[8 Jan 2021 1:33]
g g
Hi, I found the bug in cluster_metadata.cc:535, the ClusterMetadataGR::fetch_cluster_hosts() function should be modified as follows: old: const std::string query = "SELECT member_host, member_port " " FROM performance_schema.replication_group_members " new: const std::string query = "SELECT member_host, member_port " " FROM performance_schema.replication_group_members where member_state = 'ONLINE'" Please tell me how to install only the router module without installing the complete mysql Thank you very much.