Description:
mysqlrouter connects to the wrong metadata server (a partitioned node) and stops working.
Version used: 5.7.15-labs-gr090-log
How to repeat:
- create 3 node cluster (mysql2, mysql3, mysql4)
- configure mysqlrouter
- start mysqlrouter
- send some sysbench traffic
- kill the primary node (mysql3)
mysqlrouter log has this:
Failed connecting with Metadata Server mysql3:3306: Can't connect to MySQL server on 'mysql3' (110)
2016-10-01 20:32:47 INFO [7f78ccc05700] Connected with metadata server running on mysql4:3306
2016-10-01 20:32:47 WARNING [7f78ccc05700] Member 5166ecd3-880b-11e6-9971-08002718d305 defined in metadata not found in actual replicaset
2016-10-01 20:32:47 INFO [7f78ccc05700] Changes detected in cluster 'plam' after metadata refresh
2016-10-01 20:32:47 INFO [7f78ccc05700] Metadata for cluster 'plam' has 1 replicasets:
2016-10-01 20:32:47 INFO [7f78ccc05700] 'default' (3 members)
2016-10-01 20:32:47 INFO [7f78ccc05700] mysql3:3306 / 33060 - role=HA mode=n/a
2016-10-01 20:32:47 INFO [7f78ccc05700] mysql4:3306 / 33060 - role=HA mode=RW
2016-10-01 20:32:47 INFO [7f78ccc05700] Replicaset 'default' has a new Primary.
2016-10-01 20:32:47 INFO [7f78ccc05700] mysql2:3306 / 33060 - role=HA mode=RO
2016-10-01 20:37:47 INFO [7f78ccc05700] Connected with metadata server running on mysql3:3306
2016-10-01 20:37:47 WARNING [7f78ccc05700] Member 0ec26995-880c-11e6-b615-08002718d305 defined in metadata not found in actual replicaset
2016-10-01 20:37:47 WARNING [7f78ccc05700] Member 3d36134b-87ef-11e6-93be-08002718d305 defined in metadata not found in actual replicaset
2016-10-01 20:37:47 INFO [7f78ccc05700] Changes detected in cluster 'plam' after metadata refresh
2016-10-01 20:37:47 INFO [7f78ccc05700] Metadata for cluster 'plam' has 1 replicasets:
2016-10-01 20:37:47 INFO [7f78ccc05700] 'default' (3 members)
You can see it connected to mysql4, which is the new master, but almost immediately it connects to the metadata server on mysql3 (mysql3 got restarted automatically).
Now it does not see the other 2 members in the replicaset.
It seems mysqlrouter is stuck on the metadata schema of the partitioned node.
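To make the pattern in the log easier to see, here is a small Python sketch (my own helper, not part of mysqlrouter) that groups the "not found in actual replicaset" warnings by the metadata server the router had just connected to. The function name and the regular expressions are assumptions based only on the log lines quoted above.

```python
import re

# Patterns taken from the log lines in this report; other router versions
# may word these messages differently (assumption).
CONNECTED = re.compile(r"Connected with metadata server running on (\S+)")
MISSING = re.compile(r"Member (\S+) defined in metadata not found in actual replicaset")


def summarize_refreshes(log_lines):
    """Return [(metadata_server, [missing_member_ids])], one entry per connect."""
    refreshes = []
    for line in log_lines:
        match = CONNECTED.search(line)
        if match:
            # A new metadata connection starts a new refresh entry.
            refreshes.append((match.group(1), []))
            continue
        match = MISSING.search(line)
        if match and refreshes:
            # Attribute the warning to the most recent metadata connection.
            refreshes[-1][1].append(match.group(1))
    return refreshes


SAMPLE_LOG = [
    "2016-10-01 20:32:47 INFO [7f78ccc05700] Connected with metadata server running on mysql4:3306",
    "2016-10-01 20:32:47 WARNING [7f78ccc05700] Member 5166ecd3-880b-11e6-9971-08002718d305 defined in metadata not found in actual replicaset",
    "2016-10-01 20:37:47 INFO [7f78ccc05700] Connected with metadata server running on mysql3:3306",
    "2016-10-01 20:37:47 WARNING [7f78ccc05700] Member 0ec26995-880c-11e6-b615-08002718d305 defined in metadata not found in actual replicaset",
    "2016-10-01 20:37:47 WARNING [7f78ccc05700] Member 3d36134b-87ef-11e6-93be-08002718d305 defined in metadata not found in actual replicaset",
]

if __name__ == "__main__":
    for server, missing in summarize_refreshes(SAMPLE_LOG):
        print(server, "missing members:", len(missing))
```

A refresh where most of the members are reported missing (as with mysql3 here) suggests the router is reading the view of a node that is outside the group.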
INFO:
mysql2 mysql> show status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 0ec26995-880c-11e6-b615-08002718d305 |
+----------------------------------+--------------------------------------+
1 row in set (0.02 sec)
mysql2 mysql> select member_host as "primary master"
-> from performance_schema.global_status
-> join performance_schema.replication_group_members
-> where variable_name = 'group_replication_primary_member'
-> and member_id=variable_value;
+----------------+
| primary master |
+----------------+
| mysql4         |
+----------------+
1 row in set (0.02 sec)
On mysql3:
mysql> show global variables like 'super%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| super_read_only | OFF   |
+-----------------+-------+
1 row in set (0.02 sec)
mysql> create database bleh;
Query OK, 1 row affected (0.04 sec)
mysql> show status like 'group_replication_primary_member';
+----------------------------------+-------+
| Variable_name                    | Value |
+----------------------------------+-------+
| group_replication_primary_member |       |
+----------------------------------+-------+
1 row in set (0.00 sec)
mysql> create database jen;
Query OK, 1 row affected (0.00 sec)
Suggested fix:
I am still new to the group replication, InnoDB cluster and mysqlrouter architecture. I'm not sure what to suggest other than: prevent mysqlrouter from connecting to the wrong metadata server.
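One possible shape for such a check, as a rough Python sketch under my own assumptions (this is not how mysqlrouter is implemented): before trusting a metadata server, look at the group membership that server reports and only accept it if it sees a majority of the members defined in the metadata as ONLINE. The names `has_quorum` and `pick_metadata_server`, and the per-server views in the example, are hypothetical. In a real check the views would come from something like performance_schema.replication_group_members on each candidate.

```python
def has_quorum(view, expected_members):
    """view maps member_id -> state as reported by ONE candidate server."""
    online = sum(1 for member in expected_members if view.get(member) == "ONLINE")
    # Strict majority of the members defined in the metadata.
    return online > len(expected_members) // 2


def pick_metadata_server(candidates, expected_members):
    """Return the first candidate whose own view shows a majority, else None.

    candidates is a list of (host, view) pairs; view is None when the
    server could not be reached.
    """
    for host, view in candidates:
        if view is not None and has_quorum(view, expected_members):
            return host
    return None


# Member ids from this report; the views themselves are illustrative only.
MYSQL2 = "3d36134b-87ef-11e6-93be-08002718d305"
MYSQL3 = "5166ecd3-880b-11e6-9971-08002718d305"
MYSQL4 = "0ec26995-880c-11e6-b615-08002718d305"
EXPECTED = [MYSQL2, MYSQL3, MYSQL4]

CANDIDATES = [
    # Restarted node that is no longer part of the group (assumed view).
    ("mysql3:3306", {MYSQL3: "OFFLINE"}),
    # Node in the surviving majority (assumed view).
    ("mysql4:3306", {MYSQL4: "ONLINE", MYSQL2: "ONLINE"}),
]

if __name__ == "__main__":
    print(pick_metadata_server(CANDIDATES, EXPECTED))
```

With a rule like this the router would skip mysql3 after its restart, because mysql3 on its own cannot show a majority of the 3 defined members.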