Bug #83237 mysqlrouter connects to wrong metadata server (a partitioned node)
Submitted: 1 Oct 2016 21:21
Modified: 16 May 2017 18:25
Reporter: Kenny Gryp
Status: Can't repeat
Category: MySQL Router
Severity: S1 (Critical)
Version: v2.1.0
OS: Any
Assigned to: MySQL Verification Team
CPU Architecture: Any

[1 Oct 2016 21:21] Kenny Gryp
Description:
mysqlrouter connects to wrong metadata server (a partitioned node) and fails to work.

used in: 5.7.15-labs-gr090-log

How to repeat:
- create a 3-node cluster (mysql2, mysql3, mysql4)
- configure mysqlrouter
- start mysqlrouter
- send some sysbench traffic
- kill the primary node (mysql3)

mysqlrouter log has this:

Failed connecting with Metadata Server mysql3:3306: Can't connect to MySQL server on 'mysql3' (110)
2016-10-01 20:32:47 INFO    [7f78ccc05700] Connected with metadata server running on mysql4:3306
2016-10-01 20:32:47 WARNING [7f78ccc05700] Member 5166ecd3-880b-11e6-9971-08002718d305 defined in metadata not found in actual replicaset
2016-10-01 20:32:47 INFO    [7f78ccc05700] Changes detected in cluster 'plam' after metadata refresh
2016-10-01 20:32:47 INFO    [7f78ccc05700] Metadata for cluster 'plam' has 1 replicasets:
2016-10-01 20:32:47 INFO    [7f78ccc05700] 'default' (3 members)
2016-10-01 20:32:47 INFO    [7f78ccc05700]     mysql3:3306 / 33060 - role=HA mode=n/a
2016-10-01 20:32:47 INFO    [7f78ccc05700]     mysql4:3306 / 33060 - role=HA mode=RW
2016-10-01 20:32:47 INFO    [7f78ccc05700] Replicaset 'default' has a new Primary.
2016-10-01 20:32:47 INFO    [7f78ccc05700]     mysql2:3306 / 33060 - role=HA mode=RO
2016-10-01 20:37:47 INFO    [7f78ccc05700] Connected with metadata server running on mysql3:3306
2016-10-01 20:37:47 WARNING [7f78ccc05700] Member 0ec26995-880c-11e6-b615-08002718d305 defined in metadata not found in actual replicaset
2016-10-01 20:37:47 WARNING [7f78ccc05700] Member 3d36134b-87ef-11e6-93be-08002718d305 defined in metadata not found in actual replicaset
2016-10-01 20:37:47 INFO    [7f78ccc05700] Changes detected in cluster 'plam' after metadata refresh
2016-10-01 20:37:47 INFO    [7f78ccc05700] Metadata for cluster 'plam' has 1 replicasets:
2016-10-01 20:37:47 INFO    [7f78ccc05700] 'default' (3 members)

You can see it connected to mysql4, which is the new master, but soon afterwards (20:37:47 in the log) it connects to the mysql3 metadata server (mysql3 got restarted automatically).

Now it does not see the other 2 members in the replicaset.
It seems mysqlrouter is stuck on the metadata schema of the partitioned node.
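
The switch described above can be spotted mechanically in the router log. The following is a minimal sketch, not part of mysqlrouter: it assumes the two message formats shown in the log excerpt ("Connected with metadata server running on ..." and "Member ... defined in metadata not found in actual replicaset"), and the function name `summarize_refreshes` is made up for illustration.

```python
import re

# Patterns taken from the log lines quoted in this report.
CONNECTED = re.compile(r"Connected with metadata server running on (\S+)")
MISSING = re.compile(
    r"Member (\S+) defined in metadata not found in actual replicaset"
)


def summarize_refreshes(log_lines):
    """Group 'member not found' warnings under the metadata server in use.

    Returns a list of (metadata_server, [missing_member_uuid, ...]) tuples,
    one per 'Connected with metadata server' line, in log order.
    """
    refreshes = []
    for line in log_lines:
        connected = CONNECTED.search(line)
        if connected:
            # A new metadata connection starts a new group.
            refreshes.append((connected.group(1), []))
            continue
        missing = MISSING.search(line)
        if missing and refreshes:
            # Attribute the warning to the most recent connection.
            refreshes[-1][1].append(missing.group(1))
    return refreshes
```

Applied to the log above, this yields mysql4:3306 with one missing member and mysql3:3306 with two. In a 3-member replicaset, a refresh that reports two members missing suggests the view came from a node that is not part of the majority.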

INFO: 

mysql2 mysql> show status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 0ec26995-880c-11e6-b615-08002718d305 |
+----------------------------------+--------------------------------------+
1 row in set (0.02 sec)

mysql2 mysql> select member_host as "primary master" 
    ->        from performance_schema.global_status 
    ->        join  performance_schema.replication_group_members 
    ->        where variable_name = 'group_replication_primary_member' 
    ->        and member_id=variable_value;
+----------------+
| primary master |
+----------------+
| mysql4         |
+----------------+
1 row in set (0.02 sec)

On mysql3:

mysql> show global variables like 'super%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| super_read_only | OFF   |
+-----------------+-------+
1 row in set (0.02 sec)

mysql> create database bleh;
Query OK, 1 row affected (0.04 sec)

mysql> show status like 'group_replication_primary_member';
+----------------------------------+-------+
| Variable_name                    | Value |
+----------------------------------+-------+
| group_replication_primary_member |       |
+----------------------------------+-------+
1 row in set (0.00 sec)

mysql> create database jen;
Query OK, 1 row affected (0.00 sec)

Suggested fix:
I am still new to the Group Replication, InnoDB Cluster, and mysqlrouter architecture. I'm not sure what to suggest other than: prevent mysqlrouter from connecting to the wrong metadata server.
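
One way to read this suggestion is as a majority check before trusting a metadata server. The sketch below is only an illustration of that idea under stated assumptions, not how mysqlrouter is implemented: `pick_metadata_server`, `has_majority`, and the `count_online` callback are hypothetical names, and `count_online` is assumed to return how many group members the given server itself reports as ONLINE (for example by querying performance_schema.replication_group_members on that server).

```python
from typing import Callable, Iterable, Optional


def has_majority(online_members: int, total_members: int) -> bool:
    """True if online_members is a strict majority of total_members."""
    return total_members > 0 and online_members * 2 > total_members


def pick_metadata_server(
    candidates: Iterable[str],
    count_online: Callable[[str], int],  # hypothetical callback, see lead-in
    total_members: int,
) -> Optional[str]:
    """Return the first candidate whose own group view has a majority.

    Candidates that cannot be reached (count_online raises ConnectionError)
    or that see only a minority of the group are skipped.
    """
    for server in candidates:
        try:
            online = count_online(server)
        except ConnectionError:
            continue  # unreachable, try the next candidate
        if has_majority(online, total_members):
            return server
    return None  # no candidate can currently be trusted
```

In the scenario reported here, the restarted mysql3 sees only itself out of 3 members, so a check like this would skip it and move on to mysql4 or mysql2.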
[16 May 2017 18:25] MySQL Verification Team
Hi,

I did reproduce this with 2.1.0, but I'm having trouble reproducing it with 2.1.3.

Can you give me your Group Replication and MySQL Router configuration, please?

all best
Bogdan
[19 Jan 2020 2:21] yong li
I also encountered this problem. How can it be solved?