Description:
When the last 2 nodes that remain in the cluster are network partitioned, there is no way of knowing whether a node is accepting writes or not.
This is a problem for load balancers (ProxySQL and MySQL Router alike), as they have no way of knowing whether a writable node is available.
How to repeat:
Ensure the default group_replication_unreachable_majority_timeout=0 is configured (if you change this, the nodes go into ERROR state instead).
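To double-check the value on each node before repeating this, something like the following should do (just a sanity check; the variable name is the one above):

mysql> SELECT @@group_replication_unreachable_majority_timeout;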
With a 2 node cluster (let's assume one node has already dropped out), you will see replication_group_members output like this:
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | d8a4abfc-3f73-11e8-b4fe-08002789cd2e | node2       |        3306 | ONLINE       | PRIMARY     | 8.0.4          |
| group_replication_applier | d8fe8a34-3f73-11e8-b753-08002789cd2e | node3       |        3306 | ONLINE       | SECONDARY   | 8.0.4          |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
Let's network partition the two nodes:
[vagrant@node2 ~]$ sudo route add -net 192.168.70.0/24 gw 10.0.2.111
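(To undo the partition afterwards, I just delete that route again; this assumes the same gateway as above:)

[vagrant@node2 ~]$ sudo route del -net 192.168.70.0/24 gw 10.0.2.111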
Once the nodes are partitioned, each of them reports output like this (first as seen from node2, then from node3):
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | d8a4abfc-3f73-11e8-b4fe-08002789cd2e | node2       |        3306 | ONLINE       | PRIMARY     | 8.0.4          |
| group_replication_applier | d8fe8a34-3f73-11e8-b753-08002789cd2e | node3       |        3306 | UNREACHABLE  | SECONDARY   | 8.0.4          |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | d8a4abfc-3f73-11e8-b4fe-08002789cd2e | node2       |        3306 | UNREACHABLE  | PRIMARY     | 8.0.4          |
| group_replication_applier | d8fe8a34-3f73-11e8-b753-08002789cd2e | node3       |        3306 | ONLINE       | SECONDARY   | 8.0.4          |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
Both nodes think they are ONLINE; the first one listed even reports itself as PRIMARY and ONLINE.
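For example, a naive health check that only looks at the local member's state (a hypothetical query, just to illustrate why this is not enough for a load balancer) would return ONLINE on both sides of the partition:

mysql> SELECT MEMBER_STATE, MEMBER_ROLE FROM performance_schema.replication_group_members WHERE MEMBER_ID = @@server_uuid;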
Even when checking replication_connection_status, I can't tell:
mysql> select * from replication_connection_status\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
GROUP_NAME: 40edb723-3f7b-11e8-b900-08002789cd2e
SOURCE_UUID: 40edb723-3f7b-11e8-b900-08002789cd2e
THREAD_ID: NULL
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 0
LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00.000000
RECEIVED_TRANSACTION_SET: 40edb723-3f7b-11e8-b900-08002789cd2e:1-95:1000061,
d84addc8-3f73-11e8-b28d-08002789cd2e:1-22,
d8a4abfc-3f73-11e8-b4fe-08002789cd2e:1-7,
d8fe8a34-3f73-11e8-b753-08002789cd2e:1
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_QUEUED_TRANSACTION: 40edb723-3f7b-11e8-b900-08002789cd2e:95
LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-04-18 01:24:18.565458
LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP: 2018-04-18 01:24:18.565628
LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP: 2018-04-18 01:24:18.565640
QUEUEING_TRANSACTION:
QUEUEING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
QUEUEING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
QUEUEING_TRANSACTION_START_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000
But when I do a write, it hangs:
mysql> create database test;;
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted
There is no way to know whether a node is accepting writes and/or is part of the primary partition using just what ships with MySQL 8.
When you use lefred's sys schema addition, there is a way of knowing, sort of...
mysql> select * from sys.gr_member_routing_candidate_status;
+------------------+-----------+---------------------+----------------------+
| viable_candidate | read_only | transactions_behind | transactions_to_cert |
+------------------+-----------+---------------------+----------------------+
| NO               | NO        |                   0 |                    0 |
+------------------+-----------+---------------------+----------------------+
mysql> select * from sys.gr_member_routing_candidate_status;
+------------------+-----------+---------------------+----------------------+
| viable_candidate | read_only | transactions_behind | transactions_to_cert |
+------------------+-----------+---------------------+----------------------+
| NO               | YES       |                   0 |                    0 |
+------------------+-----------+---------------------+----------------------+
1 row in set (0.00 sec)
Basically, the first one was the writer; it is still accepting writes (albeit blocked by GR not having quorum).
The trick is to check whether the node is a viable_candidate AND has read_only = NO.
Fred's code just calculates quorum based on the number of nodes.
I really don't like this solution; the PFS tables should expose the information from the GR plugin showing whether writes are accepted or not.
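For reference, this is roughly the check a router ends up having to run against every node (a sketch of the idea only, not Fred's exact code): send writes only where viable_candidate = 'YES' AND read_only = 'NO', with the viability itself boiling down to counting members, something like:

mysql> SELECT viable_candidate, read_only FROM sys.gr_member_routing_candidate_status;

mysql> SELECT (SELECT COUNT(*)
    ->           FROM performance_schema.replication_group_members
    ->          WHERE MEMBER_STATE IN ('ONLINE', 'RECOVERING'))
    ->      > (SELECT COUNT(*)
    ->           FROM performance_schema.replication_group_members) / 2
    ->        AS has_quorum;

(has_quorum is just a label I made up; with the 1-out-of-2 membership shown above it evaluates to 0, matching viable_candidate = NO.)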
Suggested fix:
Make it possible to determine _easily_ whether a node is part of the primary partition AND which node can accept writes.
I hope I'm missing something here; maybe there's another table or status variable somewhere?