Description:
When the last 2 nodes that remain in the cluster are network partitioned, there is no way of knowing whether a node is accepting writes or not.
This is a problem for load balancers (ProxySQL and MySQL Router alike), as they have no way of knowing whether a writable node is available.
How to repeat:
Ensure the default group_replication_unreachable_majority_timeout=0 is configured (if you change this, the nodes go into ERROR state instead).
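To double-check the value on each node before repeating this, something like the following should do (just a sanity check; the variable name is the one above):

mysql> SELECT @@group_replication_unreachable_majority_timeout;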
With a 2 node cluster (let's assume one node has already dropped out), you will see replication_group_members output like this:
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | d8a4abfc-3f73-11e8-b4fe-08002789cd2e | node2       |        3306 | ONLINE       | PRIMARY     | 8.0.4          |
| group_replication_applier | d8fe8a34-3f73-11e8-b753-08002789cd2e | node3       |        3306 | ONLINE       | SECONDARY   | 8.0.4          |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
Let's network partition the two nodes:
[vagrant@node2 ~]$ sudo route add -net 192.168.70.0/24 gw 10.0.2.111
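(To undo the partition afterwards, I just delete that route again; this assumes the same gateway as above:)

[vagrant@node2 ~]$ sudo route del -net 192.168.70.0/24 gw 10.0.2.111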
Once the nodes are partitioned, each of them reports output like this (first as seen from node2, then from node3):
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | d8a4abfc-3f73-11e8-b4fe-08002789cd2e | node2       |        3306 | ONLINE       | PRIMARY     | 8.0.4          |
| group_replication_applier | d8fe8a34-3f73-11e8-b753-08002789cd2e | node3       |        3306 | UNREACHABLE  | SECONDARY   | 8.0.4          |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | d8a4abfc-3f73-11e8-b4fe-08002789cd2e | node2       |        3306 | UNREACHABLE  | PRIMARY     | 8.0.4          |
| group_replication_applier | d8fe8a34-3f73-11e8-b753-08002789cd2e | node3       |        3306 | ONLINE       | SECONDARY   | 8.0.4          |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
Both nodes think they are ONLINE; the first one listed even reports itself as PRIMARY and ONLINE.
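For example, a naive health check that only looks at the local member's state (a hypothetical query, just to illustrate why this is not enough for a load balancer) would return ONLINE on both sides of the partition:

mysql> SELECT MEMBER_STATE, MEMBER_ROLE FROM performance_schema.replication_group_members WHERE MEMBER_ID = @@server_uuid;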
Even when checking replication_connection_status, I can't tell:
mysql> select * from replication_connection_status\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
GROUP_NAME: 40edb723-3f7b-11e8-b900-08002789cd2e
SOURCE_UUID: 40edb723-3f7b-11e8-b900-08002789cd2e
THREAD_ID: NULL
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 0
LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00.000000
RECEIVED_TRANSACTION_SET: 40edb723-3f7b-11e8-b900-08002789cd2e:1-95:1000061,
d84addc8-3f73-11e8-b28d-08002789cd2e:1-22,
d8a4abfc-3f73-11e8-b4fe-08002789cd2e:1-7,
d8fe8a34-3f73-11e8-b753-08002789cd2e:1
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_QUEUED_TRANSACTION: 40edb723-3f7b-11e8-b900-08002789cd2e:95
LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-04-18 01:24:18.565458
LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP: 2018-04-18 01:24:18.565628
LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP: 2018-04-18 01:24:18.565640
QUEUEING_TRANSACTION:
QUEUEING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
QUEUEING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
QUEUEING_TRANSACTION_START_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000
But when I do a write, it hangs:
mysql> create database test;;
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted
There is no way to know whether a node is accepting writes and/or is part of the primary partition using just what ships with MySQL 8.
When you use lefred's sys schema addition, there is a way of knowing, sort of...
mysql> select * from sys.gr_member_routing_candidate_status;
+------------------+-----------+---------------------+----------------------+
| viable_candidate | read_only | transactions_behind | transactions_to_cert |
+------------------+-----------+---------------------+----------------------+
| NO               | NO        |                   0 |                    0 |
+------------------+-----------+---------------------+----------------------+
mysql> select * from sys.gr_member_routing_candidate_status;
+------------------+-----------+---------------------+----------------------+
| viable_candidate | read_only | transactions_behind | transactions_to_cert |
+------------------+-----------+---------------------+----------------------+
| NO               | YES       |                   0 |                    0 |
+------------------+-----------+---------------------+----------------------+
1 row in set (0.00 sec)
Basically, the first one was the writer; it is still accepting writes (albeit blocked by GR not having quorum).
The trick is to check whether the node is a viable_candidate AND has read_only = NO.
Fred's code just calculates quorum based on the number of nodes.
I really don't like this solution; the PFS tables should expose the information from the GR plugin showing whether writes are accepted or not.
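For reference, this is roughly the check a router ends up having to run against every node (a sketch of the idea only, not Fred's exact code): send writes only where viable_candidate = 'YES' AND read_only = 'NO', with the viability itself boiling down to counting members, something like:

mysql> SELECT viable_candidate, read_only FROM sys.gr_member_routing_candidate_status;

mysql> SELECT (SELECT COUNT(*)
    ->           FROM performance_schema.replication_group_members
    ->          WHERE MEMBER_STATE IN ('ONLINE', 'RECOVERING'))
    ->      > (SELECT COUNT(*)
    ->           FROM performance_schema.replication_group_members) / 2
    ->        AS has_quorum;

(has_quorum is just a label I made up; with the 1-out-of-2 membership shown above it evaluates to 0, matching viable_candidate = NO.)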
Suggested fix:
Make it possible to determine _easily_ whether a node is part of the primary partition AND which node can accept writes.
I hope I'm missing something here; maybe there's another table or status variable somewhere?