Description:
Using unreleased 18947309.mysql-advanced-5.7.14-linux-glibc2.5-x86_64.tar.gz Group Replication for testing.
After configuring, I ran some tests to make sure all was working fine (stopping & restarting all nodes one by one, with inserts between each restart). After killing the 2nd &
3rd node, the 1st node hung on an INSERT:
server1> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 9aa83c31-20bc-11e6-aaa8-b86b23917877 | 127.0.0.1   | 3302        | ONLINE       |
| group_replication_applier | 9aa840e3-20bc-11e6-aae3-b86b23917877 | 127.0.0.1   | 3303        | ONLINE       |
| group_replication_applier | 9aa843a9-20bc-11e6-aa4d-b86b23917877 | 127.0.0.1   | 3301        | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0,00 sec)
server1>
server1>
server1>
server1> INSERT INTO test.t1 VALUES (11),(12);
Query OK, 2 rows affected (3 min 48,27 sec)
Records: 2 Duplicates: 0 Warnings: 0
The INSERT only committed after I brought node 2 back up, 3 min 48 s later.
**
* Up to here, all OK.
**
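For context, the blocking behavior itself is expected: with 2 of 3 members gone, the group has no write majority, so the INSERT stalls. From the surviving member, an operator would normally verify the quorum with a query like the following (a sketch; it assumes a three-member group, so at least two members must be ONLINE for writes to proceed):

```sql
-- Sketch: count members the surviving node still believes are ONLINE.
-- In a 3-member group, fewer than 2 ONLINE members means no write quorum.
SELECT COUNT(*) AS online_members
FROM performance_schema.replication_group_members
WHERE MEMBER_STATE = 'ONLINE';
```

The catch, as the output below shows, is that this view reported stale membership on the surviving node, so such a check would have been misleading.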
But when I checked the status of the GR views from another session, I got:
server1> SELECT * FROM performance_schema.replication_connection_status\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
GROUP_NAME: 8a94f357-aab4-11df-86ab-c80aa9429562
SOURCE_UUID: 8a94f357-aab4-11df-86ab-c80aa9429562
THREAD_ID: NULL
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 0
LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
RECEIVED_TRANSACTION_SET: 8a94f357-aab4-11df-86ab-c80aa9429562:1-20:23-24
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_recovery
GROUP_NAME:
SOURCE_UUID:
THREAD_ID: NULL
SERVICE_STATE: OFF
COUNT_RECEIVED_HEARTBEATS: 0
LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
RECEIVED_TRANSACTION_SET:
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
2 rows in set (0,00 sec)
server1> SELECT * FROM performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 14639909840688467:11
MEMBER_ID: 9aa843a9-20bc-11e6-aa4d-b86b23917877
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 2
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_VALIDATING: 3
TRANSACTIONS_COMMITTED_ALL_MEMBERS: 8a94f357-aab4-11df-86ab-c80aa9429562:1-20
LAST_CONFLICT_FREE_TRANSACTION: 8a94f357-aab4-11df-86ab-c80aa9429562:24
1 row in set (0,00 sec)
server1> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 9aa83c31-20bc-11e6-aaa8-b86b23917877 | 127.0.0.1   | 3302        | ONLINE       |
| group_replication_applier | 9aa840e3-20bc-11e6-aae3-b86b23917877 | 127.0.0.1   | 3303        | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0,00 sec)
However, it was the mysqlds on 3302 & 3303 that I killed:
root@khollman-es:/usr/local/mysql-5.7.14/rapid/plugin/group_replication/src# ps -ef | grep mysqld
mysql     8420  5877  0 10:21 pts/2    00:00:02 bin/mysqld --defaults-file=my.cnf --user=mysql
root     10597  5879  0 10:52 pts/12   00:00:00 grep --color=auto mysqld
In the error log, fewer than 4 iterations of the following appear:
2016-05-23T08:40:18.891768Z 5 [Note] Plugin group_replication reported: '[XCOM_BINDING_DEBUG] ::send_message()'
2016-05-23T08:40:18.891839Z 5 [Note] Plugin group_replication reported: '[XCOM_BINDING_DEBUG] ::send_binding_message():: Pipelining message with payload length 86'
2016-05-23T08:40:18.891857Z 5 [Note] Plugin group_replication reported: '[XCOM_BINDING_DEBUG] ::send_binding_message():: Sending message with payload length 86'
2016-05-23T08:40:18.891950Z 5 [Note] Plugin group_replication reported: '[XCOM_BINDING_DEBUG] ::send_message():: send_binding_message enum_gcs_error result= 0. Bytes sent:74'
How to repeat:
- Once all 3 servers are up and running, kill 2 of them, and run a single INSERT on the remaining server.
- In another session, query the performance_schema.replication_group_members view.
Suggested fix:
I was expecting the view to at least reflect the fact that 3302 & 3303 were down, or even that 3301 (the remaining server) was frozen, waiting to regain a majority.
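As a workaround while the group is blocked, MySQL provides the group_replication_force_members system variable to force a new membership containing only the reachable member(s). A sketch for the surviving node (the address below is a placeholder for that member's actual group_replication_local_address, not a value from this report):

```sql
-- Sketch: unblock the surviving member at 3301 after losing the majority.
-- '127.0.0.1:10301' is a hypothetical placeholder for the node's
-- group_replication_local_address (its XCom address, not its SQL port).
SET GLOBAL group_replication_force_members = '127.0.0.1:10301';

-- After the group unblocks, clear the variable so future view changes
-- are negotiated normally.
SET GLOBAL group_replication_force_members = '';
```

This is for recovering availability; it would not address the stale membership reported in this bug, which is why the view should reflect reality first.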