Bug #92690 | Group Replication split brain with faulty network | |
---|---|---|---
Submitted: | 5 Oct 2018 14:18 | Modified: | 25 Feb 2019 13:28
Reporter: | Przemyslaw Malkowski | Email Updates: |
Status: | Closed | Impact on me: |
Category: | MySQL Server: Group Replication | Severity: | S2 (Serious)
Version: | 5.7.23, 8.0.12 | OS: | Any
Assigned to: | | CPU Architecture: | Any
[5 Oct 2018 14:18]
Przemyslaw Malkowski
[9 Oct 2018 14:31]
MySQL Verification Team
Hi,

Took some hours, but I reproduced the issue. Thanks for the report.

Bogdan

P.S. For some reason this required real hardware to be reproduced; inside a VM, even with different TCP/IP shaping, I was not able to reproduce it.
[9 Oct 2018 15:00]
MySQL Verification Team
It would also help if you could upload your logs (both binary/relay and error logs).
[10 Oct 2018 12:10]
Przemyslaw Malkowski
Bogdan, do you mean that if I had uploaded them earlier, reproducing would have been easier, or that my logs will still be useful?
[11 Oct 2018 10:09]
MySQL Verification Team
They are still useful, so please upload yours if you can. Having two sets (mine and yours) will help with fixing it.
[19 Oct 2018 10:14]
Przemyslaw Malkowski
Ran the same tests on MySQL 8.0.12 and ended up with the same consistency disaster. More details are in the attachment, but I think this shows just how confusing the situation is:

gr01 > select * from performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 15399414486931668:7
MEMBER_ID: 1f912efe-c880-11e8-9592-525400cae48b
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 0
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302651
LAST_CONFLICT_FREE_TRANSACTION:
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
COUNT_TRANSACTIONS_REMOTE_APPLIED: 0
COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 15399414486931668:7
MEMBER_ID: 20c6a737-c880-11e8-bf58-525400cae48b
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 7108
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302651
LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:302650
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
COUNT_TRANSACTIONS_REMOTE_APPLIED: 7110
COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 15399414486931668:7
MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 7115
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 18
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302651
LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:302657
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
COUNT_TRANSACTIONS_REMOTE_APPLIED: 4
COUNT_TRANSACTIONS_LOCAL_PROPOSED: 7115
COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
3 rows in set (0.00 sec)
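As a side note for anyone reproducing this: rather than digging through the stats above, a minimal way to make the divergence visible is to compare the executed and received GTID sets on every member. This is only a sketch, assuming you can connect to each node directly; the table and column names are the stock performance_schema ones:

-- Run on each member; the gtid_executed sets should be identical across the group.
SELECT @@global.gtid_executed AS gtid_executed_here;

-- What this member has received via the group_replication_applier channel.
SELECT RECEIVED_TRANSACTION_SET
  FROM performance_schema.replication_connection_status
 WHERE CHANNEL_NAME = 'group_replication_applier';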
[19 Oct 2018 10:17]
Przemyslaw Malkowski
err logs and more details
Attachment: 92690-8.0.tgz (application/x-compressed-tar, text), 13.65 KiB.
[19 Oct 2018 12:38]
MySQL Verification Team
Thanks for the update. Any way you can also compress and upload the binary and relay logs?
[19 Oct 2018 13:10]
MySQL Verification Team
Yes, please split them into 3M parts.
[19 Oct 2018 14:08]
Przemyslaw Malkowski
binlogs part1
Attachment: 92690-binlogs.tgz_00 (application/octet-stream, text), 2.92 MiB.
[19 Oct 2018 14:08]
Przemyslaw Malkowski
binlogs part2
Attachment: 92690-binlogs.tgz_01 (application/octet-stream, text), 2.92 MiB.
[19 Oct 2018 14:09]
Przemyslaw Malkowski
binlogs part3
Attachment: 92690-binlogs.tgz_02 (application/octet-stream, text), 2.92 MiB.
[19 Oct 2018 14:10]
Przemyslaw Malkowski
binlogs part4
Attachment: 92690-binlogs.tgz_03 (application/octet-stream, text), 2.92 MiB.
[19 Oct 2018 14:10]
Przemyslaw Malkowski
binlogs part5
Attachment: 92690-binlogs.tgz_04 (application/octet-stream, text), 2.92 MiB.
[19 Oct 2018 14:11]
Przemyslaw Malkowski
binlogs part6
Attachment: 92690-binlogs.tgz_05 (application/octet-stream, text), 116.04 KiB.
[1 Nov 2018 8:00]
jifa Xu
I have met the same problem with 5.7.22. I use MGR in single-primary mode. The primary node's mysqld process memory usage increased to 96% (about 310G, while innodb_buffer_pool_size is 47G); then the system killed the mysqld process because of OOM. After that, when group_replication was started again, the old primary node had more transactions than the other two nodes. When checking the MySQL binlogs, I found that about 5 transactions exist only on the primary node.
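For what it's worth, a minimal sketch of how those extra transactions could be listed from SQL, assuming the old primary is reachable and the gtid_executed value of one of the secondaries has been copied in by hand (the literal set below is just a placeholder taken from the output earlier in this report):

-- Run on the old primary. The second argument is the gtid_executed value
-- copied from a secondary (placeholder value shown); the result is the set
-- of GTIDs that exist only on the old primary.
SELECT GTID_SUBTRACT(
         @@global.gtid_executed,
         'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302651'
       ) AS gtids_only_on_old_primary;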
[1 Nov 2018 8:11]
MySQL Verification Team
> I have met the same problem with 5.7.22

Please, if you can, upload:
- mysql logs
- relay logs
- binary logs

That will speed up resolution of the bug. Thanks, Bogdan
[2 Nov 2018 10:48]
jifa Xu
I added some logging to MySQL and found something wrong in executor_task in xcom_base.c:

else if ((ep->p)->learner.msg->msg_type != no_op) {
  execute_msg(find_site_def(delivered_msg), ep->p, ep->p->learner.msg);
#if defined(TASK_DBUG_ON) && TASK_DBUG_ON
  DBGOUT(perf_dbg(&ep->n, &ep->old_n, &ep->old_t));
#endif
}

When the error happened (with "tc qdisc add dev eth0 root netem loss 40%"), there is one message whose msg_type is normal on the primary node, but on the secondary node the msg_type of the same message is no_op (I used msgno and nodeno to make sure it is the same message). This then causes the data inconsistency.
[13 Nov 2018 5:54]
Gosin Gu
I found this location in the MySQL source code: https://github.com/dbgosin/mysql-server/blob/df0bc0a67b5cca06665e1d501a2f74b712af5cf5/rapi...

if (gt_ballot(m->proposal, p->proposer.msg->proposal)) { /* greater */
  replace_pax_msg(&p->proposer.msg, m);
  assert(p->proposer.msg);
}

Consider this scenario with three nodes n0, n1, n2: one transaction has been accepted only by node 0 and node 1, and node 2 initiates a prepare_op to nodes 0 and 1. At this point node 1's handle_prepare_op copies the acceptor state of its state machine into the message and replies to node 2 with an ack_prepare_op. When node 2 receives node 1's paxos message, I think a reasonable algorithm should behave like this: if gt_ballot(m->proposal, p->proposer.msg->proposal) == true, then replace_pax_msg(&p->proposer.msg, m). In fact that is not what happens: because m->proposal == p->proposer.msg->proposal, the already accepted message sent by node 1 is ignored. Eventually node 0 completes the learn phase while node 1 does not. In addition, because of check_propose, node 2 causes node 1's value to be replaced by node 2's empty message, and finally the node 1 and node 2 transactions are lost.
[25 Feb 2019 13:28]
Margaret Fisher
Posted by developer:
Changelog entry added for MySQL 8.0.16 and 5.7.26:

In a replication group configured in single-primary mode (group_replication_single_primary_mode=ON, which is the default), if severe network delays affected the group, it was possible for the primary and the secondaries to reach different decisions on a transaction, which could lead to divergence in the gtid_executed sets on the members. The issue has now been fixed.