Bug #89938 Rejoin old primary node may duplicate key when recovery
Submitted: 7 Mar 3:13  Modified: 11 Jul 9:47
Reporter: Zhenghu Wen (OCA)
Status: Closed
Category: MySQL Server: Group Replication
Severity: S2 (Serious)
Version: 5.7.20, 8.0.4
OS: Any
CPU Architecture: Any
Tags: recovery

[7 Mar 3:13] Zhenghu Wen
Description:
When running the test case in "How to repeat", the rejoined old primary node occasionally prints duplicate key errors:

2018-03-01T07:22:47.875590+08:00 13 [ERROR] Slave SQL for channel 'group_replication_applier': Worker 4 failed executing transaction 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:118961249' at master log , end_log_pos 1285; Could not execute Write_rows event on table sbtest.sbtest1; Duplicate entry '4838591' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 1285, Error_code: 1062
2018-03-01T07:22:47.879854+08:00 16 [ERROR] Slave SQL for channel 'group_replication_applier': Worker 7 failed executing transaction 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:118961252' at master log , end_log_pos 1285; Could not execute Write_rows event on table sbtest.sbtest7; Duplicate entry '4214858' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 1285, Error_code: 1062

After adding some debug messages to the MGR code, the logs show the following:
rejoin node:
2018-03-04T18:30:26.944863+08:00 31 [Note] Plugin group_replication reported: 'Set certifying_already_applied_transactions to true. 
group_gtid_executed: "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79351571:79351573-79352231:79753282-79753578"; 
group_gtid_extracted_string: "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79351571:79753282-79753578"'
...
2018-03-04T18:30:30.541516+08:00 0 [Note] Plugin group_replication reported: 'Certifier stable_set_handle: executed_set: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79284767:79351573-79351907:79753282-79753578'
2018-03-04T18:30:30.541524+08:00 6 [Warning] Plugin group_replication reported: 'start 0, trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080616:79753282-79753561, gtids_assigned_in_blocks_counter 1'
2018-03-04T18:30:30.541691+08:00 6 [Note] Plugin group_replication reported: 'reserve_gtid_block(1) new Gtid_set::Interval 79351572-79753281, member_uuid 9e6e2471-1ece-11e8-b995-c81f66e48c6e'
2018-03-04T18:30:30.547067+08:00 6 [Warning] Plugin group_replication reported: 'is local 0, trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080628:79753282-79753565, gtids_assigned_in_blocks_counter 2'
2018-03-04T18:30:30.547107+08:00 6 [Warning] Plugin group_replication reported: ' certified item snapshot aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79312047:79312122, set 7Sbo48ZPDSw='
2018-03-04T18:30:30.547118+08:00 6 [Warning] Plugin group_replication reported: 'found conflict'
2018-03-04T18:30:30.551490+08:00 6 [Warning] Plugin group_replication reported: 'is local 0, trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080627:79753282-79753565, gtids_assigned_in_blocks_counter 2'
2018-03-04T18:30:30.555898+08:00 6 [Warning] Plugin group_replication reported: 'is local 0, trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080624:79753282-79753565, gtids_assigned_in_blocks_counter 3'
2018-03-04T18:30:30.556854+08:00 6 [Warning] Plugin group_replication reported: 'is local 0, trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080624:79753282-79753565, gtids_assigned_in_blocks_counter 4'

new primary:
2018-03-04T18:28:37.122261+08:00 6 [Warning] Plugin group_replication reported: 'local trx 1, trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080616:79753282-79753561, gtids_assigned_in_blocks_counter 1'
2018-03-04T18:28:37.122334+08:00 6 [Warning] Plugin group_replication reported: ' certified aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79161252:79161310, set VTCkGz65Jq0='
2018-03-04T18:28:37.122340+08:00 6 [Warning] Plugin group_replication reported: 'found conflict'
2018-03-04T18:28:37.122779+08:00 6 [Warning] Plugin group_replication reported: 'local trx 1, trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080628:79753282-79753565, gtids_assigned_in_blocks_counter 1'
2018-03-04T18:28:37.122792+08:00 6 [Warning] Plugin group_replication reported: ' certified aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79312047:79312122, set 7Sbo48ZPDSw='
2018-03-04T18:28:37.122796+08:00 6 [Warning] Plugin group_replication reported: 'found conflict'
2018-03-04T18:28:37.122842+08:00 6 [Warning] Plugin group_replication reported: 'local trx 1, trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080627:79753282-79753565, gtids_assigned_in_blocks_counter 1'
2018-03-04T18:28:37.122859+08:00 6 [Warning] Plugin group_replication reported: ' certified aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79208894:79208901-79208915:79208917-79208932:79208934:79209008, set h8UODf3lnzo='
2018-03-04T18:28:37.122864+08:00 6 [Warning] Plugin group_replication reported: 'found conflict'
2018-03-04T18:28:37.122910+08:00 6 [Warning] Plugin group_replication reported: 'local trx 1, trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080624:79753282-79753565, gtids_assigned_in_blocks_counter 1'
2018-03-04T18:28:37.122926+08:00 6 [Warning] Plugin group_replication reported: ' certified aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79096911:79096936, set Zt/mEWKhEhg='
2018-03-04T18:28:37.122930+08:00 6 [Warning] Plugin group_replication reported: 'found conflict'
2018-03-04T18:28:37.122975+08:00 6 [Warning] Plugin group_replication reported: 'local trx 1, trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080624:79753282-79753565, gtids_assigned_in_blocks_counter 1'
2018-03-04T18:28:37.122987+08:00 6 [Warning] Plugin group_replication reported: ' certified aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79223918:79223962, set eBxQ+I4NT08='
2018-03-04T18:28:37.122991+08:00 6 [Warning] Plugin group_replication reported: 'found conflict'

It seems that when certifying_already_applied_transactions is true, i.e. the rejoining node has applied some transactions belonging to the current view through the group_replication_recovery channel, some transactions are certified as positive on the rejoining node even though the same transactions were certified as conflicting on the original primary node, for example "trx aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-79080616:79753282-79753561".

I checked the mysqld-relay-bin-group_replication_applier relay log file of the rejoining node and the binlog file of the new primary node.

The reason is that the rejoining node cannot find the certification items for the primary keys the transaction has modified. Those items were marked as stable and purged because this node had already executed the corresponding GTIDs through the group_replication_recovery channel, but the certifier still needs them to certify the transactions buffered in the applier module's incoming queue, which arrive from Paxos while group_replication_recovery replication is in progress.

How to repeat:
1. Create a group with 3 members (node1, node2, node3), using single-primary mode.
2. Prepare database sbtest with sysbench (10 tables, each with 5,000,000 records).
3. Run an OLTP test with many threads against the primary, without a time limit.
4. Sleep 5 minutes, then kill -9 the primary node, restart it, and start group replication.
5. Switch sysbench to the new primary node and continue running.
6. Wait until the old primary node is ONLINE, then go back to step 4.

In my test environment, with 128 OLTP threads, the old primary hits the duplicate key error after several hundred iterations of steps 4-6.

Suggested fix:
When a rejoining node is recovering (the old primary in single-primary mode, or any node in multi-primary mode):

1. Certifier_broadcast_thread should broadcast gtid_extracted instead of gtid_executed while certifying_already_applied_transactions is true;

2. or disable certification info garbage collection until gtid_extracted equals gtid_executed.
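Suggestion (1) amounts to a one-line decision in the broadcast path. A hypothetical helper sketching the idea (set_to_broadcast is an invented name, and the real Certifier_broadcast_thread works on Gtid_set objects rather than strings):

```cpp
#include <string>

// While the rejoining member is still certifying transactions it already
// applied through the recovery channel, advertise only what was certified
// inside the view (gtid_extracted), never the full gtid_executed, so the
// group's stable set cannot advance past certification info the member
// still needs.
std::string set_to_broadcast(bool certifying_already_applied_transactions,
                             const std::string& gtid_executed,
                             const std::string& gtid_extracted) {
  return certifying_already_applied_transactions ? gtid_extracted
                                                 : gtid_executed;
}
```

Once gtid_extracted catches up with gtid_executed, the flag is cleared and the normal broadcast resumes, which is also the termination condition for suggestion (2).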
[7 Mar 3:28] Zhenghu Wen
fix patch

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 89938.patch (application/octet-stream, text), 690 bytes.

[11 Apr 7:33] Zhenghu Wen
my sysbench test script

Attachment: run-sb.sh (text/x-sh), 1.02 KiB.

[11 Apr 7:33] Zhenghu Wen
mysql crash test script

Attachment: kill_primary_2.sh (text/x-sh), 3.56 KiB.

[11 Apr 7:41] Zhenghu Wen
PS:
1. The bugfix code I contributed still has the same problem, and the fix I tested locally is not very elegant, so please just ignore it.
2. When running this test, I had already applied the bugfix from https://bugs.mysql.com/bug.php?id=89582.
[11 Jul 9:47] David Moss
Posted by developer:
 
Thank you for your feedback, this has been fixed in upcoming versions and the following was added to the 5.7.23 / 8.0.12 changelog:
In certain situations, such as during the distributed recovery procedure, the certification info garbage collection was purging more data than it should, resulting in conflicts not being detected. The garbage collection procedure has been improved to take this case into consideration.