Description:
In Group Replication, after a recovery performed while a significant amount of load is being applied, the recovering node sometimes fails with an error on the binary log.
The failure happens immediately after recovery finishes, as can be seen in this log:
2017-10-26T14:15:37.315509Z 0 [Note] [000000] Plugin group_replication reported: 'This server was declared online within the replication group' (Recovering node)
2017-10-26T14:15:37.315598Z 0 [Note] [000000] Plugin group_replication reported: 'The member with address siv30:29543 was declared online within the replication group' (other node)
2017-10-26T14:15:37.316237Z 0 [Note] [000000] Plugin group_replication reported: 'The member with address siv30:29543 was declared online within the replication group' (other node)
2017-10-26T14:15:37.316508Z 441 [ERROR] [001782] Slave SQL for channel 'group_replication_applier': Worker 1 failed executing transaction 'NOT_YET_DETERMINED' at master log , end_log_pos 563; Error executing row event: '@@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.', Error_code: 1782
2017-10-26T14:15:37.316600Z 440 [Warning] [001756] Slave SQL for channel 'group_replication_applier': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2017-10-26T14:15:37.316815Z 440 [ERROR] [000000] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2017-10-26T14:15:37.316872Z 92 [ERROR] [000000] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
2017-10-26T14:15:37.316946Z 92 [ERROR] [000000] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2017-10-26T14:15:37.318555Z 92 [Note] [000000] Plugin group_replication reported: 'The group replication applier thread was killed'
At this point the member trying to join the group goes into ERROR state.
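To check how quickly the failure follows the end of recovery on other runs, the error log can be scanned for the two relevant messages. The following is only a minimal sketch, assuming the log format shown above; the function names (`parse_ts`, `online_to_failure_gap`) are hypothetical helpers and not part of any MySQL tooling.

```python
import re
from datetime import datetime

# Leading timestamp of an error-log line, e.g. 2017-10-26T14:15:37.315509Z
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z")

# Message texts copied from the log in this report.
ONLINE_MARKER = "This server was declared online within the replication group"
APPLIER_CHANNEL = "group_replication_applier"
ERROR_MARKER = "Error_code: 1782"


def parse_ts(line):
    """Return the timestamp at the start of a log line, or None if absent."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def online_to_failure_gap(lines):
    """Seconds between 'declared online' and the first applier error 1782
    that follows it, or None if that sequence is not found."""
    online_at = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if ONLINE_MARKER in line:
            online_at = ts
        elif (online_at is not None
              and APPLIER_CHANNEL in line
              and ERROR_MARKER in line):
            return (ts - online_at).total_seconds()
    return None
```

Called with the lines of the recovering member's error log (for example `online_to_failure_gap(open(path))`, where `path` is wherever that log is stored), it returns the delay in seconds, which for the log above is about one millisecond.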
How to repeat:
This is a follow-up to BUG#26731317. The conditions are similar, although the failure seems to happen less frequently and only at the highest levels of load.
Suggested fix:
The message "Worker 1 failed executing transaction 'NOT_YET_DETERMINED' at master log , end_log_pos 563; Error executing row event: '@@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.', Error_code: 1782" seems to indicate corruption in the binary log, something that was not visible before the fix for BUG#26731317.