Description:
Hi,
In Bug#89141, I describe a situation generating an error in the Group Replication applier. We have the following in P_S:
> SELECT * FROM performance_schema.replication_applier_status_by_coordinator
-> WHERE CHANNEL_NAME = 'group_replication_applier'\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
THREAD_ID: 4531
SERVICE_STATE: ON
LAST_ERROR_NUMBER: 1062
LAST_ERROR_MESSAGE: Coordinator stopped because there were error(s) in the worker(s).
The most recent failure being: Worker 2 failed executing transaction 'UUID:147' at
master log , end_log_pos 168. See error log and/or performance_schema.replication_applier_status_by_worker
table for more details about this failure or others, if any.
LAST_ERROR_TIMESTAMP: 2018-01-01 19:29:30
1 row in set (0.00 sec)
And we have the following in the error log:
2018-01-01T18:29:30.880298Z 4499 [ERROR] Slave SQL for channel 'group_replication_applier': Worker 2 failed executing transaction 'UUID:147' at master log , end_log_pos 168; Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'c' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168, Error_code: 1062
Neither of these two messages includes a position in the relay logs. To investigate the error, we can only rely on the GTID and on parsing the relay logs, which is not very practical.
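To illustrate what can be recovered from the message today, here is a minimal Python sketch that extracts the fields from the error log line quoted above. The regular expression and the function name `parse_applier_error` are assumptions made for this example; they are not part of MySQL, and the message format may differ between versions.

```python
import re

# Hypothetical pattern for the applier error line quoted above.
# It is derived from that single sample and may not match other versions.
APPLIER_ERROR = re.compile(
    r"Worker (?P<worker>\d+) failed executing transaction "
    r"'(?P<gtid>[^']+)' at master log (?P<log_name>[^,]*), "
    r"end_log_pos (?P<end_log_pos>\d+)"
)


def parse_applier_error(line):
    """Return the fields found in an applier error line, or None if no match."""
    match = APPLIER_ERROR.search(line)
    if match is None:
        return None
    return {
        "worker": int(match.group("worker")),
        "gtid": match.group("gtid"),
        # Empty in the Group Replication message quoted above.
        "master_log_name": match.group("log_name").strip(),
        "end_log_pos": int(match.group("end_log_pos")),
    }
```

Run against the error log line above, this yields the worker number, the GTID and end_log_pos, with an empty log name. Nothing in the line identifies a relay log file or a position within it, which is the gap this request is about.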
Please add relay log positional information to Group Replication error messages. In addition to the relay log filename, this could include the position of the beginning of the failed transaction in the relay logs. Note that the offset of the transaction is already present as end_log_pos, but this is strangely named (I will open another bug/feature request for that and put the bug number in the comments).
Many thanks,
JFG
How to repeat:
Not a bug but a feature request.
See Bug#89141 for how to reproduce the errors quoted in the description.
Suggested fix:
Add relay log positional information to Group Replication error messages. The position could include the relay log filename and the position of the beginning of the failed transaction in the relay logs (the offset of the transaction is already present as end_log_pos).