Description:
Hi,
In Bug#89194, I describe a situation that breaks Group Replication. In that bug, I report the following lines from the error log:
2018-01-01T19:30:23.914570Z 89 [ERROR] Slave SQL for channel 'group_replication_applier': Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'b' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168, Error_code: 1062
2018-01-01T19:30:20.976713Z 86 [ERROR] Slave SQL for channel 'group_replication_applier': Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'B' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168, Error_code: 1062
And these errors in Performance Schema (P_S):
> SELECT * FROM performance_schema.replication_applier_status_by_worker WHERE CHANNEL_NAME = 'group_replication_applier'\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
WORKER_ID: 0
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: UUID:111
LAST_ERROR_NUMBER: 1062
LAST_ERROR_MESSAGE: Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'b' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168
LAST_ERROR_TIMESTAMP: 2018-01-06 15:21:52
1 row in set (0.00 sec)
> SELECT * FROM performance_schema.replication_applier_status_by_worker WHERE CHANNEL_NAME = 'group_replication_applier'\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
WORKER_ID: 0
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: UUID:1000003
LAST_ERROR_NUMBER: 1062
LAST_ERROR_MESSAGE: Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'B' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168
LAST_ERROR_TIMESTAMP: 2018-01-06 15:21:49
1 row in set (0.00 sec)
In P_S, LAST_ERROR_MESSAGE is very close to what we get in the error log, and the row also includes LAST_SEEN_TRANSACTION, which I understand to be the GTID of the failing transaction.
Notice that the GTID of the failing transaction is not in the error log. Having this GTID there would be very useful for investigation purposes.
So please add the GTID of the failing transaction to the error messages of Group Replication, at least in the error log and possibly in the P_S tables as well.
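To illustrate the extra step this currently requires, here is a minimal sketch in Python of correlating an error log line with a P_S row to recover the GTID. It assumes the error log format matches the lines quoted above, and that the P_S rows have already been fetched into dictionaries keyed by column name. The names `LOG_RE`, `parse_applier_error` and `find_gtid` are hypothetical helpers, not part of MySQL.

```python
import re

# Assumption: this pattern is derived only from the error log lines quoted in
# this report; other MySQL versions may format these lines differently.
LOG_RE = re.compile(
    r"^(?P<ts>\S+) (?P<thread>\d+) \[ERROR\] Slave SQL for channel "
    r"'(?P<channel>[^']+)': (?P<message>.*)$"
)


def parse_applier_error(line):
    """Split an applier error log line into timestamp, thread, channel and message."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def find_gtid(log_line, ps_rows):
    """Return LAST_SEEN_TRANSACTION of the P_S row matching the log line, or None.

    ps_rows is a list of dicts holding rows of
    performance_schema.replication_applier_status_by_worker (hypothetical input,
    fetched by whatever client is at hand).
    """
    parsed = parse_applier_error(log_line)
    if parsed is None:
        return None
    for row in ps_rows:
        # In the output quoted above, the error log message is the P_S message
        # followed by a trailing ", Error_code: NNNN", so a prefix match is used.
        if (row["CHANNEL_NAME"] == parsed["channel"]
                and parsed["message"].startswith(row["LAST_ERROR_MESSAGE"])):
            return row["LAST_SEEN_TRANSACTION"]
    return None


# Usage with the values quoted in this report.
log_line = (
    "2018-01-01T19:30:23.914570Z 89 [ERROR] Slave SQL for channel "
    "'group_replication_applier': Could not execute Write_rows event on table "
    "test_jfg_ws.test_jfg_ws; Duplicate entry 'b' for key 'str', Error_code: 1062; "
    "handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, "
    "end_log_pos 168, Error_code: 1062"
)
ps_rows = [{
    "CHANNEL_NAME": "group_replication_applier",
    "LAST_SEEN_TRANSACTION": "UUID:111",
    "LAST_ERROR_MESSAGE": (
        "Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; "
        "Duplicate entry 'b' for key 'str', Error_code: 1062; handler error "
        "HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168"
    ),
}]
print(find_gtid(log_line, ps_rows))  # prints UUID:111
```

Matching on the message text is fragile: it only works while the P_S row still holds the same error, and it cannot tell apart two failures that produce identical messages. Having the GTID directly in the error log would avoid this.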
Many thanks,
JFG
How to repeat:
Not a bug but a feature request.
See Bug#89194 for how to reproduce the errors quoted in the description.
Suggested fix:
Add the GTID of the failing transaction to the error messages of Group Replication.