Bug #89196 Please add GTID in applier error messages.
Submitted: 11 Jan 19:16
Modified: 19 Jan 5:37
Reporter: Jean-François Gagné
Status: Verified
Category: MySQL Server: Replication
Severity: S4 (Feature request)
Version: 5.7.20, 8.0.3
OS: Any
Assigned to:
CPU Architecture: Any
Triage: Needs Triage: D5 (Feature request)

[11 Jan 19:16] Jean-François Gagné
Description:
Hi,

In Bug#89194, I describe a situation that breaks Group Replication.  In that bug, I report the following lines from the error log:

2018-01-01T19:30:23.914570Z 89 [ERROR] Slave SQL for channel 'group_replication_applier': Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'b' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168, Error_code: 1062

2018-01-01T19:30:20.976713Z 86 [ERROR] Slave SQL for channel 'group_replication_applier': Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'B' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168, Error_code: 1062

And those errors in P_S:

> SELECT * FROM performance_schema.replication_applier_status_by_worker WHERE CHANNEL_NAME = 'group_replication_applier'\G
*************************** 1. row ***************************
         CHANNEL_NAME: group_replication_applier
            WORKER_ID: 0
            THREAD_ID: NULL
        SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: UUID:111
    LAST_ERROR_NUMBER: 1062
   LAST_ERROR_MESSAGE: Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'b' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168
 LAST_ERROR_TIMESTAMP: 2018-01-06 15:21:52
1 row in set (0.00 sec)

> SELECT * FROM performance_schema.replication_applier_status_by_worker WHERE CHANNEL_NAME = 'group_replication_applier'\G
*************************** 1. row ***************************
         CHANNEL_NAME: group_replication_applier
            WORKER_ID: 0
            THREAD_ID: NULL
        SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: UUID:1000003
    LAST_ERROR_NUMBER: 1062
   LAST_ERROR_MESSAGE: Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'B' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168
 LAST_ERROR_TIMESTAMP: 2018-01-06 15:21:49
1 row in set (0.00 sec) 

In P_S, the LAST_ERROR_MESSAGE is very close to what we get in the error log, and the row additionally carries LAST_SEEN_TRANSACTION, which I understand to be the GTID of the failing transaction.

Notice that the GTID of the failing transaction is not in the error log.  It would be very useful to have this GTID for investigation purposes.
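
Until the GTID is part of the message, one possible stopgap is to join the two sources by hand. In the output quoted above, the error-log message is the P_S LAST_ERROR_MESSAGE followed by a trailing ", Error_code: 1062", so a prefix match can tie a log line to its P_S row and therefore to LAST_SEEN_TRANSACTION. The following is only a sketch under that assumption: the function name gtid_for_log_line and the dict-per-row representation are hypothetical, the regular expression covers only the line format quoted in this report, and "UUID:111" is the abbreviated value shown above, not a real GTID.

```python
import re

# Matches only the error-log line shape quoted in this report:
# "<timestamp> <thread id> [ERROR] Slave SQL for channel '<channel>': <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<thread>\d+) \[ERROR\] "
    r"Slave SQL for channel '(?P<channel>[^']*)': (?P<message>.*)$"
)


def gtid_for_log_line(log_line, ps_rows):
    """Return LAST_SEEN_TRANSACTION of the P_S row whose error message is a
    prefix of the log message on the same channel, or None if nothing matches.

    ps_rows is assumed to be a list of dicts keyed by the column names of
    performance_schema.replication_applier_status_by_worker.
    """
    match = LOG_LINE.match(log_line.strip())
    if match is None:
        return None
    for row in ps_rows:
        ps_message = row.get("LAST_ERROR_MESSAGE")
        if (
            ps_message
            and row.get("CHANNEL_NAME") == match.group("channel")
            and match.group("message").startswith(ps_message)
        ):
            return row.get("LAST_SEEN_TRANSACTION")
    return None


# Data copied from the output quoted in this report.
ps_message = (
    "Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; "
    "Duplicate entry 'b' for key 'str', Error_code: 1062; "
    "handler error HA_ERR_FOUND_DUPP_KEY; "
    "the event's master log FIRST, end_log_pos 168"
)
ps_rows = [
    {
        "CHANNEL_NAME": "group_replication_applier",
        "LAST_SEEN_TRANSACTION": "UUID:111",
        "LAST_ERROR_NUMBER": 1062,
        "LAST_ERROR_MESSAGE": ps_message,
    }
]
log_line = (
    "2018-01-01T19:30:23.914570Z 89 [ERROR] Slave SQL for channel "
    "'group_replication_applier': " + ps_message + ", Error_code: 1062"
)

print(gtid_for_log_line(log_line, ps_rows))  # UUID:111
```

This only works while the P_S row still holds the error in question, since the table keeps just the last error per worker, and it is ambiguous when two transactions fail with the same message. Having the GTID directly in the error log would avoid both limitations.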

So please add the GTID of the failing transaction to the error messages of Group Replication at least in the error log and maybe in the P_S tables.

Many thanks,

JFG

How to repeat:
Not a bug but a feature request.

See Bug#89194 for how to reproduce the errors quoted in the description.

Suggested fix:
Add the GTID of the failing transaction to the error messages of Group Replication.

[19 Jan 5:37] Umesh Shastry
Hello Jean,

Thank you for the report and feature request!

Thanks,
Umesh