Bug #104523 Improve logging of GCS communication issues.
Submitted: 3 Aug 2021 14:44
Modified: 22 Nov 2021 16:14
Reporter: Simon Mudd (OCA)
Status: Closed
Category: MySQL Server: Group Replication
Severity: S4 (Feature request)
Version: 8.0.26
OS: Any
Assigned to:
CPU Architecture: Any
Tags: communication, group replication, improved logging

[3 Aug 2021 14:44] Simon Mudd
Description:
I had an issue with a GR member that was unable to join the group.

The logging shows:
2021-07-22T11:26:33.913088Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error connecting to the local group communication engine instance.'

This error is far from clear: it does not specify what the error is, and it does not specify what endpoint it was connecting to.

Improving the message would help.

I dug into the code and saw that TLS issues are one possible cause of this error. In my case it was caused by a firewall issue. In either case, reporting the error in more detail would have helped diagnose and resolve the issue faster.

The same error message is also reported in 2 different places.

How to repeat:
Start a GR member with connections to the default port 33061 blocked.
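
To confirm outside of MySQL that the port really is blocked, a quick TCP probe can help. The following is only a sketch: `probe_endpoint` is a hypothetical helper name, and the host 127.0.0.1 is an assumption (the report only mentions the default port 33061). It returns the OS error number and text when the connection fails.

```python
import socket


def probe_endpoint(host, port, timeout=2.0):
    """Try a plain TCP connect to host:port.

    Returns None if the connection succeeds, otherwise a tuple
    (errno, message) describing the OS-level failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        # exc.errno may be None for some failures (for example a timeout),
        # so fall back to the exception text when strerror is missing.
        return (exc.errno, exc.strerror or str(exc))


if __name__ == "__main__":
    # Assumed endpoint: local host, default GR communication port.
    failure = probe_endpoint("127.0.0.1", 33061)
    if failure is None:
        print("connection to 127.0.0.1:33061 succeeded")
    else:
        print(f"connection to 127.0.0.1:33061 failed: {failure}")
```

A refused connection and a silently dropped one (timeout) show up differently here, which is the kind of detail the current log message hides.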

Suggested fix:
Improve logging along the lines of:

2021-08-03T13:38:17.094549Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error connecting to the local group communication engine instance at <hostname>:<port>. Reason: xxxxx.'

Where xxxxx should be the error number and text reported by the OS.
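
As an illustration of the message shape suggested above, here is a minimal sketch in Python rather than the server's C++. The function name `format_gcs_connect_error` is hypothetical, and the exact layout of the "Reason" part is an assumption. It is not the wording that the server actually uses.

```python
import errno
import os


def format_gcs_connect_error(host, port, err):
    """Build a log message that names the endpoint and the OS error.

    `err` is an errno value such as errno.ECONNREFUSED; os.strerror()
    supplies the platform's text for that number.
    """
    return (
        "[GCS] Error connecting to the local group communication engine "
        f"instance at {host}:{port}. Reason: {err} ({os.strerror(err)})."
    )


if __name__ == "__main__":
    # Assumed example values: local host, default port, connection refused.
    print(format_gcs_connect_error("127.0.0.1", 33061, errno.ECONNREFUSED))
```

Including both the number and the text keeps the message searchable by errno while still being readable in the log.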
[3 Aug 2021 14:51] Simon Mudd
See: https://github.com/mysql/mysql-server/pull/354
[4 Aug 2021 6:25] Simon Mudd
Related:  bug#104526
[4 Aug 2021 6:26] Simon Mudd
Adjusted the category and made this a feature request.
[4 Aug 2021 13:36] MySQL Verification Team
Hi Simon,

Thanks for the info, FR and PR. I believe the change is very sensible and should be merged, but let's see what the GR team says.

all best
Bogdan
[22 Nov 2021 16:14] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL 8.0.28:

Group Replication now logs operating system errors returned when there is a problem connecting to the local XCom instance, so it is easier to resolve issues such as a firewall blocking the connection.