Bug #109213 Can't connect to local MySQL server through router socket
Submitted: 27 Nov 2022 9:20 Modified: 1 Dec 2022 14:35
Reporter: zetang zeng Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Router Severity:S1 (Critical)
Version:8.0.30 OS:CentOS
Assigned to: CPU Architecture:Any

[27 Nov 2022 9:20] zetang zeng
Description:
After mysql master failover,server can't be connected by router socket.

How to repeat:
- deploy three node innodb cluster & router
- check status of cluster, all status is fine
- now we can login through router socket

mysql -ucluster -pXXX -S /xxx/mysql.sock

- sudo restart mysqld
- now 

  - [fail] mysql -ucluster -pXXX -S /xxx/mysql.sock
  - [ok] mysql -ucluster -pXXX -S /xxx/mysqlro.sock
  - [ok] mysql -ucluster -pXXX -h 127.0.0.1 -P 6447
[29 Nov 2022 1:09] MySQL Verification Team
Hi,

You can always reproduce this? I run the setup three times from scratch today (with 8.0.31 but should be no different) and I cannot reproduce this ?

Thanks
[29 Nov 2022 2:22] zetang zeng
yes, I can re-produce it in my env. What else do you need that you can re-produce it.
[29 Nov 2022 2:25] MySQL Verification Team
Hi,

Would be very helpful if you can do it with 8.0.31 so we are on the same page and using latest release.

Also, can you give me step by step what you are doing as I'm doing the basic setup and it works ok.

Thanks
[29 Nov 2022 2:26] zetang zeng
Ok, I will try 8.0.31 and give you a detailed steps.
[29 Nov 2022 2:27] MySQL Verification Team
Great, thanks.
[29 Nov 2022 3:01] zetang zeng
I managed to reproduce this problem but not exactly same behavior.

- Make a failure env by following the steps in this report(https://bugs.mysql.com/bug.php?id=109202). 

- Then recovering this env by sql from this doc(https://dev.mysql.com/doc/refman/5.7/en/group-replication-restarting-group.html):

ansible mysql -m shell -a "mysql -uroot -pXXX -S /opt/tmp/sock/mysql.sock -e 'STOP GROUP_REPLICATION '"

# on primary
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
SELECT * FROM performance_schema.replication_group_members;

# on secondary
START GROUP_REPLICATION;
SELECT * FROM performance_schema.replication_group_members;

- check status of cluster and it says it is ok
{
    "clusterName": "myCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "192.168.1.121:3406",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "192.168.1.121:3406": {
                "address": "192.168.1.121:3406",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE",
                "version": "5.7.40"
            },
            "192.168.1.172:3406": {
                "address": "192.168.1.172:3406",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE",
                "version": "5.7.40"
            },
            "192.168.1.26:3406": {
                "address": "192.168.1.26:3406",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE",
                "version": "5.7.40"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "192.168.1.121:3406"
}

- now we can connect router via host but not sock

mysql -ucluster -pXXX -S /opt/tiger/mysql_router/router-config/mysql.sock
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/opt/tiger/mysql_router/router-config/mysql.sock' (111)

[ok] mysql -ucluster -pXXX -h 192.168.1.26 -P 6446
[29 Nov 2022 4:16] MySQL Verification Team
Hi,

This does not looks like original issue. I do not see you doing any reboots here? You set up the system and socket is not working? Are you sure you configured it properly? Is the socket present? is SELINUX enabled/disabled/properly configured?
[29 Nov 2022 6:17] zetang zeng
> I do not see you doing any reboots here? 
Yes.

> You set up the system and socket is not working? 
No, it is fine at the very beginning, but after a failure recovering, it is not working.

> Are you sure you configured it properly? 
Yes, because it can login before the failure & after reboot router.

> Is the socket present? 
Yes. Reboot will generate a new socket file and login by it will work

> is SELINUX enabled/disabled/properly configured?
yes.
[29 Nov 2022 9:48] MySQL Verification Team
Hi,

> > Is the socket present? 
> Yes. Reboot will generate a new socket file and login by it will work

I do not understand, if there is a socket after reboot and login by it will work then there is no problem ?!

> > is SELINUX enabled/disabled/properly configured?
> yes.

yes enabled or yes disabled? Can you try with selinux disabled just to remove that as possible issue

thanks
[29 Nov 2022 10:02] zetang zeng
it is disabled.

# sestatus
SELinux status:                 disabled
[30 Nov 2022 9:04] MySQL Verification Team
Hi,
A colleague of mine just hinted me that this could be SSL related.

Can you look at: https://bugs.mysql.com/bug.php?id=107291

and try to apply workaround explained in last note in that bug.
[1 Dec 2022 8:19] zetang zeng
I have tried to use `--ssl-mode=REQUIRED` with mysql client, but sadly not work.
[1 Dec 2022 14:35] MySQL Verification Team
Hi,

I verified this and I think this is mix of expected behavior and improper configuration.

Workaround could be not using "caching_sha2_password" as this is what most probably creates a problem here. Problem is that when you reboot the server it wipes the cache. 

We are in process of changing how socket+ssl work so this should not be a problem any more.
[1 Aug 2023 2:25] zetang zeng
I saw similar problem with 5.7.40, after some system fault(like memory is full, load is very high), when host port way is fine, we can't connect using socket file.