Bug #114263 Mysql innodb cluster hangs and processes are waiting for handler commit
Submitted: 7 Mar 2024 13:59 Modified: 8 Apr 2024 5:28
Reporter: Liang Cheng Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S2 (Serious)
Version:8.0.31 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[7 Mar 2024 13:59] Liang Cheng
Description:
Today, we first noticed that the MySQL innodb cluster hangs.
1. Show full processlist, we can see many processes are on "waiting for handler commit" state, following are some of them
   530400	user1	db03:43736	test_db02	Query	5230	waiting for handler commit	DELETE FROM JGROUPSPING WHERE own_addr='5187608d-3145-5c15-de6f-5c3dfae1b09e' AND cluster_name='7fffffff-8305-e317-8000-0000175d3b33_aaa'
329681185	user1	db03:35106	test_db02	Query	5175	waiting for handler commit	DELETE FROM JGROUPSPING WHERE own_addr='4c5392f1-5cfe-fbf3-0d79-ee6c98a44b69' AND cluster_name='7fffffff-8305-e317-8000-0000175d3b33_aaa'
329709428	user1	db03:58476	test_db02	Query	5176	waiting for handler commit	DELETE FROM JGROUPSPING WHERE own_addr='edb7cfa3-89e0-96e6-d534-58e7f25db8af' AND cluster_name='7fffffff-8305-e317-8000-0000175d3b33_aaa'
329867929	user1	db03:51294	test_db02	Query	5221	waiting for handler commit	commit
332720693	user1	db03:33494	test_db03	Query	4391	waiting for handler commit	commit
333021837	user1	db03:36458	test_db02	Query	31	updating	UPDATE ....
333021838	user1	db03:36462	test_db02	Query	5216	waiting for handler commit	commit

2. On two slaves logs, they are full of message as below. And mysqld.log takes more than 5 Gb within two hours
    2024-03-07T01:03:12.440902Z 23 [Warning] [MY-010590] [Repl] An unexpected event sequence was detected by the IO thread while queuing the event received from master '' binary log file, at position 3811062876200.
2024-03-07T01:03:12.451309Z 23 [Warning] [MY-010590] [Repl] An unexpected event sequence was detected by the IO thread while queuing the event received from master '' binary log file, at position 3811062876200.
2024-03-07T01:03:12.451333Z 23 [Warning] [MY-010590] [Repl] An unexpected event sequence was detected by the IO thread while queuing the event received from master '' binary log file, at position 3811062876200.
2024-03-07T01:03:12.451343Z 23 [Warning] [MY-010590] [Repl] An unexpected event sequence was detected by the IO thread while queuing the event received from master '' binary log file, at position 3811062876200.
2024-03-07T01:03:12.451363Z 23 [Warning] [MY-010590] [Repl] An unexpected event sequence was detected by the IO thread while queuing the event received from master '' binary log file, at position 3811062876200.
2024-03-07T01:03:12.451370Z 23 [Warning] [MY-010590] [Repl] An unexpected event sequence was detected by the IO thread while queuing the event received from master '' binary log file, at position 3811062876200.
2024-03-07T01:03:12.451378Z 23 [Warning] [MY-010590] [Repl] An unexpected event sequence was detected by the IO thread while queuing the event received from master '' binary log file, at position 3811062876200.
2024-03-07T01:03:12.451386Z 23 [Warning] [MY-010590] [Repl] An unexpected event sequence was detected by the IO thread while queuing the event received from master '' binary log file, at position 3811062876200.
2024-03-07T01:03:12.451393Z 23 [Warning] [MY-010590] [Repl] An unexpected event sequence was detected by the IO thread while queuing the event received from master '' binary log file, at position 3811062876200.

3. We tried to stop mysql server on slaves, stopping hangs. We have to kill them
4. We tried to stop mysql server on primary db, and stopping hangs as well. We have to kill them
5. We tried to start primary db, it succeed.
6. We tried to start slaves, the group_replication fails. 
    2024-03-07T03:40:27.230448Z 0 [ERROR] [MY-013467] [Repl] Plugin 
   group_replication reported: 'No valid or ONLINE members exist to get the 
    missing data from the group. For cloning check if donors of the same version 
   and with clone plugin installed exist. For incremental recovery check if you 
   have donors where the required data was not purged from the binary logs.'

2024-03-07T03:40:27.230542Z 0 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
7.  Then, we installed clone plugin on primary db and one of the slave dbs. Slaves can join to the cluster. (NOTE: I don't know why one slave has clone plugin installed, but another slave and primary db doesn't)
    INSTALL PLUGIN clone SONAME 'mysql_clone.so';

In the past 3 years, we used mysql5.7 innodb cluster, and we haven't encounter such issue before. Last month, we steup this new MySQL8 cluster environment, and  meet this issue for the first time.

We wonder if it's one bug. More important, if we meet this issue again, how could we workaround it ASAP. 

Thanks 

How to repeat:
We don't know how to reproduce it yet, because this is the first time we meet this issue.
[7 Mar 2024 14:13] MySQL Verification Team
Hi Mr. Cheng,

Thank you for your bug report.

However, this is a forum only for the bug with fully reproducible test cases.

Hence, we are waiting on your instructions .....
[8 Mar 2024 1:04] Liang Cheng
Thanks for your reply
so far, I think it's one intermittent issue, and not easy to reproduce it. So we really wonder if your team can provide some idea that what could be the cause.
[8 Mar 2024 5:28] MySQL Verification Team
Hi,

Looking at the data you provided this does not look like a bug but a misconfiguration. You might want to upgrade to latest 8.0 too but with regards to configuration of group replication and starting group replication from a "crash"/"full stop" please check documentation / talk to our support team.

Thanks
[9 Apr 2024 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".