Bug #98948 | Recovering InnoDB cluster from complete outage hangs and fails to recover | |
---|---|---|---
Submitted: | 13 Mar 2020 20:48 | Modified: | 24 Jul 2020 15:51
Reporter: | Bradley Pearce | Email Updates: |
Status: | Verified | Impact on me: |
Category: | MySQL Server: Group Replication | Severity: | S2 (Serious)
Version: | 8.0.19 | OS: | Windows
Assigned to: | | CPU Architecture: | Any
[13 Mar 2020 20:48]
Bradley Pearce
[23 Mar 2020 20:47]
MySQL Verification Team
Hi, thanks for the report. I could not reproduce this on Linux. Let me try to reproduce it on Windows. Bogdan
[23 Mar 2020 21:07]
Bradley Pearce
Thanks for your response. Unfortunately, I haven't verified whether this happens on Linux, as we don't have a Linux environment to use for this purpose. If it only occurs on Windows, could this be a bug in the Windows build of 8.0.19?
[24 Mar 2020 0:01]
MySQL Verification Team
Hi, let's see if I can reproduce it on Windows and we'll go from there. Just as you don't have Linux handy, I don't really use Windows, so I need to set up a system for this purpose :) all best Bogdan
[1 Apr 2020 10:12]
Peter Johansson
I have the same problem running a sandbox 8.0.19 on Windows 10. After running dba.rebootClusterFromCompleteOutage() the shell hangs on "NOTE: Cancelling active GR auto-initialization at HP-computer:3310". Peter
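For reference, a minimal sketch of the sequence that hangs, as run from MySQL Shell in JavaScript mode (the account and cluster name are illustrative; only the NOTE line and the sandbox port are from my setup):
```
// Connect to the last-known primary of the sandbox and reboot the cluster.
shell.connect('root@localhost:3310')
var cluster = dba.rebootClusterFromCompleteOutage('testCluster')
// The shell prints:
//   NOTE: Cancelling active GR auto-initialization at HP-computer:3310
// and then blocks here instead of returning.
cluster.status()  // afterwards, members should report ONLINE
```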
[2 Apr 2020 8:56]
Peter Johansson
I waited for the cluster to start, and it finally came up after about 30 minutes. Error file attached.
[2 Apr 2020 16:53]
MySQL Verification Team
Hi Bradley, Peter, I'm not able to reproduce this on Windows 10. Peter, I don't see much useful info in the log, other than that the cluster did start. As for the connection issues it may have had, I can't say; they could be Windows related (antivirus, firewall...), but I can't reproduce this on my Win10 system. all best Bogdan
[3 Apr 2020 4:28]
MySQL Verification Team
Hi Bradley, Peter, A colleague of mine actually managed to reproduce this; we are working on a fix. Thanks Bogdan
[8 Jun 2020 6:30]
vijayakumar kommula
Hi, we are using Windows Server 2019 with the 8.0.20 commercial edition and encounter the same error. Do you have any alternative solution, or a way to minimize the errors? Regards, Vijay
[17 Jun 2020 19:02]
Omer Niah
Hey guys, I can still see this on version 8.0.20 on CentOS 8.0. It happens if you don't stop MySQL properly and shut down all nodes. regards omer
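Before calling the reboot, one can check whether Group Replication is set to auto-start on boot, which is what the "Cancelling active GR auto-initialization" note refers to. A hedged sketch in mysqlsh JavaScript mode (account and host are illustrative; the variable and table are standard GR names):
```
// Check the GR auto-start setting and current member state on one node.
shell.connect('clusterAdmin@node1:3306')
session.runSql("SHOW GLOBAL VARIABLES LIKE 'group_replication_start_on_boot'")
session.runSql('SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members')
```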
[24 Jul 2020 15:51]
Bradley Pearce
Hello, has this been fixed in 8.0.20 or 8.0.21? Thanks for your help. Kind regards, Brad
[13 Aug 2020 16:03]
Robert Azzopardi
Hi, we have the same situation: when recovering from an outage using dba.rebootClusterFromCompleteOutage(), it states "NOTE: Cancelling active GR auto-initialization at xxx.xxx.xxx.xxx:3306". After 30 minutes it completes and the cluster is successfully rebooted. This issue appeared as soon as we upgraded the cluster from 8.0.18 to 8.0.21. We have replicated this in various Windows environments and it always produces the same behavior. Is there a fix for this issue? Thanks Robert
[11 Sep 2020 15:28]
lionel mazeyrat
We upgraded from 8.0.20 to 8.0.21, 3 nodes on Windows Server 2016, and we have the same behavior: dba.rebootClusterFromCompleteOutage() takes 30 minutes.
[14 Sep 2020 20:08]
Romain Brenet
Hello, we have the same issue with Windows Server 2019 and MySQL Server 8.0.21: rebootClusterFromCompleteOutage() => 30 minutes. Thanks. Have a nice day
[7 Oct 2020 9:19]
Frieder Mentele
Hello, we have the same issue with Windows Server 2019 and MySQL Server 8.0.21. I installed 8.0.18 and tested; it works like a charm. 8.0.21: rebootClusterFromCompleteOutage() => 30 minutes. 8.0.18: rebootClusterFromCompleteOutage() => 1 minute.
[11 Feb 2021 3:54]
Keith Lammers
Just adding a note to mention that I am running into this issue with MySQL on Windows as well. MySQL Server and Shell are both 8.0.23 on all 3 cluster instances.
[7 May 2021 13:08]
Eduardo Ortega
Affects me on MySQL 8.0.23 for Linux
[7 May 2021 18:35]
Andrew Garner
This also affects me using MySQL 8.0.24 on Linux.
[9 Dec 2021 10:45]
Florian Apolloner
We are also seeing this on 8.0.26; we are testing in docker-compose with this setup:
```
version: "3"
services:
  mysql1:
    image: docker.io/mysql/mysql-server:8.0.26
    env_file: mysql.env
    stop_grace_period: 1m
    command: --server_id=1
    volumes:
      - ./my.cnf:/etc/my.cnf:ro,z
      - ./data1:/var/lib/mysql:rw,z
    hostname: mysql1
  mysql2:
    image: docker.io/mysql/mysql-server:8.0.26
    env_file: mysql.env
    stop_grace_period: 1m
    command: --server_id=2
    volumes:
      - ./my.cnf:/etc/my.cnf:ro,z
      - ./data2:/var/lib/mysql:rw,z
    hostname: mysql2
  mysql3:
    image: docker.io/mysql/mysql-server:8.0.26
    env_file: mysql.env
    stop_grace_period: 1m
    command: --server_id=3
    volumes:
      - ./my.cnf:/etc/my.cnf:ro,z
      - ./data3:/var/lib/mysql:rw,z
    hostname: mysql3
```
and this my.cnf:
```
[mysqld]
skip-host-cache
skip-name-resolve
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
secure-file-priv=/var/lib/mysql-files
user=mysql
pid-file=/var/run/mysqld/mysqld.pid
binlog_transaction_dependency_tracking=WRITESET
replica_preserve_commit_order=ON
replica_parallel_type=LOGICAL_CLOCK
enforce_gtid_consistency=ON
gtid_mode=ON
#plugin_load = group_replication.so
#group_replication_autorejoin_tries=0
#group_replication_components_stop_timeout=2
#group_replication_communication_debug_options=GCS_DEBUG_ALL
```
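A sketch of how a cluster on this compose setup reaches the hang, in mysqlsh JavaScript mode (account and cluster name are illustrative, not from the config above):
```
// Build the cluster once the three containers are up and configured.
shell.connect('root@mysql1:3306')
var cluster = dba.createCluster('testCluster')
cluster.addInstance('mysql2:3306')
cluster.addInstance('mysql3:3306')
// A `docker-compose restart` then takes all members down at once; on the
// next recovery attempt the shell stalls at the GR auto-initialization note:
cluster = dba.rebootClusterFromCompleteOutage('testCluster')
```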
[5 Sep 2022 12:41]
ASP Serveur
Hello, I have the same problem running MySQL 8.0.20 on Debian 10. After running dba.rebootClusterFromCompleteOutage() the shell hangs on "NOTE: Cancelling active GR auto-initialization at mysql_node1:3310". I waited more than 45 minutes but nothing happened! Do you have any news on this subject? Thanks. Have a nice day
[9 Nov 2022 5:14]
Khanh Van Chu
I can still see this on version 8.0.31 on CentOS 9. I set up 3 new instances; createCluster and addInstance work fine. Then I restart all 3 instances => the shell hangs on: NOTE: Cancelling active GR auto-initialization at mysql-node1:3306 Regards
[14 Jan 2023 1:11]
Aray Chou
I suffered the same problem, and I found another clue: after rebooting all the Linux servers, mysqld uses 100% of one CPU core.
```
Tasks: 113 total,   2 running, 111 sleeping,   0 stopped,   0 zombie
%Cpu(s): 27.9 us, 23.2 sy,  0.0 ni, 48.7 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem :  4907100 total,  3414200 free,  1141520 used,   351380 buff/cache
KiB Swap:  2097148 total,  2093556 free,     3592 used.  3692584 avail Mem

  PID USER     PR NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
 9968 polkitd  20  0 3995296 721876 24552 S 101.7 14.7 23:03.92 mysqld
    1 root     20  0  125764   3620  1908 S   0.0  0.1  0:01.27 systemd
    2 root     20  0       0      0     0 S   0.0  0.0  0:00.00 kthreadd
```
```
[root@centos-61 ~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
[root@centos-61 ~]# uname -a
Linux centos-61 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
```
MySQL version: 8.0.31

I use the following commands to start the Docker containers running MySQL:
```
mkdir -p /opt/mysql/data
chown -R 999:999 /opt/mysql
chmod 700 /opt/mysql
docker run -d --name mysql \
  --cpus 2 \
  --memory 1.5GB \
  --network host \
  --restart unless-stopped \
  -v /opt/mysql/data:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=somePassword \
  --security-opt seccomp=unconfined \
  mysql:8 \
  --innodb-dedicated-server=ON \
  --group-replication-consistency=AFTER

docker exec -it mysql mysql -p
```
```
CREATE USER 'cluster_root'@'%' IDENTIFIED BY 'somePassword';
GRANT ALL PRIVILEGES ON *.* TO 'cluster_root'@'%' WITH GRANT OPTION;
show global variables like 'innodb_dedicated_server';
show global variables like 'group_replication_consistency';
```
```
mysqlsh
dba.configureInstance('cluster_root@debian-101:3306')
dba.configureInstance('cluster_root@debian-102:3306')
dba.configureInstance('cluster_root@debian-103:3306')

docker restart mysql

mysqlsh
shell.connect('cluster_root@debian-101:3306')
cluster = dba.createCluster('my_innodb_cluster');
cluster.addInstance('debian-102:3306')
cluster.addInstance('debian-103:3306')
```
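To see what that busy core is doing from the SQL side, a hedged diagnostic sketch in mysqlsh JavaScript mode (standard performance_schema tables; the account is the one created above):
```
// Inspect GR membership and the Group Replication threads while mysqld spins.
shell.connect('cluster_root@debian-101:3306')
session.runSql('SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE FROM performance_schema.replication_group_members')
session.runSql("SELECT THREAD_ID, NAME, PROCESSLIST_STATE FROM performance_schema.threads WHERE NAME LIKE '%group_rpl%'")
```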
[7 Sep 2023 14:21]
Christos Vlachos
I still encounter this problem (stuck at "NOTE: Cancelling active GR auto-initialization at nodename:port"), running mysql Ver 8.0.30 for Linux on x86_64 (MySQL Community Server - GPL) on CentOS 7. But I have found a solution! The problem occurs for me when the MySQL service is running on all nodes (after a restart of all of them) and I try to run dba.rebootClusterFromCompleteOutage() on the last active node to power the cluster back up. But if I stop the services on the other 2 nodes and reboot the cluster only on the last primary node (keeping the metadata of the other 2 nodes), the cluster reboots successfully. After that I start the services on the other 2 nodes, they rejoin the cluster immediately, and voilà! I hope this approach will help you too. A sketch of the workaround follows below.
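In mysqlsh JavaScript mode the workaround reads like this (host names, account, and cluster name are illustrative; the service-control steps are the ones described above):
```
// 1. Stop mysqld on the two non-primary nodes (e.g. `systemctl stop mysqld`
//    on node2 and node3), so only the last active primary is running.
// 2. Reboot the cluster from that surviving node, keeping the metadata of
//    the stopped members so they can rejoin later:
shell.connect('clusterAdmin@node1:3306')
var cluster = dba.rebootClusterFromCompleteOutage('myCluster')
// 3. Start mysqld on node2 and node3 again; they rejoin on their own.
cluster.status()  // all three members should report ONLINE
```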