MySQL Bugs: #100242: Upgrade from MySQL 8.0.16 to MySQL 8.0.18 failed

Bug #100242	Upgrade from MySQL 8.0.16 to MySQL 8.0.18 failed
Submitted:	17 Jul 2020 5:48	Modified:	7 May 14:14
Reporter:	lyp tennyson	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	8.0.18	OS:	Linux
Assigned to:		CPU Architecture:	Any

Description:
My mysql instance had a master server and a slave node (both in version 8016), the slave node worked with multiple slave-workers.

I wanted to upgrade to version 8018. So upgrade the slave node first.

In order to upgrade the version, i executed the 'stop slave' command, but not stopped properly, it may be blocked by some threads.

Then i killed the mysqld process with 'kill -9', and tried to launch the slave node using new mysqld version of 8018 (in-Place upgrade).

Startup failed.

[System] [MY-011090] [Server] Data dictionary upgrading from version '80016' to '80017'.
[System] [MY-013413] [Server] Data dictionary upgrade from version '80016' to '80017' completed.
[System] [MY-013381] [Server] Server upgrade from '80016' to '80018' started.
[ERROR] [MY-013178] [Server] Execution of server-side SQL statement 'ALTER TABLE slave_worker_info STATS_PERSISTENT=0;
' failed with error code = 1205, error message = 'Lock wait timeout exceeded; try restarting transaction'.
[ERROR] [MY-013380] [Server] Failed to upgrade server.

How to repeat:
It is difficult to reproduce the scene above. 

I use the debug version of mysqld to simulate the previous scenario and reproduce the failure.

step1, initialize a new database with mysql version of 8016 (debug).
  ~/mysql/bin/mysqld --defaults-file=./my.cnf --initialize

step2, start up the instance.
  ~/mysql/bin/mysqld --defaults-file=./my.cnf &

step3, client establishes a connection to the server with root, run sqls:

  #3.1 generate some data in table mysql.slave_master_info
  > CHANGE MASTER TO MASTER_HOST = '10.*.*.36', MASTER_USER = 'repl', MASTER_PASSWORD = 'replpassword', MASTER_PORT = 3306;
    Query OK, 0 rows affected, 2 warnings
  > create database test;
    Query OK, 1 row affected (5.64 sec)
  > create table test.t(a int);
    Query OK, 0 rows affected (2.97 sec)
  
  #3.2 simulate kill command
  > set debug='+d,crash_after_flush_binlog';
    Query OK, 0 rows affected (0.00 sec)

  #3.3 simulates slave applying relay log events, may modify both user and system tables in the same transaction
  > begin;
    Query OK, 0 rows affected (0.00 sec)
  > insert into test.t values(1);
    Query OK, 1 row affected (4.35 sec)
  > update mysql.slave_master_info set Host='cda';
    Query OK, 1 row affected (2.82 sec)
  > commit;
    ERROR 2013 (HY000): Lost connection to MySQL server during query

  # Now some transactions remain in PREPARED state. 
  # They will resurrect during recovery in step4, and add IX locks to tables 
  # modified by them.
  # But this cause the following upgrade-sql failed, which can not acquire X lock

step4. in-place upgrade, launch the mysqld-8018 on this database directory.
  [Server] Execution of server-side SQL statement 'ALTER TABLE slave_master_info STATS_PERSISTENT=0;
' failed with error code = 1205, error message = 'Lock wait timeout exceeded; try restarting transaction'.
  [Server] Failed to upgrade server.
  [Server] Aborting

Suggested fix:
Perhaps the execution of ‘kill -9' is wrong, which leading to a non-normal shutdown of the service.

I'm not sure if the above analysis is the root cause of the problem.
But my database does run into this situation, and the data on the slave node is no longer available.

Can mysql handle this situation and let the upgrade work, or prevent the upgrade (I gave up the upgrade and stuck with the old version)?

Hi,

I tried number of times but I cannot reproduce this problem.

Basically you should of waited for slave stop to finish properly and do a shutdown. By kill -9 you brought a server in an "unknown" state that was improper for the upgrade.

all best
Bogdan

FWIW: I have seen similar issues in:
- 8.0.30 -> 8.0.34
- 8.4.4 -> 8.4.5
upgrades. This sort of thing happens very infrequently but I've seen it twice now.

Thanks for the update.

I'm still having issue reproducing this. If anyone can give us the db structure that fails it will help speed up finding the solution.

I encountered the same problem in the production environment. When I was performing a cross-version backup recovery, the upgrade from 8025 to 8028 failed, causing the recovery failure.
2025-06-09T18:17:03.117990+08:00 4 [System] [MY-013381] [Server] Server upgrade from '80025' to '80028' started.
2025-06-09T19:40:25.654835+08:00 4 [ERROR] [MY-013178] [Server] Execution of server-side SQL statement 'ALTER TABLE slave_worker_info STATS_PERSIS
ut exceeded; try restarting transaction".

You need to be sure a clean shutdown is performed before upgrading.
Otherwise there may be lingering internal XA transactions created by the replication threads.

I've filed a report internally but we cannot give ETA:
 Bug 37922965 - Refuse to begin upgrade when last shutdown was unclean