Bug #100242 Upgrade from MySQL 8.0.16 to MySQL 8.0.18 failed
Submitted: 17 Jul 2020 5:48 Modified: 12 Aug 2020 16:52
Reporter: lyp tennyson Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.18 OS:Linux
Assigned to: MySQL Verification Team CPU Architecture:Any

[17 Jul 2020 5:48] lyp tennyson
Description:
My mysql instance had a master server and a slave node (both in version 8016), the slave node worked with multiple slave-workers.

I wanted to upgrade to version 8018. So upgrade the slave node first.

In order to upgrade the version, i executed the 'stop slave' command, but not stopped properly, it may be blocked by some threads.

Then i killed the mysqld process with 'kill -9', and tried to launch the slave node using new mysqld version of 8018 (in-Place upgrade).

Startup failed.

[System] [MY-011090] [Server] Data dictionary upgrading from version '80016' to '80017'.
[System] [MY-013413] [Server] Data dictionary upgrade from version '80016' to '80017' completed.
[System] [MY-013381] [Server] Server upgrade from '80016' to '80018' started.
[ERROR] [MY-013178] [Server] Execution of server-side SQL statement 'ALTER TABLE slave_worker_info STATS_PERSISTENT=0;
' failed with error code = 1205, error message = 'Lock wait timeout exceeded; try restarting transaction'.
[ERROR] [MY-013380] [Server] Failed to upgrade server.

How to repeat:
It is difficult to reproduce the scene above. 

I use the debug version of mysqld to simulate the previous scenario and reproduce the failure.

step1, initialize a new database with mysql version of 8016 (debug).
  ~/mysql/bin/mysqld --defaults-file=./my.cnf --initialize

step2, start up the instance.
  ~/mysql/bin/mysqld --defaults-file=./my.cnf &

step3, client establishes a connection to the server with root, run sqls:

  #3.1 generate some data in table mysql.slave_master_info
  > CHANGE MASTER TO MASTER_HOST = '10.*.*.36', MASTER_USER = 'repl', MASTER_PASSWORD = 'replpassword', MASTER_PORT = 3306;
    Query OK, 0 rows affected, 2 warnings
  > create database test;
    Query OK, 1 row affected (5.64 sec)
  > create table test.t(a int);
    Query OK, 0 rows affected (2.97 sec)
  
  #3.2 simulate kill command
  > set debug='+d,crash_after_flush_binlog';
    Query OK, 0 rows affected (0.00 sec)

  #3.3 simulates slave applying relay log events, may modify both user and system tables in the same transaction
  > begin;
    Query OK, 0 rows affected (0.00 sec)
  > insert into test.t values(1);
    Query OK, 1 row affected (4.35 sec)
  > update mysql.slave_master_info set Host='cda';
    Query OK, 1 row affected (2.82 sec)
  > commit;
    ERROR 2013 (HY000): Lost connection to MySQL server during query

  # Now some transactions remain in PREPARED state. 
  # They will resurrect during recovery in step4, and add IX locks to tables 
  # modified by them.
  # But this cause the following upgrade-sql failed, which can not acquire X lock

step4. in-place upgrade, launch the mysqld-8018 on this database directory.
  [Server] Execution of server-side SQL statement 'ALTER TABLE slave_master_info STATS_PERSISTENT=0;
' failed with error code = 1205, error message = 'Lock wait timeout exceeded; try restarting transaction'.
  [Server] Failed to upgrade server.
  [Server] Aborting

Suggested fix:
Perhaps the execution of ‘kill -9' is wrong, which leading to a non-normal shutdown of the service.

I'm not sure if the above analysis is the root cause of the problem.
But my database does run into this situation, and the data on the slave node is no longer available.

Can mysql handle this situation and let the upgrade work, or prevent the upgrade (I gave up the upgrade and stuck with the old version)?
[12 Aug 2020 16:52] MySQL Verification Team
Hi,

I tried number of times but I cannot reproduce this problem.

Basically you should of waited for slave stop to finish properly and do a shutdown. By kill -9 you brought a server in an "unknown" state that was improper for the upgrade.

all best
Bogdan