Description:
My mysql instance had a master server and a slave node (both in version 8016), the slave node worked with multiple slave-workers.
I wanted to upgrade to version 8018. So upgrade the slave node first.
In order to upgrade the version, i executed the 'stop slave' command, but not stopped properly, it may be blocked by some threads.
Then i killed the mysqld process with 'kill -9', and tried to launch the slave node using new mysqld version of 8018 (in-Place upgrade).
Startup failed.
[System] [MY-011090] [Server] Data dictionary upgrading from version '80016' to '80017'.
[System] [MY-013413] [Server] Data dictionary upgrade from version '80016' to '80017' completed.
[System] [MY-013381] [Server] Server upgrade from '80016' to '80018' started.
[ERROR] [MY-013178] [Server] Execution of server-side SQL statement 'ALTER TABLE slave_worker_info STATS_PERSISTENT=0;
' failed with error code = 1205, error message = 'Lock wait timeout exceeded; try restarting transaction'.
[ERROR] [MY-013380] [Server] Failed to upgrade server.
How to repeat:
It is difficult to reproduce the scene above.
I use the debug version of mysqld to simulate the previous scenario and reproduce the failure.
step1, initialize a new database with mysql version of 8016 (debug).
~/mysql/bin/mysqld --defaults-file=./my.cnf --initialize
step2, start up the instance.
~/mysql/bin/mysqld --defaults-file=./my.cnf &
step3, client establishes a connection to the server with root, run sqls:
#3.1 generate some data in table mysql.slave_master_info
> CHANGE MASTER TO MASTER_HOST = '10.*.*.36', MASTER_USER = 'repl', MASTER_PASSWORD = 'replpassword', MASTER_PORT = 3306;
Query OK, 0 rows affected, 2 warnings
> create database test;
Query OK, 1 row affected (5.64 sec)
> create table test.t(a int);
Query OK, 0 rows affected (2.97 sec)
#3.2 simulate kill command
> set debug='+d,crash_after_flush_binlog';
Query OK, 0 rows affected (0.00 sec)
#3.3 simulates slave applying relay log events, may modify both user and system tables in the same transaction
> begin;
Query OK, 0 rows affected (0.00 sec)
> insert into test.t values(1);
Query OK, 1 row affected (4.35 sec)
> update mysql.slave_master_info set Host='cda';
Query OK, 1 row affected (2.82 sec)
> commit;
ERROR 2013 (HY000): Lost connection to MySQL server during query
# Now some transactions remain in PREPARED state.
# They will resurrect during recovery in step4, and add IX locks to tables
# modified by them.
# But this cause the following upgrade-sql failed, which can not acquire X lock
step4. in-place upgrade, launch the mysqld-8018 on this database directory.
[Server] Execution of server-side SQL statement 'ALTER TABLE slave_master_info STATS_PERSISTENT=0;
' failed with error code = 1205, error message = 'Lock wait timeout exceeded; try restarting transaction'.
[Server] Failed to upgrade server.
[Server] Aborting
Suggested fix:
Perhaps the execution of ‘kill -9' is wrong, which leading to a non-normal shutdown of the service.
I'm not sure if the above analysis is the root cause of the problem.
But my database does run into this situation, and the data on the slave node is no longer available.
Can mysql handle this situation and let the upgrade work, or prevent the upgrade (I gave up the upgrade and stuck with the old version)?