MySQL Bugs: #107462: Failed upgrade attempt to 8.0.29 corrupts the data dictionary

Bug #107462	Failed upgrade attempt to 8.0.29 corrupts the data dictionary
Submitted:	2 Jun 2022 9:02	Modified:	2 Jun 2022 15:57
Reporter:	Luis Donoso
Status:	Verified
Category:	MySQL Server: Data Dictionary	Severity:	S1 (Critical)
Version:	8.0.29	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	upgrade

Description:
When an upgrade from 8.0.27 to 8.0.29 fails, the data dictionary is
left corrupted and 8.0.27 no longer starts with the same
`data_dir`.

How to repeat:

1. Start an 8.0.27 instance.

2. Modify 8.0.29 so the upgrade fails by introducing the following patch:

```
diff --git a/sql/dd/impl/upgrade/server.cc b/sql/dd/impl/upgrade/server.cc
index 05cc418316f..58a97e9b038 100644
--- a/sql/dd/impl/upgrade/server.cc
+++ b/sql/dd/impl/upgrade/server.cc
@@ -554,6 +554,7 @@ static bool get_shared_tablespace_names(
 static bool check_tables(THD *thd, std::unique_ptr<Schema> &schema,
                          const std::set<dd::String_type> *shared_spaces,
                          Upgrade_error_counter *error_count) {
   std::unique_ptr<Object_key> table_key(
       dd::Table::DD_table::create_key_by_schema_id(schema->id()));

@@ -583,6 +584,10 @@ static bool check_tables(THD *thd, std::unique_ptr<Schema> &schema,
         }
       }
     }
+
+    (*error_count)++;  // Force upgrade error
+
     return error_count->has_too_many_errors();
   };
```

3. Start 8.0.29 with the `data_dir` of 8.0.27. The start fails with:

```
2022-05-25T10:06:25.650972Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2022-05-25T10:06:25.651198Z 0 [ERROR] [MY-010119] [Server] Aborting
```

4. Start 8.0.27 with the same `data_dir`. The start fails with an assertion error:

```
2022-05-25T10:07:09.582930Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2022-05-25T10:07:09.752854Z 1 [ERROR] [MY-013183] [InnoDB] Assertion failure: mtr0log.cc:135:*type <= MLOG_BIGGEST_TYPE thread 139626150885120
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
10:07:09 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7efd538f3000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7efd3f1fd448 thread_stack 0x100000
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x42) [0x62e2502]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld(handle_fatal_signal+0x232) [0x5176322]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7efd567f7420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7efd53dd200b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7efd53db1859]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x293) [0x6aa0763]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld(mlog_parse_initial_log_record(unsigned char const*, unsigned char const*, mlog_id_t*, unsigned int*, unsigned int*)+0x89) [0x686fa89]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld() [0x684b952]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld() [0x684af32]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld() [0x68498a1]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld() [0x68490d9]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld() [0x684280d]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld(recv_recovery_from_checkpoint_start(log_t&, unsigned long)+0x71b) [0x684136b]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld(srv_start(bool)+0x2733) [0x69e8143]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld() [0x637d9e1]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld() [0x636da81]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld(dd::bootstrap::DDSE_dict_init(THD*, dict_init_mode_t, unsigned int)+0x66) [0x5f90176]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld(dd::upgrade_57::do_pre_checks_and_initialize_dd(THD*)+0x87c) [0x6291c0c]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld() [0x4b4e05a]
/home/ldonoso/src/mysql-8.0.27-bld/install/bin/mysqld() [0x6b8fae0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7efd567eb609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7efd53eae133]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 1
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
Aborted (core dumped)
```

*Notes:*

The problem is caused by the addition of the following redo log types, which 8.0.27 does not recognize:

```
MLOG_REC_INSERT = 67,
MLOG_REC_CLUST_DELETE_MARK = 68,
MLOG_REC_DELETE = 69,
MLOG_REC_UPDATE_IN_PLACE = 70,
MLOG_LIST_END_COPY_CREATED = 71,
MLOG_PAGE_REORGANIZE = 72,
MLOG_ZIP_PAGE_REORGANIZE = 73,
MLOG_ZIP_PAGE_COMPRESS_NO_DATA = 74,
MLOG_LIST_END_DELETE = 75,
MLOG_LIST_START_DELETE = 76,
```
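
The assertion from step 4 (`*type <= MLOG_BIGGEST_TYPE`) can be illustrated with a minimal C++ sketch. It is not the InnoDB parser: `kOldBiggestType`, `kNewBiggestType`, and `record_type_is_known` are hypothetical names, and the value 66 for the older server's highest known type is an assumption inferred from the new types starting at 67.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the InnoDB definitions.
// Assumption: the older server knows types up to 66, one below the
// first type in the list above (MLOG_REC_INSERT = 67).
constexpr std::uint8_t kOldBiggestType = 66;
// Highest type in the list above (MLOG_LIST_START_DELETE = 76).
constexpr std::uint8_t kNewBiggestType = 76;

// Returns true if a parser whose highest known record type is
// `biggest_known` can interpret a redo record of type `type`.
constexpr bool record_type_is_known(std::uint8_t type,
                                    std::uint8_t biggest_known) {
  return type <= biggest_known;
}

// A record of type 67 is valid for the newer parser but is rejected by
// the older one, which is the condition the failed assertion checks.
static_assert(record_type_is_known(67, kNewBiggestType), "new parser accepts 67");
static_assert(!record_type_is_known(67, kOldBiggestType), "old parser rejects 67");
```

Under these assumptions, any redo record written with one of the types above and left unflushed in the log is enough to stop the older binary during recovery.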

Due to the DD initialization error, the upgrade process itself is not
started, hence the recovery mechanism in place for an unsuccessful
upgrade (which would fix the DD) does not kick in.

Suggested fix:
*Solution:*

If the DD cannot be initialized during an upgrade, and is therefore left
in a state incompatible with the previous version of the server, apply the
same logic to the redo logs as when the upgrade is unsuccessful:

- Downgrade the redo log files.
- Flush the redo log files.
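
The decision behind these two steps can be sketched in isolation. This is a simplified illustration, not server code: `mark_dd_init_failure` and `should_downgrade_redo` are hypothetical helper names, and the boolean parameters stand in for the server state that the patch reads (`srv_downgrade_logs`, the result of `dd::init`, and `dd::upgrade::no_server_upgrade_required()`).

```cpp
// Hypothetical helper: the new flag is set only when DD initialization
// failed while a server upgrade was actually pending.
constexpr bool mark_dd_init_failure(bool dd_init_failed,
                                    bool server_upgrade_required) {
  return dd_init_failed && server_upgrade_required;
}

// Hypothetical helper: at shutdown, the redo log is downgraded either
// when the existing mechanism requests it or when the new flag is set.
constexpr bool should_downgrade_redo(bool downgrade_logs_requested,
                                     bool dd_init_failed_during_upgrade) {
  return downgrade_logs_requested || dd_init_failed_during_upgrade;
}

// DD init failed during an upgrade: downgrade even though the regular
// failed-upgrade path never ran.
static_assert(should_downgrade_redo(false, mark_dd_init_failure(true, true)),
              "failed DD init during upgrade triggers downgrade");
// DD init failed on a plain restart: the redo log is left untouched.
static_assert(!should_downgrade_redo(false, mark_dd_init_failure(true, false)),
              "no upgrade pending, no downgrade");
```

Keeping the new condition separate from the existing one means the regular failed-upgrade path behaves as before, and only the early DD initialization failure gains the downgrade.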

```
diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index 36c93e96d74..c37785e5401 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -1207,6 +1207,7 @@ bool opt_no_monitor = false;
 bool opt_no_dd_upgrade = false;
 long opt_upgrade_mode = UPGRADE_AUTO;
 bool opt_initialize = false;
+bool dd_init_failed_during_upgrade = false;
 bool opt_skip_replica_start = false;  ///< If set, slave is not autostarted
 bool opt_enable_named_pipe = false;
 bool opt_local_infile, opt_replica_compressed_protocol;
@@ -6328,6 +6329,10 @@ static int init_server_components() {
         dd::init(dd::enum_dd_init_type::DD_RESTART_OR_UPGRADE)) {
       LogErr(ERROR_LEVEL, ER_DD_INIT_FAILED);
 
+      if (!dd::upgrade::no_server_upgrade_required()) {
+        dd_init_failed_during_upgrade = true;
+      }
+
       /* If clone recovery fails, we rollback the files to previous
       dataset and attempt to restart server. */
       int exit_code =
diff --git a/sql/mysqld.h b/sql/mysqld.h
index 20eaab6a294..913b56f13f8 100644
--- a/sql/mysqld.h
+++ b/sql/mysqld.h
@@ -181,6 +181,7 @@ extern MYSQL_PLUGIN_IMPORT std::atomic<int32>
 extern bool opt_no_dd_upgrade;
 extern long opt_upgrade_mode;
 extern bool opt_initialize;
+extern bool dd_init_failed_during_upgrade;
 extern bool opt_safe_user_create;
 extern bool opt_local_infile, opt_myisam_use_mmap;
 extern bool opt_replica_compressed_protocol;
diff --git a/storage/innobase/srv/srv0start.cc b/storage/innobase/srv/srv0start.cc
index fead6001b0a..b63682d2dee 100644
--- a/storage/innobase/srv/srv0start.cc
+++ b/storage/innobase/srv/srv0start.cc
@@ -3432,6 +3432,7 @@ forced to flush all dirty pages in the last stages of page cleaners activity
 (unless it was fast shutdown). After checkpoint is written, the flushed_lsn is
 updated within header of the system tablespace. This is lsn of the last clean
 shutdown. */
+
 static lsn_t srv_shutdown_log() {
   ut_a(srv_shutdown_state.load() == SRV_SHUTDOWN_FLUSH_PHASE);
   ut_a(!buf_flush_page_cleaner_is_active());
@@ -3488,7 +3489,7 @@ static lsn_t srv_shutdown_log() {
 
   srv_shutdown_set_state(SRV_SHUTDOWN_LAST_PHASE);
 
-  if (srv_downgrade_logs) {
+  if (srv_downgrade_logs || dd_init_failed_during_upgrade) {
     ut_a(!srv_read_only_mode);
 
     log_files_downgrade(*log_sys);
```

Hi Luis,

While some of my colleagues do not consider this a bug, I agree that if the upgrade fails, the upgrade procedure should revert the changes it made so that 8.0.27 can start.

Thank you for the report
Bogdan

Fix by recreating the redo log file in a format compatible with the lower version. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug#107462.patch (application/octet-stream, text), 12.25 KiB.

Contribution submitted via Github - Bug#107462 Failed upgrade attempt to 8.0.29 corrupts the data dictionary (*) Contribution by Rahul Malik (Github rahulmalik87, mysql-server/pull/411#issuecomment-1157472469): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_969097968.txt (text/plain), 12.69 KiB.