MySQL Bugs: #72351: mysqld failed recovery for in-place upgrading from 5.5 to 5.6

Bug #72351	mysqld failed recovery for in-place upgrading from 5.5 to 5.6
Submitted:	15 Apr 2014 12:10	Modified:	16 Apr 2014 14:33
Reporter:	liu hickey (OCA)
Status:	Verified
Category:	MySQL Server: InnoDB Plugin storage engine	Severity:	S2 (Serious)
Version:	5.6.16+	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	crash-recovery, sys_datafiles

Description:
After upgrading MySQL from 5.5 to 5.6 in place (keeping the same datadir), mysqld crashed at start-up while dereferencing sys_datafiles->indexes, because sys_datafiles was NULL:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000d4f4ea in dict_get_first_path (space=12, name=0x5cc85488 "drc/heartbeat") at /tmp/xy/storage/innobase/dict/dict0load.cc:809
809 sys_index = UT_LIST_GET_FIRST(sys_datafiles->indexes);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 libstdc++-4.4.6-3.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64
(gdb) p sys_datafiles
$1 = (dict_table_t *) 0x0

How to repeat:
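Stepping through with gdb shows that, during start-up, dict_load_table() searches the SYS_TABLES system table (visible as INFORMATION_SCHEMA.INNODB_SYS_TABLES) for the SYS_DATAFILES record and fails: the record the search lands on is SYS_FOREIGN, so the name check at dict0load.cc:2340 rejects it (length 11 vs. 13) and the table is reported as not found. SYS_DATAFILES is new in 5.6, so a data dictionary created by 5.5 does not contain it.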

Breakpoint 1, dict_get_first_path (space=13, name=0xc7aed928 "drc/tx_begin4unit_mark_1") at /tmp/xy/storage/innobase/dict/dict0load.cc:801
801 char* dict_filepath = NULL;
802 mem_heap_t* heap = mem_heap_create(1024);
804 ut_ad(mutex_own(&(dict_sys->mutex)));
806 mtr_start(&mtr);
808 sys_datafiles = dict_table_get_low("SYS_DATAFILES");
dict_table_get_low (table_name=0x106c0ea "SYS_DATAFILES") at /tmp/xy/storage/innobase/include/dict0priv.ic:43
43 ut_ad(table_name);
44 ut_ad(mutex_own(&(dict_sys->mutex)));
46 table = dict_table_check_if_in_cache_low(table_name);
48 if (table && table->corrupted) {
60 if (table == NULL) {
61 table = dict_load_table(table_name, TRUE, DICT_ERR_IGNORE_NONE);
dict_load_table (name=0x106c0ea "SYS_DATAFILES", cached=1, ignore_err=DICT_ERR_IGNORE_NONE) at /tmp/xy/storage/innobase/dict/dict0load.cc:2291

2316 dfield = dtuple_get_nth_field(tuple, 0);
2318 dfield_set_data(dfield, name, ut_strlen(name));
2319 dict_index_copy_types(tuple, sys_index, 1);
2322 BTR_SEARCH_LEAF, &pcur, &mtr);
2323 rec = btr_pcur_get_rec(&pcur);
2325 if (!btr_pcur_is_on_user_rec(&pcur)
2337 rec, DICT_FLD__SYS_TABLES__NAME, &len);
2340 if (len != ut_strlen(name) || ut_memcmp(name, field, len) != 0) {
$1 = 11
$2 = 13
$3 = 0x106c0ea "SYS_DATAFILES"
$4 = (const unsigned char *) 0x2aaf78f2008d "SYS_FOREIGN"
$5 = 11
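The failed lookup can be modelled in a few lines. This is a minimal sketch, assuming (as the trace suggests) that dict_load_table() positions its cursor on the first SYS_TABLES record whose NAME is >= the search key and then compares the names itself. `SysTables`, `landed_on()` and `find_table()` are hypothetical names, and a `std::set` stands in for the clustered index. Because 'D' sorts before 'F' byte-wise, and user tables such as drc/heartbeat sort after the upper-case SYS_ names, a 5.5 dictionary without SYS_DATAFILES puts the cursor on SYS_FOREIGN, matching `$4` in the gdb output.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the SYS_TABLES clustered index: std::set keeps
// table names in byte-wise sorted order, like the B-tree on NAME.
using SysTables = std::set<std::string>;

// Name of the record a ">= key" search lands on, or "" past the last record.
std::string landed_on(const SysTables& sys_tables, const std::string& name) {
    auto it = sys_tables.lower_bound(name);  // first NAME >= name
    return it == sys_tables.end() ? std::string() : *it;
}

// Models the lookup in dict_load_table(): position with ">=", then reject
// the record unless its name matches exactly (the len/memcmp check at
// dict0load.cc:2340). Returns nullptr when the table does not exist.
const std::string* find_table(const SysTables& sys_tables,
                              const std::string& name) {
    assert(!name.empty());  // like ut_ad(table_name)
    auto it = sys_tables.lower_bound(name);
    if (it == sys_tables.end()) {
        return nullptr;  // cursor is not on a user record
    }
    if (it->size() != name.size() || *it != name) {
        return nullptr;  // e.g. landed on "SYS_FOREIGN" (11) for "SYS_DATAFILES" (13)
    }
    return &*it;
}
```

With `{"SYS_FOREIGN", "SYS_FOREIGN_COLS", "drc/heartbeat"}` as the dictionary, `landed_on(..., "SYS_DATAFILES")` is `"SYS_FOREIGN"` and `find_table()` returns `nullptr`, which corresponds to the NULL that dict_table_get_low() hands back to dict_get_first_path().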

And here is the backtrace:
#0 dict_table_get_low (table_name=0x106c08f "SYS_TABLES") at /tmp/xy/storage/innobase/include/dict0priv.ic:43
#1 0x0000000000d4fc12 in dict_check_tablespaces_and_store_max_id (dict_check=DICT_CHECK_ALL_LOADED)
    at /tmp/xy/storage/innobase/dict/dict0load.cc:977
#2 0x0000000000c8673b in innobase_start_or_create_for_mysql () at /tmp/xy/storage/innobase/srv/srv0start.cc:2418 
#3 0x0000000000b39ffa in innobase_init (p=0x18d18c0) at /tmp/xy/storage/innobase/handler/ha_innodb.cc:3264
#4 0x000000000063e515 in ha_initialize_handlerton (plugin=0x18cf4d0) at /tmp/xy/sql/handler.cc:661
(More stack frames follow...)

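The backtrace shows that the problem is specific to the crash-recovery path: when redo-log recovery is needed at start-up, innobase_start_or_create_for_mysql() passes DICT_CHECK_ALL_LOADED to dict_check_tablespaces_and_store_max_id(), which then calls dict_get_first_path():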

innobase_start_or_create_for_mysql
    
                        if (recv_needed_recovery) {
                                dict_check = DICT_CHECK_ALL_LOADED;
                        } else if (n_recovered_trx) {
                                dict_check = DICT_CHECK_SOME_LOADED;
                        } else {
                                dict_check = DICT_CHECK_NONE_LOADED;
                        }
                        dict_check_tablespaces_and_store_max_id(dict_check); /* <<-- DICT_CHECK_ALL_LOADED */
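The selection above reduces to a pure function of two recovery facts. A minimal sketch, using a hypothetical enum `DictCheckMode` and function `choose_dict_check()` (the real code assigns InnoDB's `DICT_CHECK_*` constants in place). It makes explicit that any start-up which applied redo, such as the first 5.6 start on a 5.5 datadir whose server was killed under load, takes the DICT_CHECK_ALL_LOADED branch, even when recovered transactions also exist.

```cpp
#include <cassert>

// Hypothetical mirror of the three modes; InnoDB's own constants are
// DICT_CHECK_ALL_LOADED, DICT_CHECK_SOME_LOADED and DICT_CHECK_NONE_LOADED.
enum class DictCheckMode { AllLoaded, SomeLoaded, NoneLoaded };

// Same branch order as the excerpt: redo recovery wins over recovered
// transactions, which win over a clean start.
DictCheckMode choose_dict_check(bool recv_needed_recovery,
                                unsigned long n_recovered_trx) {
    if (recv_needed_recovery) {
        return DictCheckMode::AllLoaded;   // redo log was applied
    } else if (n_recovered_trx) {
        return DictCheckMode::SomeLoaded;  // no redo, but transactions recovered
    }
    return DictCheckMode::NoneLoaded;      // neither: normal start-up
}
```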

dict_check_tablespaces_and_store_max_id

         switch (dict_check) {
                case DICT_CHECK_ALL_LOADED:
                        /* All tablespaces should have been found in
                        fil_load_single_table_tablespaces(). */
                        if (fil_space_for_table_exists_in_mem(
                                space_id, name, TRUE, !(is_temp || discarded),
                                false, NULL, 0)
                            && !(is_temp || discarded)) {
                                /* If user changes the path of .ibd files in
                                   *.isl files before doing crash recovery ,
                                   then this leads to inconsistency in
                                   SYS_DATAFILES system table because the
                                   tables are loaded from the updated path
                                   but the SYS_DATAFILES still points to the
                                   old path.Therefore after crash recovery
                                   update SYS_DATAFILES with the updated path.*/
                                ut_ad(space_id);
                                ut_ad(recv_needed_recovery);
                                /* <<-- Oops! Can't find SYS_DATAFILES */
                                char *dict_path = dict_get_first_path(space_id,
                                                                      name);
                                char *remote_path = fil_read_link_file(name);
                                if(dict_path && remote_path) {
                                        if(strcmp(dict_path,remote_path)) {
                                                dict_update_filepath(space_id,
                                                                     remote_path);
                                        }
                                }
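The crash itself is an unconditional dereference of a pointer that can be NULL. The sketch below models one possible guard: skip the SYS_DATAFILES lookup when that table does not exist yet. It is an illustration under stated assumptions, not the upstream fix; `Table`, `Dictionary`, `get_table()` and `first_path_guarded()` are hypothetical stand-ins for dict_table_t, the dictionary cache, dict_table_get_low() and dict_get_first_path(), and an empty string plays the role of a NULL return.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins; not InnoDB types.
struct Table {
    std::vector<std::string> indexes;            // like dict_table_t::indexes
    std::map<unsigned long, std::string> paths;  // SYS_DATAFILES rows: space -> path
};
using Dictionary = std::map<std::string, Table>;

// Like dict_table_get_low(): nullptr if the table is not in the dictionary.
const Table* get_table(const Dictionary& dict, const std::string& name) {
    auto it = dict.find(name);
    return it == dict.end() ? nullptr : &it->second;
}

// Guarded version of the step that crashed in dict_get_first_path().
// An empty string means "no path recorded" (NULL in the C code).
std::string first_path_guarded(const Dictionary& dict, unsigned long space_id) {
    const Table* sys_datafiles = get_table(dict, "SYS_DATAFILES");
    if (sys_datafiles == nullptr) {
        // Dictionary created by 5.5: SYS_DATAFILES does not exist yet, so
        // there is nothing to reconcile. The unguarded code dereferences
        // the NULL pointer at this point instead.
        return std::string();
    }
    assert(!sys_datafiles->indexes.empty());  // a loaded table has a clustered index
    auto it = sys_datafiles->paths.find(space_id);
    return it == sys_datafiles->paths.end() ? std::string() : it->second;
}
```

Returning "no path" rather than failing keeps the caller's existing `if(dict_path && remote_path)` test meaningful: with no dictionary path, the reconciliation step is simply skipped.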

Suggested fix:
Judging from recent commit logs, the regression appears to have been introduced by the fix for Bug#17448389 (bzr revno 5704):

$ bzr log -r 5704
------------------------------------------------------------
revno: 5704
committer: Aditya A <aditya.a@oracle.com>
branch nick: mysql-5.6
timestamp: Thu 2013-12-19 16:06:45 +0530
message:
  Bug#17448389 SYS_DATAFILES TABLE IS NOT UPDATED AFTER
                   RECOVERY FOR TABLES WITH .ISL PATH
  PROBLEM
  -------
  If user changes updates the path of .ibd files in
  *.isl files and does a crash recovery then tables
  are loaded from the updated path,but SYS_DATAFILES
  are not updated. Now if the user stops and restarts
  the server in normal mode ,it will detect that the
  there are two different copies of data directory
  (one in isl file and one in SYS_DATAFILES table)
  and both are valid ( we have not deleted the ibd
  file from the original path) and refuse to load the
  table and ask user to resolve the conflict.
  FIX
  ---
  Update the SYS_DATFILES to reflect the updated path
  after crash recovery.
  [Aprroved by Kevin rb#4177]

$ bzr diff -r5703..5704
=== modified file 'storage/innobase/dict/dict0load.cc'
--- storage/innobase/dict/dict0load.cc 2013-10-21 03:56:33 +0000
+++ storage/innobase/dict/dict0load.cc 2013-12-19 10:36:45 +0000
@@ -1092,10 +1092,34 @@
                case DICT_CHECK_ALL_LOADED:
                        /* All tablespaces should have been found in
                        fil_load_single_table_tablespaces(). */
-
-                        fil_space_for_table_exists_in_mem(
+                        if (fil_space_for_table_exists_in_mem(
                                space_id, name, TRUE, !(is_temp || discarded),
-                                false, NULL, 0);
+                                false, NULL, 0)
+                            && !(is_temp || discarded)) {
+                                /* If user changes the path of .ibd files in
+                                   *.isl files before doing crash recovery ,
+                                   then this leads to inconsistency in
+                                   SYS_DATAFILES system table because the
+                                   tables are loaded from the updated path
+                                   but the SYS_DATAFILES still points to the
+                                   old path.Therefore after crash recovery
+                                   update SYS_DATAFILES with the updated path.*/
+                                ut_ad(space_id);
+                                ut_ad(recv_needed_recovery);
+                                char *dict_path = dict_get_first_path(space_id,
+                                                                      name);
+                                char *remote_path = fil_read_link_file(name);
+                                if(dict_path && remote_path) {
+                                        if(strcmp(dict_path,remote_path)) {
+                                                dict_update_filepath(space_id,
+                                                                     remote_path);
+                                        }
+                                }
+                                if(dict_path)
+                                        mem_free(dict_path);
+                                if(remote_path)
+                                        mem_free(remote_path);
+                        }
                        break;
                case DICT_CHECK_SOME_LOADED:
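The reconciliation added by revno 5704 can be read as: if SYS_DATAFILES records a path for the space, the .isl file gives one too, and the two differ, overwrite the dictionary entry with the .isl path. A minimal sketch, with a hypothetical `SysDatafiles` map standing in for the system table, a hypothetical `reconcile_path()` for the patched block, and empty strings standing in for NULL returns from dict_get_first_path() and fil_read_link_file(). Note that it presupposes SYS_DATAFILES exists, which is exactly the assumption that breaks on a 5.5 datadir.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in: space_id -> path as recorded in SYS_DATAFILES.
using SysDatafiles = std::map<unsigned long, std::string>;

// Returns true when the stored path was rewritten (dict_update_filepath()).
bool reconcile_path(SysDatafiles& sys_datafiles, unsigned long space_id,
                    const std::string& remote_path /* read from the .isl file */) {
    assert(space_id != 0);  // like ut_ad(space_id)
    auto it = sys_datafiles.find(space_id);
    const std::string dict_path =
        it == sys_datafiles.end() ? std::string() : it->second;
    // Same condition as the patch: both paths known and different.
    if (!dict_path.empty() && !remote_path.empty() && dict_path != remote_path) {
        sys_datafiles[space_id] = remote_path;
        return true;
    }
    return false;
}
```

Using `std::string` for both paths removes the need for the explicit `mem_free()` calls that the C version has to make on every exit path.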

Thank you for the bug report. Please reopen this report if you still see the same issue after following the instructions below. Thanks.

https://dev.mysql.com/doc/refman/5.6/en/upgrading-from-previous-series.html

This bug is specific to in-place upgrades, i.e. upgrades that do not dump and restore all data and schema with mysqldump, which is the officially recommended method.

You can close it, of course, but in most cases we cannot afford a logical or physical backup and restore just to upgrade, given the huge amount of data in our clusters.

Our procedure is to upgrade the mysqld binary (and the scripts) and then run the upgrade scripts; that is how we upgraded thousands of MySQL boxes from 5.1 to 5.5. Now we also want to upgrade in place from 5.5 to 5.6. As for this bug, it is easy to work around: start up mysql-5.6, then run the mysql_upgrade script, and the problem is gone.

Anyway, not every user can follow the official step-by-step procedure, just like us :)

Thank you for the feedback. Please provide the my.cnf files of the 5.5 and 5.6 servers. Thanks.

my.cnf is attached; the mysqld version is mysql-5.6.16.

Judging from the code analysis, the bug should exist since mysql-5.6.16, so it should be fairly easy to repeat: run mysql-5.5 (for example, 5.5.18), kill it while it is under load, upgrade MySQL, and then try to run the mysql_upgrade script, which fails because mysqld cannot start.

Thank you for the feedback.

Bug #77804 has been marked as a duplicate of this bug.