Bug #62146 Tables are lost for DDL and different behaviors for alter table failed situation
Submitted: 11 Aug 2011 9:54 Modified: 22 Feb 2012 23:28
Reporter: liu hickey (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S2 (Serious)
Version:5.1.48 (1.0.9), 5.6.3 (1.2.3) OS:Linux (2.6.18-164.el5)
Assigned to: CPU Architecture:Any
Tags: ddl; lost table; fil_rename_tablespace; timeout, regression

[11 Aug 2011 9:54] liu hickey
Description:
summary:

Recently we hit lost tables during DDL for online products, and after some observersion, we found some interesting hehaviors if fil_rename_tablespace failed as retry > 25000. 
That might be another issue that not discussed here.

THERE MUST BE SINGLE TABLESPACE

How to repeat:
Here is the steps to show the behavior:

1. compile 5.1.48 source code.

CFLAGS="-O0 -g" CXX=gcc CXXFLAGS="-O0 -g -felide-constructors \
-fno-exceptions -fno-rtti"

export CFLAGS CXXFLAGS

./configure --with-plugins=innodb_plugin

make -j32

2. debug with plugin innodb
2.1 # attach a running pid
(gdb) b fil_rename_tablespace
Breakpoint 1 at 0x2aaaaabfed30: file fil/fil0fil.c, line 2435.
(gdb) c
Continuing.
[Switching to Thread 0x4944a940 (LWP 7069)]

Breakpoint 1, fil_rename_tablespace (old_name=0xeacdc20 "sbtest/sb1", id=51, new_name=0xea96fa0 "sbtest/#sql2-1b8b-1") at fil/fil0fil.c:2435
2435    {
(gdb) c
Continuing.

Breakpoint 1, fil_rename_tablespace (old_name=0xeabb3f0 "sbtest/#sql-1b8b_1", id=52, new_name=0xea96fa0 "sbtest/sb1") at fil/fil0fil.c:2435
2435    {
(gdb) return 0
Make fil_rename_tablespace return now? (y or n) y
#0  0x00002aaaaabea8c4 in dict_table_rename_in_cache (table=0xeabb288, new_name=0xea96fa0 "sbtest/sb1", 
    rename_also_foreigns=1) at dict/dict0dict.c:951
951                     } else if (!fil_rename_tablespace(old_name, table->space,
(gdb) 

(gdb) l
946                             fputs(" (", stderr);
947                             ut_print_filename(stderr,
948                                               table->dir_path_of_temp_table);
949                             fputs(" )\n", stderr);
950                             return(FALSE);
951                     } else if (!fil_rename_tablespace(old_name, table->space,
952                                                       new_name)) {
953                             return(FALSE);
954                     }
955             }
(gdb) finish
Run till exit from #0  0x00002aaaaabea8c4 in dict_table_rename_in_cache (table=0xeabb288, new_name=0xea96fa0 "sbtest/sb1", rename_also_foreigns=1)
    at dict/dict0dict.c:951
0x00002aaaaac6527e in row_rename_table_for_mysql (old_name=0xea97000 "sbtest/#sql-1b8b_1", new_name=0xea96fa0 "sbtest/sb1", trx=0xeacf728, commit=1)
    at row/row0mysql.c:3905
3905                    if (!dict_table_rename_in_cache(table, new_name,
Value returned is $1 = 0
(gdb) l
3900                    trx->error_state = DB_SUCCESS;
3901            } else {
3902                    /* The following call will also rename the .ibd data file if
3903                    the table is stored in a single-table tablespace */
3904    
3905                    if (!dict_table_rename_in_cache(table, new_name,
3906                                                    !new_is_tmp)) {
3907                            trx->error_state = DB_SUCCESS;
3908                            trx_general_rollback_for_mysql(trx, NULL);
3909                            trx->error_state = DB_SUCCESS;
(gdb) l
3910                            goto funct_exit;
3911                    }
3912    
3913                    /* We only want to switch off some of the type checking in
3914                    an ALTER, not in a RENAME. */
3915    
3916                    err = dict_load_foreigns(
3917                            new_name, !old_is_tmp || trx->check_foreigns);
3918    
3919                    if (err != DB_SUCCESS) {
(gdb) finish
Run till exit from #0  0x00002aaaaac6527e in row_rename_table_for_mysql (old_name=0xea97000 "sbtest/#sql-1b8b_1", new_name=0xea96fa0 "sbtest/sb1", 
    trx=0xeacf728, commit=1) at row/row0mysql.c:3905
0x00002aaaaac14e0b in ha_innodb::rename_table () from /u01/mysql-5.1.48/storage/innodb_plugin/.libs/ha_innodb_plugin.so
Value returned is $2 = 10

2.2 here is the key routines:
storage/innodb_plugin/row/row0mysql.c: row_rename_table_for_mysql
3905                 if (!dict_table_rename_in_cache(table, new_name,
3906                                                 !new_is_tmp)) {
3907                         trx->error_state = DB_SUCCESS;
3908                         trx_general_rollback_for_mysql(trx, NULL);
3909                         trx->error_state = DB_SUCCESS;
3910                         goto funct_exit;
3911                 }

2.3 client's output:
root@sbtest 05:00:24>alter table sb1 add column d2 int;
Query OK, 0 rows affected (9 min 47.97 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@sbtest 05:11:26>desc sb1;
ERROR 1146 (42S02): Table 'sbtest.sb1' doesn't exist

Suggested fix:

3. different code between innodb and plugin

3.1 innodb add  err = DB_ERROR; before goto funct_exit; when dict_table_rename_in_cache failed.

3.2 test innodb work and result could be refered to modified innodb plugin
just like innobase behavior, add  err = DB_ERROR; before exit

3905                 if (!dict_table_rename_in_cache(table, new_name,
3906                                                 !new_is_tmp)) {
3907                         trx->error_state = DB_SUCCESS;
3908                         trx_general_rollback_for_mysql(trx, NULL);
3909                         trx->error_state = DB_SUCCESS;
3910                         err = DB_ERROR;
3911                         goto funct_exit;
3912                 }

3.3 re-debug with modified innodb plugin

Breakpoint 1, fil_rename_tablespace (old_name=0x15b19620 "sbtest/sb2", id=53, new_name=0x15b26250 "sbtest/#sql2-1eb9-1") at fil/fil0fil.c:2435
2435    {
(gdb) c
Continuing.

Breakpoint 1, fil_rename_tablespace (old_name=0x15ae4600 "sbtest/#sql-1eb9_1", id=54, new_name=0x15b26250 "sbtest/sb2") at fil/fil0fil.c:2435
2435    {
(gdb) return 0
Make fil_rename_tablespace return now? (y or n) y
#0  0x00002aaaaabea8c4 in dict_table_rename_in_cache (table=0x15ae4498, new_name=0x15b26250 "sbtest/sb2",
    rename_also_foreigns=1) at dict/dict0dict.c:951
951                     } else if (!fil_rename_tablespace(old_name, table->space,
(gdb) finish
Run till exit from #0  0x00002aaaaabea8c4 in dict_table_rename_in_cache (table=0x15ae4498, new_name=0x15b26250 "sbtest/sb2", rename_also_foreigns=1)
    at dict/dict0dict.c:951
0x00002aaaaac6527e in row_rename_table_for_mysql (old_name=0x15b1a7a0 "sbtest/#sql-1eb9_1", new_name=0x15b26250 "sbtest/sb2", trx=0x15b25998, commit=1)
    at row/row0mysql.c:3905
3905                    if (!dict_table_rename_in_cache(table, new_name,
Value returned is $1 = 0
(gdb) finish
Run till exit from #0  0x00002aaaaac6527e in row_rename_table_for_mysql (old_name=0x15b1a7a0 "sbtest/#sql-1eb9_1", new_name=0x15b26250 "sbtest/sb2",
    trx=0x15b25998, commit=1) at row/row0mysql.c:3905
0x00002aaaaac14e0b in ha_innodb::rename_table () from /u01/mysql-5.1.48/storage/innodb_plugin/.libs/ha_innodb_plugin.so
Value returned is $2 = 11
(gdb) c
Continuing.

Breakpoint 1, fil_rename_tablespace (old_name=0x15b19770 "sbtest/#sql2-1eb9-1", id=53, new_name=0x15b1a7a0 "sbtest/sb2") at fil/fil0fil.c:2435
2435    {
(gdb) c
Continuing.
c

3.4 client's output

root@sbtest 05:15:17>create table sb2 (a int);
Query OK, 0 rows affected (0.01 sec)

root@sbtest 05:15:20>alter table sb2 add column d2 int;
ERROR 1025 (HY000): Error on rename of './sbtest/#sql-1eb9_1' to './sbtest/sb2' (errno: -1)
root@sbtest 05:16:28>
[17 Aug 2011 19:45] Sveta Smirnova
Thank you for the report.

Verified as described.
[22 Feb 2012 23:28] John Russell
Added to changelog for 5.1.61, 5.5.19, 5.6.4:

If an ALTER TABLE statement failed for an InnoDB table due to an
error code from an underlying file-renaming system call, the .ibd
file for the table could be lost. This issue only occurred when the
innodb_file_per_table configuration option was enabled, and when the
low-level error persisted through thousands of retry attempts. In
MySQL 5.1, this issue applied to the InnoDB Plugin but not the
built-in InnoDB storage engine.
[8 Mar 2012 8:07] John Russell
Updated the changelog entry like so, based on feedback from InnoDB development:

If an ALTER TABLE statement failed for an InnoDB table due to an
 error code from an underlying file-renaming system call, InnoDB could
 lose track of the .ibd file for the table. This issue only occurred
 when the innodb_file_per_table configuration option was enabled, and
 when the low-level error persisted through thousands of retry
 attempts. In MySQL 5.1, this issue applied to the InnoDB Plugin but
 not the built-in InnoDB storage engine.

For example, if you encounter an error like the following:

mysql> alter table sb2 add column d2 int;
ERROR 1025 (HY000): Error on rename of ./sbtest/#sql-1eb9_1 to
 ./sbtest/sb2 (errno: -1)

you might be able to access the #sql* table by copying a .frm file from a table with an identical schema.  The table name to use for the .frm file would be `sbtest.#mysql50##sql-1eb9_1` in the preceding example.