Bug #41609 Crash recovery does not work for InnoDB temporary tables
Submitted: 19 Dec 2008 0:00 Modified: 7 Jul 8:28
Reporter: Mark Callaghan
Status: Verified
Category:Server: InnoDB Severity:S2 (Serious)
Version:5.0.67,5.1.30 OS:Any
Assigned to: Marko Mäkelä Target Version:5.1+
Tags: innodb, temporary, table, ibbackup, recovery, create
Triage: Triaged: D3 (Medium) / R2 (Low) / E2 (Low)

[19 Dec 2008 0:00] Mark Callaghan
Description:
I don't think that InnoDB crash recovery code doesn't work for temporary tables when
innodb_file_per_table and datadir are set. I think that ibbackup has the same problem,
but I don't expect you to fix that.

We noticed the problem when ibbackup restores failed with error messages like:
InnoDB: Doing recovery: scanned up to log sequence number 77 2578521600
081218  8:11:09  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: File name /var/lib/mysql//export/hda3/tmp/#sql14c7_a9fa97_0.ibd
InnoDB: File operation call: 'create'.
InnoDB: Cannot continue operation.

The pathname is odd: /var/lib/mysql//export/hda3/tmp/#sql14c7_a9fa97_0.ibd
'/var/lib/mysql/' == datadir from my.cnf
'/export/hda3/tmp/#sql14c7_a9fa97_0.ibd' is the pathname for the temp table.

The problem appears to be this code in fil_op_log_parse_or_replay that is run to replay
file creation operations. The third argument in the call to
fil_create_new_single_table_tablespace is FALSE. That is the 'is_temp' arg, so recovery
is always done as if the file is for a non-temporary table. Unfortunately,
fil_create_new_single_table_tablespace calls 
fil_make_ibd_name and passes on the value of 'is_temp'

        } else {
                ut_a(type == MLOG_FILE_CREATE);

                if (fil_tablespace_exists_in_mem(space_id)) {
                        /* Do nothing */
                } else if (fil_get_space_id_for_table(name) !=
                                                        ULINT_UNDEFINED) {
                        /* Do nothing */
                } else {
                        /* Create the database directory for name, if it does
                        not exist yet */
                        fil_create_directory_for_tablename(name);

                        ut_a(space_id != 0);

                        ut_a(DB_SUCCESS ==
                                fil_create_new_single_table_tablespace(
                                                &space_id, name, FALSE,
                                                FIL_IBD_FILE_INITIAL_SIZE));

And file_make_ibd_name has this code that will prepend the value of datadir to the
pathname written to the log entry which explains the pathname I listed above. I think
this is also a problem for ibbackup and the storage engine. The default value of
fil_path_to_mysql_datadir is '.' and can be changed by datadir. For
http://bugs.mysql.com/bug.php?id=20867 the bad pathname starts with './tmp/'

        if (is_temp) {
                memcpy(filename, name, namelen);
                memcpy(filename + namelen, ".ibd", sizeof ".ibd");
        } else {
                memcpy(filename, fil_path_to_mysql_datadir, dirlen);
                filename[dirlen] = '/';

                memcpy(filename + dirlen + 1, name, namelen);
                memcpy(filename + dirlen + namelen + 1, ".ibd", sizeof ".ibd");
        }

Some of this is evident in 
http://bugs.mysql.com/bug.php?id=20867. Look at the pathname for the tmp table. It starts
with '.':
InnoDB: open the tablespace file './tmp/@0023sql14dc_1607_0.ibd

How to repeat:
It is easy to reproduce this with ibbackup. While ibbackup is running, create InnoDB
temporary tables. It is harder to reproduce without ibbackup. I create an InnoDB temp
table and then kill mysqld. During crash recovery an error message is printed with the
incorrect pathname for the temp table (starts with '.')

datadir and innodb_file_per_table should be set to reproduce this.

But I think that someone from InnoDB can save you some time because the problem looks
obvious to me from reading the code.

Suggested fix:
Crash recovery code must determine whether the MLOG_FILE_CREATE log entry is for a
temporary or non-temporary table.
[19 Dec 2008 0:31] Gary Pendergast
Crash recovery problem confirmed on:
5.0.74
5.1.30
[19 Dec 2008 1:06] Mark Callaghan
Actually,the workaround is to use skip_innodb_file_per_table
[19 Dec 2008 17:13] Michael Izioumtchenko
Assigning to Inaam tentatively. 
Note that the hotbackup issues is known and I think there's a bug number
for that. The workaround for HotBackup should be creating the directory
with the long strange looking name, before the restore, e.g.

/var/lib/mysql/export/hda3/tmp

status 'Need Feedback': could you provide us with the exact steps to reproduce
this without involving HotBackup? My impression is that if we 
just restart the server everything should work since the directories names
are all the same.
[19 Dec 2008 18:11] Mark Callaghan
To reproduce:
* configure server with innodb_file_per_table and datadir set
* execute sql 'create temporary table tt(i int) engine=innodb; commit'
* kill mysqld
* look at error log

When innodb_file_per_table is set there will be error messages in the error log. When it
is not set there will not be error messages.

But the more serious problem is to construct a case where the MLOG_FILE_CREATE log entry
is replayed during crash recovery. If you can construct that case, then I think the
server is dead as it will crash on an assert failure in file_op_log_parse_or_replay()
when file_create_new_single_table_tablespace() fails.

Alas, I was not able to construct that case using the test above. 

Creating missing directories does not work as the bogus names have an extra '/'
/var/lib/mysql//export/hda3/...
[19 Dec 2008 19:44] Michael Izioumtchenko
How does an extra // hurt? It's supposed to be treated as a single
/ by anything that work with files.
[19 Dec 2008 19:56] Mark Callaghan
It doesn't hurt. That was a mistake by me. Your workaround works.
[22 Dec 2008 11:04] Marko Mäkelä
The double // or \\ could have a special meaning on Microsoft Windows. On Unix-like
systems, multiple successive / should not hurt.
[5 Mar 14:07] Heikki Tuuri
Pekka, ibbackup should skip .ibd files it does not find.
--Heikki
[5 Mar 14:08] Heikki Tuuri
...and InnoDB itself should skip .ibd files it does not find.
[5 Mar 14:09] Heikki Tuuri
Workaround: use MyISAM tables as temp tables.
[5 Mar 14:17] Heikki Tuuri
Marko wants to take care of this.
[28 Apr 13:42] Marko Mäkelä
It turns out that we can pass the is_temp flag in the page_no parameter to
mlog_write_initial_log_record_for_file_op(). Luckily, it looks like the page_no has been
passed as 0 ever since MySQL 4.1. Old versions of InnoDB would ignore the page_no
parameter, and new ones would assign a bit of page_no to the is_temp flag.
[29 Apr 13:27] Marko Mäkelä
As far as I can tell, the redo log entries for file operations are never replayed in
InnoDB crash recovery. They are replayed in InnoDB Hot Backup, and in ibbackup
--apply-log, it makes sense to skip the replaying when the is_temp flag is set.

The error messages about lost *.ibd files for temporary tables are a different matter. I
haven't investigated the fix yet. One possibility would be to store the is_temp flag in
the InnoDB data dictionary, if we can find a good way of doing it. At least for
ROW_FORMAT=COMPACT (the default format), it should be doable. Old versions of InnoDB
didn't initialize unused fields to 0, and fields that could contain random garbage cannot
be used for new flags.
[14 Jul 17:29] Heikki Tuuri
The reason why this works in most MySQL installations may be that the temp dir is normally
/tmp. Then InnoDB replays temp table operations in a 'tmp' subdir that it creates under
the MySQL datadir.

That is also a workaround: set /tmp as your temp dir for mysqld.

The root cause of these temp table recovery problems is that MySQL puts temp tables to a
temp dir of the computer, not to a database directory. Backup programs have trouble
finding the temp tables from the temp dir of the computer.
[3 Sep 15:42] Marko Mäkelä
The fix that I described on [28 Apr 13:42] is included in InnoDB Plugin 1.0.4. Part of
this bug remains unfixed. Quoting my comment from [29 Apr 13:27]:

The error messages about lost *.ibd files for temporary tables are a different matter. I
haven't investigated the fix yet. One possibility would be to store the is_temp flag in
the InnoDB data dictionary, if we can find a good way of doing it. At least for
ROW_FORMAT=COMPACT (the default format), it should be doable. Old versions of InnoDB
didn't initialize unused fields to 0, and fields that could contain random garbage cannot
be used for new flags.