MySQL Bugs: #47343: InnoDB fails to clean-up after lock wait timeout on REORGANIZE PARTITION

Bug #47343	InnoDB fails to clean-up after lock wait timeout on REORGANIZE PARTITION
Submitted:	16 Sep 2009 1:23	Modified:	20 Jun 2010 22:54
Reporter:	Lachlan Mulcahy	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Partitions	Severity:	S1 (Critical)
Version:	5.1.38	OS:	Any
Assigned to:	Mattias Jonsson	CPU Architecture:	Any
Tags:	crash, innodb, InnoDB plugin, partition, partitioning

Description:
If an ALTER TABLE ... REORGANIZE PARTITION.. operation on an InnoDB table fails due to innodb_lock_wait_timeout expiring while waiting for a lock, InnoDB does not clean-up the temporary files/tables it created.

An attempt to reissue the ALTER operation after the timeout gives different results depending on whether innodb-file-per-table is in use:

* With file per table error -1 from storage engine is returned.
* Without file per table, the server crashes.

How to repeat:
I have been able to repeat this on 5.1.38 enterprise advanced binary using both the built-in InnoDB and bundled InnoDB beta plug-in.

Connection #1:

USE test;

CREATE TABLE t1 (
	id INT AUTO_INCREMENT NOT NULL,
	name CHAR(50) NOT NULL,
	myDate DATE NOT NULL,
	PRIMARY KEY (id, myDate),
	INDEX idx_date (myDate)
) ENGINE=InnoDB
PARTITION BY RANGE ( TO_DAYS(myDate) ) (
	PARTITION p0 VALUES LESS THAN (734028), # Sep 13
	PARTITION p1 VALUES LESS THAN (734029), # Sep 14
	PARTITION p2 VALUES LESS THAN (734030), # Sep 15
	PARTITION p3 VALUES LESS THAN MAXVALUE
) ;

INSERT INTO t1 VALUES 
	(NULL, 'Lachlan', '2009-09-13'),
	(NULL, 'Clint', '2009-09-13'),
	(NULL, 'John', '2009-09-14'),
	(NULL, 'Dave', '2009-09-14'),
	(NULL, 'Jeremy', '2009-09-15'),
	(NULL, 'Scott', '2009-09-15'),
	(NULL, 'Jeff', '2009-09-16'),
	(NULL, 'Joe', '2009-09-16');

SET AUTOCOMMIT=0;
SELECT * FROM t1 WHERE id IN (7,8) FOR UPDATE;

Connection #2:

USE test;
ALTER TABLE t1 REORGANIZE PARTITION p3 INTO (
	PARTITION p3 VALUES LESS THAN (734031),
	PARTITION p4 VALUES LESS THAN MAXVALUE
);
# Wait 50 seconds and observe lock wait timeout error
# ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

# Re-issue the ALTER...
ALTER TABLE t1 REORGANIZE PARTITION p3 INTO ( PARTITION p3 VALUES LESS THAN (734031), PARTITION p4 VALUES LESS THAN MAXVALUE );

If innodb-file-per-table was set:
ERROR 1030 (HY000): Got error -1 from storage engine

Contents of the 'test' database directory:

#sql-t1.frm		t1#P#p2.ibd		t1.frm
#sql-t1.par		t1#P#p3#TMP#.ibd	t1.par
t1#P#p0.ibd		t1#P#p3.ibd
t1#P#p1.ibd		t1#P#p4#TMP#.ibd

If innodb-file-per-table was *not* set:
ERROR 2006 (HY000): MySQL server has gone away

Contents of the 'test' database directory:

#sql-t1.frm	t1.frm
#sql-t1.par	t1.par

Error log looks like:

090915 18:20:23  InnoDB: Error: table `test`.090915 18:20:23 [ERROR] Invalid (ol
d?) table or database name 't1#P#p3#TMP#'
`#mysql50#t1#P#p3#TMP#` already exists in InnoDB internal
InnoDB: data dictionary. Have you deleted the .frm file
InnoDB: and not used DROP TABLE? Have you used DROP DATABASE
InnoDB: for InnoDB tables in MySQL version <= 3.23.43?
InnoDB: See the Restrictions section of the InnoDB manual.
InnoDB: You can drop the orphaned table inside InnoDB by
InnoDB: creating an InnoDB table with the same name in another
InnoDB: database and copying the .frm file to the current database.
InnoDB: Then MySQL thinks the table exists, and DROP TABLE will
InnoDB: succeed.
InnoDB: You can look for further help from
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting.html
090915 18:20:23  InnoDB: Assertion failure in thread 4348866560 in file row/row0
mysql.c line 1682
InnoDB: Failing assertion: index
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.
090915 18:20:23 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=3
max_threads=151
threads_connected=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338286 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x400000   mysqld                              0x00000000002c8176 my_print_stacktrace +
 57^@1   mysqld                              0x00000000000916ec handle_segfault + 548
^@
2   libSystem.B.dylib                   0x00000000837ae3fa _sigtramp + 26^@
3   ???                                 0x000000005fbff210 0x0 + 1606414864^@
4   mysqld                              0x0000000000092d6e main + 2128^@
5   mysqld                              0x0000000000000ff0 start + 52^@
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
090915 18:20:23 mysqld_safe mysqld restarted

Suggested fix:
Ensure InnoDB correctly cleans up any temporary files or tablespace allocations if an ALTER TABLE... PARTITION REORGANIZE fails due to lock wait timeout (or any other reason).

Lachlan, this is most likely bug #45961 or one of it's friends..

Shane - Thanks.. a lot of those bugs seem to centre around the same kind of error, though with slightly different situations triggering them.

I'd suggest to test this against 5.4.4 with the fix for Bug 47029 to see if it is already addressed.

This bug is due to tmp tables created in the ha_partition::change_partitions() did not get cleaned up when the its last step copy_partitions() failed and exits. Subsequent trials will find the temp tables already exist and fail:

The change partition created the temp table, but in copy_partition step, it hits lock timeout:

int ha_partition::change_partitions
{
        do
        {
		...
	 	prepare_new_partition()  <== Temp table created
                ...
 	} while (++i < no_parts);

	DBUG_RETURN(copy_partitions(copied, deleted)); <== Timeout 
}

The lock timeout happens during its copy_partition operation, stack is:
 
srv_suspend_mysql_thread
row_mysql_handle_errors 
row_search_for_mysql 
ha_innobase::general_fetch
ha_innobase::rnd_next
ha_partition::copy_partitions 
ha_partition::change_partitions
handler::ha_change_partitions 

It returns DB_LOCK_WAIT_TIMEOUT (35), and subsequently translated to HA_ERR_LOCK_WAIT_TIMEOUT:

(gdb) where
#0  row_mysql_handle_errors (new_err=0x46c405a8, trx=0x2aaaaada70b8,
    thr=0x2aaaaada6350, savept=0x0) at row/row0mysql.c:461
...
#4  ha_partition::copy_partitions 

(gdb) print err
$9 = 35
#define DB_LOCK_WAIT_TIMEOUT    35      /* lock wait lasted too long */

The  ha_partition::copy_partitions() did not handle the TIMEOUT error and directly returned, if we cleanup the table before its return, everything runs fine. I tried so in a debugger:

int ha_partition::change_partitions
{
        do
        {
		...
	 	prepare_new_partition()  <== Temp table created
                ...
 	} while (++i < no_parts);

1591	DBUG_RETURN(copy_partitions(copied, deleted)); <== Break before return
1592 }

Breakpoint 1, ha_partition::change_partitions (this=0x16a55da8,
    create_info=0x45714740, path=0x45712bc0 "./test/t1", copied=0x45713190,
    deleted=0x45713198, pack_frm_data=0x0, pack_frm_len=0)
    at ha_partition.cc:1592
1592    }
(gdb)  call new_file_array[4]->ha_delete_table("./test/t1#P#p4#TMP#")
$3 = 0
(gdb)  call new_file_array[3]->ha_delete_table("./test/t1#P#p3#TMP#")
$4 = 0

By destroying the temp table before exits, Server runs fine without repeated trial:

mysql> ALTER TABLE t1 REORGANIZE PARTITION p3 INTO ( PARTITION p3 VALUES LESS THAN (734031), PARTITION p4 VALUES LESS THAN MAXVALUE );
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
mysql> ALTER TABLE t1 REORGANIZE PARTITION p3 INTO ( PARTITION p3 VALUES LESS THAN (734031), PARTITION p4 VALUES LESS THAN MAXVALUE );
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

If change_partition() does not cleanup the table, we will hit the dup key error later, however, it is in a different path with that of in 45961/47029, which we go around the problem by muffle the dup key error in rename table call path. This time, it is raised in the create table call from change_partition(): 

row_ins_duplicate_error_in_clust <== Dup key error
row_ins_index_entry_low
row_ins_index_entry
row_ins_index_entry_step
create_table_def 
ha_innobase::create
handler::ha_create 
ha_partition::prepare_new_partition 
ha_partition::change_partitions

Again, there is signal when mysql_change_partitions() try to handle the dup key error by re-entering the storage code to fetch information. And storage infomation (prebuilt) might not be available / stale:

raise
abort
row_get_mysql_key_number_for_index <== index structure is null
ha_innobase::info
...
handler::get_dup_key
ha_partition::print_error
mysql_change_partitions

(gdb) up
#2  0x00000000008a24ae in row_get_mysql_key_number_for_index (index=0x0)
    at row/row0mysql.c:1682
1682            ut_a(index);
(gdb) print index
$1 = (dict_index_t *) 0x0
(gdb) up
#3  0x000000000083eab9 in ha_innobase::info (this=0x12deeb98, flag=32)
    at handler/ha_innodb.cc:6313
6313                            (dict_index_t*) trx_get_error_info(prebuilt->trx));

So suggested fix is to handle the error from copy_partitions() in ha_partition::change_partitions(), and drop the newly created temp table before exit. For error handling, we still do not recommend a entry to storage layer in that late stage.

Thanks
Jimmy

One statement needs correction:

By destroying the temp table before exits, 
Server runs fine without repeated trials
                 ^^^^^^^
By destroying the temp table before exits, 
Server runs fine with repeated trials 
                 ^^^^                 

Thanks
Jimmy

This is probably bad error handling in ha_partition.cc.

This could be handled in ha_partition::cleanup_new_partitions, by doing copy_partitions first and if error call cleanup_new_partitions before returning from ha_partition::change_partitions

I will also fix the error handling (cleaning up) from bug#46922 in the scope of this bug (only fixing the crash in the other bug, since the cleaning up is very similar to this bug).

bug#45961 is marked as a dup of this one.

hackish script to dump ddl_log.log file (not really complete...)

Attachment: dump_ddl_log.pl (text/plain), 5.09 KiB.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97290

3316 Mattias Jonsson	2010-01-18
      Bug#47343: InnoDB fails to clean-up after lock wait timeout on
                 REORGANIZE PARTITION
      
      There were several problems which lead to this this,
      all related to bad error handling.
      
      1) There was several bugs preventing the ddl-log to be used for
         cleaning up created files on error.
      
      2) The error handling after the copy partition rows did not close
         and unlock the tables, resulting in deletion of partitions
         which were in use, which lead InnoDB to put the partition to
         drop in a background queue.
     @ sql/ha_partition.cc
        Bug#47343: InnoDB fails to clean-up after lock wait timeout on
                   REORGANIZE PARTITION
        
        Better error handling, if partition has been created/opened/locked
        then make sure it is unlocked and closed before returning error.
        The delete of the newly created partition is handled by the ddl-log.
     @ sql/sql_parse.cc
        Bug#47343: InnoDB fails to clean-up after lock wait timeout on
                   REORGANIZE PARTITION
        
        Fix a bug found when experimenting, thd could really be NULL here,
        as mentioned in the function header.
     @ sql/sql_partition.cc
        Bug#47343: InnoDB fails to clean-up after lock wait timeout on
                   REORGANIZE PARTITION
        
        Used the correct .frm shadow name to put into the ddl-log.
        Really use the ddl-log to handle errors.
     @ sql/sql_table.cc
        Bug#47343: InnoDB fails to clean-up after lock wait timeout on
                   REORGANIZE PARTITION
        
        Fixes of the ddl-log when used as error recovery (no crash).
        When executing an entry from memory (not read from disk)
        the name_len was not set correctly.

Marked bug#45808 as a duplicate of this bug (and verified that the proposed patch resolves the issue).

Pushed into mysql-5.1-bugteam

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97404

3325 Mattias Jonsson	2010-01-19
      post-push patch for bug#47343.
      
      Missing ha_rnd_end in copy_partitions, found due to a
      DBUG_ASSERT in mysql-pe
     @ sql/ha_partition.cc
        Post-push patch for bug#47343
        
        Must call ha_rnd_end since ha_rnd_init has been called.

pushed to mysql-5.1-bugteam and mysql-pe.

There was one more missing command in copy_partitions, it did not end the rnd call
if it encountered an error. This is also included in the push.

Bug#50561 is related, as it causes the same type of errors, but is not a duplicate of this (that bug is about I_S which ignores name locks which partitioning currently uses, when it really should also hold LOCK_open).

Pushed into 5.1.44 (revid:joro@sun.com-20100204101444-2j32mhqroo0iiio6) (version source revid:mattias.jonsson@sun.com-20100119160251-0xvcgzw0y08xwk6r) (merge vers: 5.1.43) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100204063540-9czpdmpixi3iw2yb) (version source revid:alik@sun.com-20100127214222-m77psnbrphabtmj0) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100205113942-oqovjy0eoqbarn7i) (version source revid:alik@sun.com-20100204064210-ljwanqvrjs83s1gq) (merge vers: 6.0.14-alpha) (pib:16)

Pushed into 5.5.2-m2 (revid:alik@sun.com-20100203172258-1n5dsotny40yufxw) (version source revid:alexey.kopytov@sun.com-20100123210923-lx4o1ettww9fdkqk) (merge vers: 5.5.2-m2) (pib:16)

Documented in the 5.1.44, 5.5.2, 5.6.0, and 6.0.14 changelogs, as follows:

        When an ALTER TABLE ... REORGANIZE PARTITION statement on an
        InnoDB table failed due to innodb_lock_wait_timeout expiring
        while waiting for a lock, InnoDB did not clean up any temporary
        files or tables which it had created. Attempting to reissue the
        ALTER statement after the timeout could lead to storage engine
        errors or possibly a crash of the server.

Closed.

The test case relies on the presence of Bug#37346 "innodb does not 
detect deadlock between update and alter table" and can not work in next-4284,
where this bug is fixed. The test case needs to be rewritten.

When rewriting the test case, if you still need innodb-lock-wait-timeout=1 master option, please consider adding the test case to innodb_lock_wait_timeout_1.test.

Putting the back to 'Closed' to not confuse matters with Showstopper flag.
The failing test case is not a showstopper.

I will try to fix the test case withing bug#50561 and probably add a DBUG_EXECUTE_IF to return failure instead to simulate error in the storage engine.

5.6.0 changelog entry unneeded.

Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)

Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)

Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)

No additional changelog entries required. Returning to Closed state.

Noted in 5.1.43sp1 changelog.

Pushed into 5.1.47 (revid:build@mysql.com-20100412101220-adgau1r18kwgthps) (version source revid:build@mysql.com-20100412101220-adgau1r18kwgthps) (merge vers: 5.1.47) (pib:16)

Fix is already in 5.1 releases, so no new changelog entries needed.

Returning to Closed state.

Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:alik@sun.com-20100422150658-fkhgnwwkyugtxrmu) (merge vers: 6.0.14-alpha) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:alexey.kopytov@sun.com-20100429203306-tg0wz4y2xyx8edrl) (merge vers: 5.5.5-m3) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)