Bug #45961 DDL on partitioned innodb tables leaves data dictionary in an inconsistent state
Submitted: 6 Jul 9:26 Modified: 20 Nov 6:58
Reporter: Shane Bester
Status: Patch approved
Category:Server: InnoDB Severity:S1 (Critical)
Version:5.1.34, 5.1.37, 6.0.12 OS:Any
Assigned to: Jimmy Yang Target Version:5.1+
Triage: Triaged: D1 (Critical)

[6 Jul 9:26] Shane Bester
Description:
When running concurrent DDL, there is a bug that causes InnoDB to complain about existing
files, and subsequently cause a mismatch between the data dictionary and real life.   I
also witnessed a crash once.

090619  9:41:55  InnoDB: Error creating file './test/t1#P#p0#TMP#.ibd'.
090619  9:41:55  InnoDB: Operating system error number 17 in a file operation.
InnoDB: Error number 17 means 'File exists'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/operating-system-error-codes.html
InnoDB: The file already exists though the corresponding table did not
InnoDB: exist in the InnoDB data dictionary. Have you moved InnoDB
InnoDB: .ibd files around without using the SQL commands
InnoDB: DISCARD TABLESPACE and IMPORT TABLESPACE, or did
InnoDB: mysqld crash in the middle of CREATE TABLE? You can
InnoDB: resolve the problem by removing the file './test/t1#P#p0#TMP#.ibd'
InnoDB: under the 'datadir' of MySQL.

On Windows, we get diffferent error message (OS Specific codes):

Version: '5.1.37-debug'  socket: ''  port: 3306  Source distribution
090706  9:03:53  InnoDB: Error creating file '.\test\t1#p#p0#tmp#.ibd'.
090706  9:03:53  InnoDB: Operating system error number 80 in a file operation.
InnoDB: Some operating system error numbers are described at

How to repeat:
Start server with high-performance options for easier repeatability:

--innodb_flush_log_at_trx_commit=0 --innodb_doublewrite=0 --innodb_lock_wait_timeout=1
--innodb_file_per_table

delimiter ;

drop procedure if exists `p1`;
delimiter //
create procedure `p1`(`num` int)
begin
  declare `i` int default '0';
  declare `j` int default '0';
  declare continue handler for sqlexception begin end;
  repeat
   drop table if exists `t1`;
   create table `t1` (`a` char(255) primary key) engine=innodb partition by key ()
partitions 4;
   start transaction;
   set `j`=0;
   repeat
     replace into `t1` values (concat(`j`,`i`));
     set `j`=`j`+1;
   until `j`>=1000 end repeat;
   commit;
   alter table `t1` add partition partitions 1;
   alter table `t1` coalesce partition 1;
   set `i`=`i`+1;
   until `i`>=`num` end repeat;

end //

delimiter ;

call `p1`(10000);  #run this in two connections and observe mysql error log.
[6 Jul 9:42] Shane Bester
if you shutdown the server and restart it and rerun the testcase, innodb dies due to
memory corruption!

Attachment: bug_after_restart_crash_memory_corruption.txt (text/plain), 8.18 KiB.

[6 Jul 11:32] Sveta Smirnova
Thank you for the report.

Verified as described.
[6 Jul 15:31] Shane Bester
note to developers:  it's possible you'll also hit bug #45808 with this testcase..
[10 Jul 16:01] Andrii Nikitin
Verified if the original procedure split into two: first does DDL, second - DML.
If started with no configuration file mysqld crashes after 1 min of running the test.

G:\mysql-5.1.36-winbuild\mysql-advanced-gpl-5.1.36-build\storage\innobase\include\sync0syn
c.ic line 115
000000014033D320    mysqld.exe!row_get_mysql_key_number_for_index()[row0mysql.c:1677]
0000000140334DFB    mysqld.exe!ha_innobase::info()[ha_innodb.cc:6224]
0000000140042917    mysqld.exe!ha_partition::info()[ha_partition.cc:4935]
000000014004ACD7    mysqld.exe!handler::get_dup_key()[handler.cc:2954]
000000014004E05F    mysqld.exe!handler::print_error()[handler.cc:2612]
00000001401AD637    mysqld.exe!mysql_change_partitions()[sql_partition.cc:5102]
00000001401B45E7    mysqld.exe!fast_alter_partition_table()[sql_partition.cc:6354]
000000014020B93B    mysqld.exe!mysql_alter_table()[sql_table.cc:6905]
00000001401A4AC4    mysqld.exe!mysql_execute_command()[sql_parse.cc:2861]
000000014013FBE8    mysqld.exe!sp_instr_stmt::exec_core()[sp_head.cc:2908]
0000000140143395    mysqld.exe!sp_lex_keeper::reset_lex_and_exec_core()[sp_head.cc:2735]
0000000140143596    mysqld.exe!sp_instr_stmt::execute()[sp_head.cc:2851]
0000000140144FC5    mysqld.exe!sp_head::execute()[sp_head.cc:1255]
000000014014638F    mysqld.exe!sp_head::execute_procedure()[sp_head.cc:1983]
00000001401A7144    mysqld.exe!mysql_execute_command()[sql_parse.cc:4359]

If started with 
--innodb_flush_log_at_trx_commit=0 --innodb_doublewrite=0 --innodb_lock_wait_timeout=1
--innodb_file_per_table
then no crash, just following errors appear in log
090710 15:42:26  InnoDB: Error creating file '.\part\t1#p#p0#tmp#.ibd'.
[10 Jul 16:50] Shane Bester
andrii, your last stack trace is reported in bug #45808 . it's garbage pointer deref.  if
you use debug build, it'll make an assertion instead of psuedorandom crash.
[20 Jul 15:25] Andrii Nikitin
1) looks the problem occurs when "Lock wait timeout exceeded" risen by ALTER statement.
2) server restart removes temporary files
3) if ALTER TABLE ADD PARTITION executed again immediately after restart then crash with
stacktrace identical to bug #45808 . 
4) if SELECT * executed first after server restart, then 3) will not crash
5) I tried to manually delete the problem #TMP# files and finished with full table data
loss (except last partition)
6) increasing innodb_lock_wait_timeout will not help in most cases because of bug #37346

e.g. if in original testcase we will wrap 'replace' statement with 'start transaction'
and 'commit;', then problem will occur only with very small innodb_lock_wait_timeout
value.
[6 Aug 14:16] Marko Mäkelä
When investigating Bug #45808, I saw similar error messages when running with
innodb_file_per_table=1. The messages went away with innodb_file_per_table=0.
[10 Aug 15:03] Marko Mäkelä
While trying to repeat this, I got one case of apparent duplicate call of
row_prebuilt_free():

InnoDB: Error: trying to free a corrupt
InnoDB: table handle. Magic n 26423527, magic n2 26423527, table name
"test"."#mysql50#t1#P#p0#TMP#"
InnoDB: Apparent memory corruption: mem dump  len 500; …

This is with my fix for Bug #45808 from [6 Aug 12:39] in place. It looks like the
partition code is calling ha_innobase::close() on an already closed handle.

I was unable to repeat that crash. I got a similar crash from the Bug #45808 test case
before applying my fix.

Also, like with Bug #45808, innodb_file_per_table=1 makes InnoDB spam the log with this
kind of messages:

090810 15:51:47  InnoDB: Error creating file './test/t1#P#p0#TMP#.ibd'.
090810 15:51:47  InnoDB: Operating system error number 17 in a file operation.
InnoDB: Error number 17 means 'File exists'.

With innodb_file_per_table=0, it only spams this:

090810 16:00:47  InnoDB: Error: table `test`.`#mysql50#t1#P#p0#TMP#` already exists in
InnoDB internal
InnoDB: data dictionary. …

Both are signs of inconsistent data dictionaries. I will investigate this further.
[10 Aug 15:06] Marko Mäkelä
Regarding [6 Jul 9:42], I got the “double ha_innobase::close()” error after killing
and restarting the server and the test case. Like in the attached file, I had “Magic n
26423527, magic n2 26423527” in the error message. This is ROW_PREBUILT_FREED,
suggesting that row_prebuilt_free() had already been called on the object.
row_prebuilt_free() was called by ha_innobase::close(), which in turn was called by
ha_partition::close().
[8 Sep 19:50] Jimmy Yang
Part of the root cause for this crash will be addressed in bug #47029. We will address the
error code returning part in this CR, per suggestion from Mikael.

Some email exchange explains the discussion on this issue. Please also refer to bug 47029
for more infomartion. 

Thanks
Jimmy

======================Email Exchange with Mikael ==========================
Hi,

Jimmy Yang wrote:
> Mikael, briefly chatted with Heikki on this, we will change the error return code (in
this particular situation) from DB_DUPLICATE_KEY to generic DB_ERROR (though feel a big
uneasy not returning an real error code to go around the error reporting issue :) ).  
> 
> I'll use #45961 to make the change, and will update the bug description with these
information. 
> 
> For DROP TABLE related issues, I'll need to do some background checks...
> 

Great, thx. I have already pushed your suggested change to the bug tree
so should soon appear in the 5.1 tree.

Rgrds Mikael

Hi,
If you want to hang your error handling bug to another bug you can use
BUG#45961 which is crashing due to the same problem and stack trace is
very similar.

I have also noted that there is a few bugs related to the
problem that InnoDB does a certain amount of DROP TABLE processing in
the background after the SQL query DROP TABLE already have been
processed. It seems that Partitioning management escalates this
problem since it is very common practice to reuse partition names
it seems.

See BUG#45961, BUG#43729.

I don't know if you already have a plan for fixing this issue.
I gather you have.
Rgrds Mikael

Jimmy Yang wrote:
> Thanks, Mikael.
> 
>> I gather a good work division is that I handle
>> the below patch and that you handle the error handling.
> 
> Yes, Mikael, for error handling, I will do some
> digging and see how we go from there.
> 
> 
> 
> Thanks
> Jimmy
========================================================
[9 Sep 5:25] Jimmy Yang
Here are some more debugging information on the problem:

The prebuilt structure is the one that got freed and thus triggerred the SIGSEGV when
mysql_rename_partitions() failed with dup key error and get_dup_key() got called etc.

Program received signal SIGSEGV, Segmentation fault.
0x00000000008a1949 in row_update_prebuilt_trx (prebuilt=0x2aaaaad96cb8,
    trx=0x2aaaaad8e0b8) at row/row0mysql.c:787
787                     ut_print_name(stderr, NULL, TRUE, prebuilt->table->name);

(gdb) print prebuilt->table->name
Cannot access memory at address 0x1a

However, the prebuilt structure is freed just a couple of lines before
mysql_rename_partitions() is called in fast_alter_partition_table(), following is the
call stack:

row_prebuilt_free
ha_innobase::close 
ha_partition::close
close_handle_and_leave_table_as_lock 
close_data_files_and_morph_locks 
alter_close_tables           <== Table closed right before rename tbl.
fast_alter_partition_table ( sql_partition.cc:6380)
mysql_alter_table

mysql_rename_partitions() is called right after alter_close_tables() in the same if ()
statement at sql_partition.cc:6380. If alter these two sequence, this SIGSEGV due to
freed "prebuilt" could go away, but it will still fail because some of the transaction
information for this operation has gone away.

My general suggestion for this is that we should try to avoid re-entering /re-access
innodb code from mysql layer for error printing. It will be good all the innodb error
being handled locally right when error happens, and information are available. And use
error push, to push the error message up.

What I am going to do for this bug is still following what Mikael/Heikki suggested and
change the return code from DB_DUPLICATE_KEY to DB_ERROR for table rename. But use
push_warning_printf() to push some error information up.

Thanks
Jimmy
[3 Nov 12:04] Mattias Jonsson
Changing to Category: InnoDB.

Jimmy, if you need some change in the partitioning area, please contact me or open a new
bug. As I understand this bug, it is about error handling in InnoDB (as noted in
bug#45808).
[20 Nov 3:47] Jimmy Yang
Please note, the problem for the bug could well be fixed in bug #47029. The current fix is
just to avoid the SIGSEGV in the error handling situation.
[20 Nov 6:58] Jimmy Yang
Fix checked in. Again, this is to prevent a future server crash if a similar error (a
failure to drop temp table causing a dup key error) happens. Please refer to bug #47029
for real fix of the problem.