Bug #58787 InnoDB enforcing order of DDL and DML in binary log?
Submitted: 7 Dec 2010 15:18 Modified: 18 Dec 2010 13:50
Reporter: Matthew Lord Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Locking Severity:S1 (Critical)
Version: OS:Any
Assigned to: CPU Architecture:Any
Triage: Triaged: D2 (Serious)

[7 Dec 2010 15:18] Matthew Lord
Description:
(gdb) where
#0 ha_commit_trans (thd=0x42f31d0, all=false) at handler.cc:1089
#1 0x000000000072e5fa in ha_enable_transaction (thd=0x42f31d0, on=true) at handler.cc:3567
#2 0x0000000000754aa4 in mysql_alter_table (thd=0x42f31d0, new_db=0x43674a0 "vamsidb",
new_name=0x4367138 "t1", create_info=0x48c8d2e0, table_list=0x4367170,
alter_info=0x48c8d910, order_num=0, order=0x0, ignore=false) at sql_table.cc:7903

BUT it does not call ht->prepare (which is innobase_xa_prepare) because rw_ha_count seems to be 1. 

It goes thru ha_commit_trans 2nd time as follows. This time ha_info is null and so ht->prepare is not called this time either. 

Breakpoint 1, ha_commit_trans (thd=0x42f31d0, all=true) at handler.cc:1089
1089 THD_TRANS *trans= all ? &thd->transaction.all : &thd->transaction.stmt;
(gdb) where
#0 ha_commit_trans (thd=0x42f31d0, all=true) at handler.cc:1089
#1 0x000000000060eea3 in end_trans (thd=0x42f31d0, completion=COMMIT) at sql_parse.cc:754
#2 0x000000000072e60a in ha_enable_transaction (thd=0x42f31d0, on=true) at handler.cc:3568
#3 0x0000000000754aa4 in mysql_alter_table (thd=0x42f31d0, new_db=0x43674a0 "vamsidb",
new_name=0x4367138 "t1", create_info=0x48c8d2e0, table_list=0x4367170,
alter_info=0x48c8d910, order_num=0, order=0x0, ignore=false) at sql_table.cc:7903

So does it mean that mysql+innodb does NOT enforce that innodb DDL and DML (on different tables) go in the same order in transaction log and binlog? 

How to repeat:
Put a breakpoint in innobase_xa_prepare and do some ddl (alter table add column would be good) on innodb table. 

Suggested fix:
N/A
[7 Dec 2010 17:01] Inaam Rana
(11:55:10 AM) inaam1: Heikki: question about xa and DDL
(11:55:27 AM) Heikki: Please ask
(11:55:42 AM) inaam1: when running a DDL we don't do innobase_xa_prepare()
(11:56:10 AM) Heikki: Hmm... I think we should do
(11:56:14 AM) inaam1: how do we then ensure ordering of commit between binlog and redo (or does it not matter in case of DDL)
(11:56:20 AM) inaam1: http://bugs.mysql.com/bug.php?id=58787
(11:56:38 AM) Heikki: It does matter, of course
(11:57:09 AM) Heikki: Since MySQL commits the DDL quickly, the window for errors is short
(11:57:47 AM) inaam1: but it is a bug neverthless
(11:57:48 AM) Heikki: It should enforce the ordering
(11:58:19 AM) inaam1: hmm...so that is something that needs to be fixed in mysql code
(11:58:48 AM) Heikki: Yes, I think MySQL should issue the PREPARE
(12:00:05 PM) inaam1: ok. I'll ask Matt to assign the bug to mysql team that deals with xa trxs
[20 Dec 2010 8:53] Mats Kindahl
I agree with Heikki that this is something that needs to be fixed at the server level not at the storage engine level.
[7 Jun 2011 19:32] Dmitry Lenev
Hello!

Further investigation has shown that we can view ALTER TABLE as
consisting of two phases:

During the first phase we open table being altered, create temporary
table representing new version of the table and copy data to this
table from the old version. This phase ends with enabling back
transactions (which were disabled for duration of copying) which
also does commit. Both stacktraces which you see in the original
report represent exactly this commit. The first call to
ha_trans_commit() is for statement transaction and the second is
for the global transaction. The latter in this case is probably
empty due to autocommit mode being on.

Note that since nothing is written to binary log during this phase
there is only one engine participating in this transaction - InnoDB
(hence rw_ha_count == 1). Because of this there is no need for
two-phase commit and server takes a shortcut by performing one phase
commit.

Since nothing is written to the binary log during this phase/in this
transaction there should be no problem with ordering of binary-log/
redo-log records for other transactions. After all the situation
is no different from use-case when row-based binary logging is used
and we are committing transaction which has changed only temporary
tables.

During the second phase we swap new and old versions of the table,
drop old version and, finally, write ALTER TABLE statement to the
binary log.
Swapping of old and new versions of table is achieved by two
consecutive calls to ha_innobase::rename_table() method.
Each of these calls creates its own transaction on InnoDB data
dictionary which is committed within the same call. Similar thing
happens in ha_innobase::delete_table() call which drops the old
version of table. Finally ALTER TABLE is written to the binary
log but directly and not through transactional cache.

Indeed, the fact that changes to data-dictionary and write to
binary log are non-atomic can cause problems.

For example, crash between update to data-dictionary and write
to binary log can create discrepancy between database state and
binary log. But MySQL's DDL is known not to be crash-safe. So
there is nothing really new here (also I think that this bug is
not about this issue).

Another scenario in which problem due to this non-atomicity can
arise (and in which order of binary and redo logs is important)
is when someone takes a backup using something like
innodbhotbackup concurrently to a DDL statement, and then later
uses this backup + binary log for Point-in-Time-Recovery or for
slave bootstrapping.

I have discussed possible issues with Backup Team and came to
the following conclusions:

1) MySQL Enterprise Backup's mysqlbackup tool should not be
   affected by this problem in default mode. In this mode
   it tries to create a validity point while holding FLUSH
   TABLES WITH READ LOCK lock, so no concurrent DDL is
   possible at the time when critical step of backup is done.
   The same should be true for 3rd-party tools/scripts which
   acquire FTWRL for creation of backup's validity point.
2) MySQL Enterprise Backup's ibbackup tool, mysqlbackup tool
   in --innodb-only mode and possible 3rd-party tools that
   don't acquire FTWRL might be affected by this issue
   (although probability of bad things happenning is fairly
   small).

   Please note that MEB's manual clearly lists inability to
   guarantee correct backups while concurrent DDL is running
   as a known limitation and situation to avoid. See:
   http://dev.mysql.com/doc/mysql-enterprise-backup/3.5/en/bugs.backup.html
   So the latter is known and documented issue.

Since proper fix for this problem is likely to require changing
of our DDL statements to be performed and logged within one
transaction and might involve transition from .FRM-files to
real in-table data-dictionary it will, in my opinion, take
significant amount of time to implement and can't be done in
stable versions of server.