Bug #26395 if crash during autocommit update to transactional table on master, slave fails
Submitted: 15 Feb 2007 8:37 Modified: 6 Feb 2008 21:58
Reporter: Guilhem Bichot Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.0-bk,5.1-bk OS:Linux (linux)
Assigned to: Sven Sandberg CPU Architecture:Any

[15 Feb 2007 8:37] Guilhem Bichot
Description:
cannot exist in <5.0.
When a statement modifies an innodb table in autocommit mode, like this:
insert into t values(1);
then what is written to the binary log is (I'm using statement-based, but row-based in 5.1 has the same problem) these 2 events:
insert into t values(1); (Query_log_event)
Xid_log_event;
then the data is committed inside innodb.
If the master crashes right before writing the Xid_log_event to disk, the binlog may contain only the INSERT.
At restart, InnoDB will rollback.
Slave will replicate the INSERT, and so the result is that on master the statement has been rolled back while on slave it is executed: on master the table does not contain "1", on slave it does.
If running in non-autocommit mode (type "BEGIN;" before the INSERT), a BEGIN event is written before the INSERT, and so slave will execute:
BEGIN;
INSERT;
then it will be told that the binlog ends, and so will know it has to rollback (Format_description_log_event::exec_event() when it executes the Format_description of the next binlog), and it *can* rollback, because it was in non-autocommmit mode, because there was a BEGIN in the binlog.

How to repeat:
set up replication, but start master with:
--debug=d,half_binlogged_transaction
(you need a debug build for that). This option will simulate a master's crash right before writing the Xid_log_event to disk.
Do on master:
create table t(a int) engine=innodb;
show create table t;
insert into t values(1); # master will crash intentionally here
Restart master. Make sure to "stop slave; start slave" on slave, and you should see that on master: t is empty, and on slave: t contains "1" (I tested).

Suggested fix:
change bool MYSQL_LOG::write(THD *thd, IO_CACHE *cache, Log_event *commit_event)
to always write a "BEGIN" event.
It would also make BUG#22540 much less serious (because the wrong end_log_pos would stop affecting Exec_master_log_pos).
[23 May 2007 17:35] Chuck Bell
I examined this bug using version 5.1.19. When running the test outlined with the planned crash, I encountered a different outcome. It seems the master does indeed complete the insert, but it is the slave that does not. 

When I examined the binlogs after the crash, the master's binlog had the insert and a commit statement in it, but the slave's binlog did not. Even when the slave was restarted, it did not run the insert. 

My suspicion is the planned crash is in the wrong place to duplicate the original bug. It is recommended that this be determined and corrected if necessary. 

I have run the test on both Windows and Linux and confirmed the behavior described above to be consistent across both platforms.
[25 May 2007 12:50] Guilhem Bichot
Ok, there are two problems:
1) in 5.1, some pieces of the logic of half_binlogged_transaction
were removed; until it is restored (I will talk with Mats), my testcase cannot work in 5.1.
2) I tested with 5.0 today and was able to repeat the problem only
after commenting out a line of code. An autocommit statement is stored in the binlog as the statement (here, INSERT) and an Xid_log_event. The logic of half_binlogged_transaction is that it simulates a crash in the middle of writing the statement (the 100 last bytes of the statement, and the Xid_log_event, are intentionally not written to the binlog); so the slave cannot even read the INSERT (it's corrupted in the binlog). To test the present bug, the INSERT has to be readable by slave and so the 100 last bytes must not be chopped (only the Xid_log_event must be).
===== log.cc 1.210 vs edited =====
*** /tmp/bk_log.cc-1.210_Tr5jin 2007-04-27 15:33:46 +02:00
--- edited/log.cc       2007-05-25 14:35:37 +02:00
***************
*** 1868,1874 ****
      if (reinit_io_cache(cache, READ_CACHE, 0, 0, 0))
        goto err;
      length=my_b_bytes_in_cache(cache);
!     DBUG_EXECUTE_IF("half_binlogged_transaction", length-=100;);
      do
      {
        /* Write data to the binary log file */
--- 1868,1874 ----
      if (reinit_io_cache(cache, READ_CACHE, 0, 0, 0))
        goto err;
      length=my_b_bytes_in_cache(cache);
! //    DBUG_EXECUTE_IF("half_binlogged_transaction", length-=100;);
      do
      {
        /* Write data to the binary log file */

After that patch, the logic of half_binlogged_transaction is that it
will write the transaction to binlog except the commit event (which is
Xid_log_event in my testcase); in detail, it does all that the normal
code does, except that it skips over writing the commit event to
binlog, and then it aborts the mysqld. This is to imitate what could
happen in real life, where the mysqld would crash between writing
INSERT to binlog and writing Xid_log_event to binlog.
With that, the master crashes in INSERT (so this INSERT will rollback), and in the binlog there is INSERT but no Xid_log_event. Slave however picks INSERT, and so executes it. So slave and master differ.
[29 Aug 2007 16:27] Chuck Bell
The following diff can be used to reproduce the problem in the mysql-5.1-new-rpl clone.

===== log.cc 1.294 vs edited =====
--- 1.294/sql/log.cc    2007-08-29 12:25:51 -04:00
+++ edited/log.cc       2007-08-29 12:22:26 -04:00
@@ -3987,8 +3987,12 @@
       if ((write_error= write_cache(cache, false, false)))
         goto err;

+      DBUG_EXECUTE_IF("half_binlogged_transaction", goto DBUG_skip_commit;);
       if (commit_event && commit_event->write(&log_file))
         goto err;
+#ifndef DBUG_OFF
+DBUG_skip_commit:
+#endif
       if (flush_and_sync())
         goto err;
       DBUG_EXECUTE_IF("half_binlogged_transaction", abort(););
[30 Aug 2007 19:36] Chuck Bell
There are 2 patches. One for 5.0 that fixes the problem and one for 5.1 that corrects a set of test results and adds the debug code to detect the problem. 

See:

5.0 - http://lists.mysql.com/commits/33470
5.1 - http://lists.mysql.com/commits/33243
[25 Oct 2007 10:26] Mats Kindahl
Setting to "in progress" since there is no new patch since my last comments.
[29 Nov 2007 13:56] Lars Thalmann
REFINED PROBLEM DESCRIPTION (RPD)
=================================

An INSERT statement executed on an XA storage engine in auto-commit
mode is executed like this:

1. Engine prepare insert operation
2. INSERT statement is logged in binlog
3. XID is logged in binlog
4. Insert operation is committed in storage engine

If there is master crash between 2 and 3, then the master will
rollback the insert.

But, the slave will receive INSERT in binary log and execute it.

CLARIFICATION ABOUT SEMANTICS
=============================

The semantics of the logged INSERT statement shall be that it is to be
committed if only if there is an XID event.  A logged INSERT statement
does *not* mean that it has been executed on the master when XA is
being used.

If XA is *not* being used and autocommit is on, then the logged INSERT
statement *does* mean that it has been executed on the master.

VISION
======

A) In the future we want each of these event categories to be clearly
   expressed in the binary log:

   1. Transaction start
   2. Statement start (this includes any event that is used 
      as "seed" for coming events)
   3. Inside statement
   4. Statement end
   5. Transaction commit/autocommitted statement
   6. Transaction rollback

   Probably need 2 start bits: 
   00. No start
   01. Statement start
   11. Transaction start
   
   Probably need 2 end bits: 
   00. No end
   01. Transaction- or autocommit
   10. Rollback
   11. Statement end

B) We probably need to stop to use transactional cache in the future.
   Reasons:

   1) We have both transactional and non-transaction tables, and if a
      dependent non-transactional update happens in the middle of a
      (rolled-back) transaction, it is impossible to get it correctly
      logged if we only log this particular statement.

      So, we need to log the whole transaction and add a rollback at
      end, *if* there is a single dependent non-transactional table
      update in it.

      (Note that in row-based logging, a logged "statement" is never
      dependent on an easlier one.)

   2) For long transactions, the transaction cache can be quite
      filled.

      We would probably like to have interleaving in the future,
      i.e. not have to wait until the whole transaction is executed
      before we start logging it.

POSSIBLE SOLUTIONS
==================
1. Always use autocommit off on slave.

   Negatives:
   - Rollbacks will get wider scope on slave than master.

2. Add BEGIN whenever there is a XA table involved in an autocommitted
   statement (Chuck's patch).

   Negatives:
   - An extra event in the binlog.

3. Make sure INSERT is stored in transaction cache and do not flush
   transaction cache until the transaction is committed (step 3 in
   RPD above).

   Negatives:
   - Not in line with future vision (B).

DECISION
========
Use solution 2 for now and add the bits for transaction/start end in
later version (WL#4162).

Sven to do
----------
1. Write test case to show that the bug happens
2. Write test case to show that non-xa engines are not affected
   by the patch (e.g. NDB)
3. Review Chuck's patch and get other reviews
[13 Dec 2007 18:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/39927

ChangeSet@1.2651, 2007-12-13 19:56:17+01:00, sven@riska.(none) +21 -0
  BUG#26395: if crash during autocommit update to transactional table on master, slave fails
  Now, every transaction (including autocommit transactions) start with
  a BEGIN and end with a COMMIT/ROLLBACK in the binlog.
  Added a test case, and updated lots of test case result files.
[14 Dec 2007 13:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/39990

ChangeSet@1.2654, 2007-12-14 14:40:45+01:00, sven@riska.(none) +21 -0
  BUG#26395: if crash during autocommit update to transactional table on master, slave fails
  Now, every transaction (including autocommit transactions) starts with
  a BEGIN and ends with a COMMIT/ROLLBACK in the binlog.
  Added a test case, and updated lots of test case result files.
[18 Dec 2007 16:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/40141

ChangeSet@1.2558, 2007-12-18 17:08:28+01:00, sven@riska.(none) +11 -0
  BUG#26395: if crash during autocommit update to transactional table on master, slave fails
  Now, every transaction (including autocommit transactions) start with
  a BEGIN and end with a COMMIT/ROLLBACK in the binlog.
  Added a test case, and updated lots of test case result files.
[21 Dec 2007 19:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/40354

ChangeSet@1.2560, 2007-12-21 20:30:23+01:00, sven@riska.(none) +10 -0
  BUG#26395: if crash during autocommit update to transactional table on master, slave fails
  Now, every transaction (including autocommit transactions) start with
  a BEGIN and end with a COMMIT/ROLLBACK in the binlog.
  Added a test case, and updated lots of test case result files.
[5 Feb 2008 12:46] Bugs System
Pushed into 5.0.56
[5 Feb 2008 13:04] Bugs System
Pushed into 5.1.24-rc
[5 Feb 2008 13:08] Bugs System
Pushed into 6.0.5-alpha
[6 Feb 2008 21:58] Jon Stephens
Bugfix documented in the 5.0.56, 5.1.24, and 6.0.5 changelogs as follows:

        When the master crashed during an update on a transactional
        table while in AUTOCOMMIT mode, the slave failed. This fix
        causes every transaction (including AUTOCOMMIT transactions) to
        be recorded in the binlog as starting with a BEGIN and ending
        with a COMMIT or ROLLBACK.
[6 Mar 2008 5:41] Jon Stephens
Also documented for 5.1.23-ndb-6.2.14.
[30 Mar 2008 0:27] Jon Stephens
Also documented for 5.1.23-ndb-6.3.11.