Bug #55375 Transaction bigger than max_binlog_cache_size crashes slave
Submitted: 19 Jul 2010 18:35
Modified: 15 Nov 2010 13:20
Reporter: Sven Sandberg
Status: Closed
Category: MySQL Server: Replication
Severity: S1 (Critical)
Version: 5.1
OS: Any
Assigned to: Libing Song
CPU Architecture: Any

[19 Jul 2010 18:35] Sven Sandberg
Description:
When the slave executes a transaction that is bigger than the limit set by MAX_BINLOG_CACHE_SIZE, it dumps core.

(The actual limit is much bigger than MAX_BINLOG_CACHE_SIZE. This is probably another bug.)

How to repeat:
--source include/have_binlog_format_statement.inc
--source include/have_innodb.inc
--source include/master-slave.inc
CREATE TABLE t1 (a VARCHAR(100)) ENGINE = INNODB;
--sync_slave_with_master
SET GLOBAL MAX_BINLOG_CACHE_SIZE = 4096;
STOP SLAVE;
START SLAVE;
--connection master
--let $n= 380
BEGIN;
while ($n) {
  INSERT INTO t1 VALUES ('32byte');
  --dec $n
}
COMMIT;
--sync_slave_with_master
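
As a rough illustration of the limit this test exercises, the following C++ sketch models a per-transaction cache with a hard byte cap, using the 4096-byte limit and the 380 statements from the script above. The class BoundedTxnCache, the helper events_until_full, and the 100-byte event size are assumptions made for the example. They are not taken from the MySQL source, and the real size of a binlogged statement is not given in this report.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model; this is NOT MySQL's binlog cache implementation.
class BoundedTxnCache {
public:
  explicit BoundedTxnCache(std::size_t max_bytes) : max_bytes_(max_bytes) {}

  // Buffers one event of `event_bytes` bytes for the current transaction.
  // Returns false (the analogue of a "cache full" error) if accepting the
  // event would push the transaction past the configured limit.
  bool append(std::size_t event_bytes) {
    if (event_bytes > max_bytes_ - used_) return false;
    used_ += event_bytes;
    assert(used_ <= max_bytes_);  // invariant: never exceed the cap
    return true;
  }

  // Discards everything buffered for the current transaction.
  void reset() { used_ = 0; }

  std::size_t used() const { return used_; }

private:
  std::size_t max_bytes_;
  std::size_t used_ = 0;
};

// Feeds up to `n_events` events of `event_bytes` bytes each into the cache
// and returns how many were accepted before the cache reported "full".
std::size_t events_until_full(BoundedTxnCache &cache, std::size_t event_bytes,
                              std::size_t n_events) {
  std::size_t accepted = 0;
  while (accepted < n_events && cache.append(event_bytes)) ++accepted;
  return accepted;
}
```

With a 4096-byte cap and the assumed 100-byte events, events_until_full(cache, 100, 380) accepts 40 events and rejects the 41st, so a transaction of 380 such statements cannot fit in the model's cache.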
[19 Jul 2010 19:28] Sven Sandberg
See also BUG#55377.
[20 Jul 2010 9:40] Alfranio Junior
The error happens because of the following assertion in log.cc:

if (mysql_bin_log.check_write_error(thd))
{
  /*
    "all == true" means that a "rollback statement" triggered the error and
    this function was called. However, this must not happen as a rollback
    is written directly to the binary log. And in auto-commit mode, a single
    statement that is rolled back has the flag all == false.
  */
  ---> DBUG_ASSERT(!all); <---

This means that a multi-statement transaction is being rolled back and there is a failure, i.e. mysql_bin_log.check_write_error(thd) returns true. However, this should not be possible, because a "rollback" or "commit" is written directly to the binary log and is not subject to the limits imposed by max_binlog_cache_size and binlog_cache_size.

mysql_bin_log.check_write_error(thd) returned true because, before the transaction is rolled back by calling trans_rollback(thd), the diagnostics area is not cleared and still holds information about the previous statement that failed.

So a possible fix would be:

=== modified file 'sql/rpl_rli.cc'
--- sql/rpl_rli.cc      2010-07-15 13:47:50 +0000
+++ sql/rpl_rli.cc      2010-07-20 09:13:29 +0000
@@ -1219,8 +1219,10 @@
     to rollback before continuing with the next events.
     4) so we need this "context cleanup" function.
   */
   if (error)
   {
+    thd->clear_error();
     trans_rollback_stmt(thd); // if a "statement transaction"
     trans_rollback(thd);      // if a "real transaction"
   }
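
The effect of clearing the error first can be sketched with a small, self-contained C++ model. Everything in it (ToyThd, toy_rollback, toy_cleanup_context) is a hypothetical stand-in, not the server's code. It only mirrors the logic described above: a stale error combined with all == true trips the check, and clearing the error beforehand avoids it.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not MySQL classes.
struct ToyThd {
  bool error_set = false;  // models a stale error left in the diagnostics area
  void clear_error() { error_set = false; }
};

enum class RollbackResult { Ok, AssertWouldFire };

// Models the check quoted from log.cc: if a write error is still recorded
// and the whole transaction is being rolled back (all == true), the
// DBUG_ASSERT(!all) would fire. Returns a value instead of aborting.
RollbackResult toy_rollback(const ToyThd &thd, bool all) {
  if (thd.error_set && all) return RollbackResult::AssertWouldFire;
  return RollbackResult::Ok;
}

// Models the error branch of the cleanup path: a statement rollback followed
// by a full transaction rollback, optionally preceded by clear_error().
RollbackResult toy_cleanup_context(ToyThd &thd, bool clear_first) {
  if (clear_first) thd.clear_error();
  RollbackResult stmt = toy_rollback(thd, /*all=*/false);  // statement rollback
  assert(stmt == RollbackResult::Ok);  // all == false never trips the check
  (void)stmt;
  return toy_rollback(thd, /*all=*/true);  // "real transaction" rollback
}
```

In this model, calling toy_cleanup_context(thd, false) on a ToyThd whose error_set is true yields AssertWouldFire, while toy_cleanup_context(thd, true) yields Ok and leaves error_set false.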

Furthermore, I don't think this is related to BUG#55377.
[6 Aug 2010 10:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115178

3473 Li-Bing.Song@sun.com	2010-08-06
      Bug #55375  	Transaction bigger than max_binlog_cache_size crashes slave
      
      When the slave executes a transaction bigger than the slave's
      max_binlog_cache_size, the slave crashes. The crash is caused by an
      assertion that the server rolls back only the statement, not the whole
      transaction, when the error ER_TRANS_CACHE_FULL occurs. However, the slave
      SQL thread always rolls back the whole transaction when an error occurs.

      After this patch, any error set in the SQL thread (which is different
      from the error shown in 'SHOW SLAVE STATUS') is always cleared before the
      transaction is rolled back.
     @ sql/log_event.cc
        Some functions do not return the error code, so the wrong error code
        was reported. The error should always be set in thd->main_da, so
        slave_rows_error_report is used to report the right error.
[6 Sep 2010 10:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/117592

3485 Li-Bing.Song@sun.com	2010-09-06
      Bug #55375  	Transaction bigger than max_binlog_cache_size crashes slave
            
      When the slave executes a transaction bigger than the slave's
      max_binlog_cache_size, the slave crashes. The crash is caused by an
      assertion that the server rolls back only the statement, not the whole
      transaction, when the error ER_TRANS_CACHE_FULL occurs. However, the slave
      SQL thread always rolls back the whole transaction when an error occurs.

      After this patch, any error set in the SQL thread (which is different
      from the error shown in 'SHOW SLAVE STATUS') is always cleared before the
      transaction is rolled back.
     @ mysql-test/suite/rpl/r/rpl_binlog_max_cache_size.result
        SET binlog_cache_size and max_binlog_cache_size for all test cases.
        Add test case for bug#55375.
     @ mysql-test/suite/rpl/t/rpl_binlog_max_cache_size-master.opt
        binlog_cache_size and max_binlog_cache_size can be set in the client
        connection, so remove this option file.
     @ mysql-test/suite/rpl/t/rpl_binlog_max_cache_size.test
        SET binlog_cache_size and max_binlog_cache_size for all test cases.
        Add test case for bug#55375.
     @ sql/log_event.cc
        Some functions do not return the error code, so the wrong error code
        was reported. The error should always be set in thd->main_da, so
        slave_rows_error_report is used to report the right error.
     @ sql/slave.cc
        exec_relay_log_event() needs to call cleanup_context() to clear the
        context. cleanup_context() will call end_trans().

        Clear thd's error before cleanup_context(). This avoids triggering the
        assertion that causes this bug.
[7 Sep 2010 3:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/117650

3485 Li-Bing.Song@sun.com	2010-09-07
      Bug #55375  	Transaction bigger than max_binlog_cache_size crashes slave
            
      When the slave executes a transaction bigger than the slave's
      max_binlog_cache_size, the slave crashes. The crash is caused by an
      assertion that the server rolls back only the statement, not the whole
      transaction, when the error ER_TRANS_CACHE_FULL occurs. However, the slave
      SQL thread always rolls back the whole transaction when an error occurs.

      After this patch, any error set in the SQL thread (which is different
      from the error shown in 'SHOW SLAVE STATUS') is always cleared before the
      transaction is rolled back.
     @ mysql-test/suite/rpl/r/rpl_binlog_max_cache_size.result
        SET binlog_cache_size and max_binlog_cache_size for all test cases.
        Add test case for bug#55375.
     @ mysql-test/suite/rpl/t/rpl_binlog_max_cache_size-master.opt
        binlog_cache_size and max_binlog_cache_size can be set in the client
        connection, so remove this option file.
     @ mysql-test/suite/rpl/t/rpl_binlog_max_cache_size.test
        SET binlog_cache_size and max_binlog_cache_size for all test cases.
        Add test case for bug#55375.
     @ sql/log_event.cc
        Some functions do not return the error code, so the wrong error code
        was reported. The error should always be set in thd->main_da, so
        slave_rows_error_report is used to report the right error.
     @ sql/slave.cc
        exec_relay_log_event() needs to call cleanup_context() to clear the
        context. cleanup_context() will call end_trans().

        Clear thd's error before cleanup_context(). This avoids triggering the
        assertion that causes this bug.
[9 Oct 2010 6:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120419

3524 Li-Bing.Song@sun.com	2010-10-09
      Bug#55375  Transaction bigger than max_binlog_cache_size crashes slave
      
      When the slave executes a transaction bigger than the slave's
      max_binlog_cache_size, the slave crashes. The crash is caused by an
      assertion that the server rolls back only the statement, not the whole
      transaction, when the error ER_TRANS_CACHE_FULL occurs. However, the slave
      SQL thread always rolls back the whole transaction when an error occurs.

      After this patch, any error set in the SQL thread (which is different
      from the error shown in 'SHOW SLAVE STATUS') is always cleared before the
      transaction is rolled back.
     @ mysql-test/suite/rpl/r/rpl_binlog_max_cache_size.result
        SET binlog_cache_size and max_binlog_cache_size for all test cases.
        Add test case for bug#55375.
     @ mysql-test/suite/rpl/t/rpl_binlog_max_cache_size-master.opt
        binlog_cache_size and max_binlog_cache_size can be set in the client
        connection, so remove this option file.
     @ mysql-test/suite/rpl/t/rpl_binlog_max_cache_size.test
        SET binlog_cache_size and max_binlog_cache_size for all test cases.
        Add test case for bug#55375.
     @ sql/log_event.cc
        Some functions do not return the error code, so the wrong error code
        was reported. The error should always be set in thd->main_da, so
        slave_rows_error_report is used to report the right error.
     @ sql/slave.cc
        exec_relay_log_event() needs to call cleanup_context() to clear the
        context. cleanup_context() will call end_trans().

        Clear thd's error before cleanup_context(). This avoids triggering the
        assertion that causes this bug.
[9 Oct 2010 7:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120420

3524 Li-Bing.Song@sun.com	2010-10-09
      Bug#55375  Transaction bigger than max_binlog_cache_size crashes slave
      
      When the slave executes a transaction bigger than the slave's
      max_binlog_cache_size, the slave crashes. The crash is caused by an
      assertion that the server rolls back only the statement, not the whole
      transaction, when the error ER_TRANS_CACHE_FULL occurs. However, the slave
      SQL thread always rolls back the whole transaction when an error occurs.

      After this patch, any error set in the SQL thread (which is different
      from the error shown in 'SHOW SLAVE STATUS') is always cleared before the
      transaction is rolled back.
     @ mysql-test/suite/rpl/r/rpl_binlog_max_cache_size.result
        SET binlog_cache_size and max_binlog_cache_size for all test cases.
        Add test case for bug#55375.
     @ mysql-test/suite/rpl/t/rpl_binlog_max_cache_size-master.opt
        binlog_cache_size and max_binlog_cache_size can be set in the client
        connection, so remove this option file.
     @ mysql-test/suite/rpl/t/rpl_binlog_max_cache_size.test
        SET binlog_cache_size and max_binlog_cache_size for all test cases.
        Add test case for bug#55375.
     @ sql/log_event.cc
        Some functions do not return the error code, so the wrong error code
        was reported. The error should always be set in thd->main_da, so
        slave_rows_error_report is used to report the right error.
     @ sql/slave.cc
        exec_relay_log_event() needs to call cleanup_context() to clear the
        context. cleanup_context() will call end_trans().

        Clear thd's error before cleanup_context(). This avoids triggering the
        assertion that causes this bug.
[11 Oct 2010 2:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120447

3229 Li-Bing.Song@sun.com	2010-10-11
      Postfix for BUG#55375.
      Removed option file and changed result file.
[11 Oct 2010 2:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120449

3230 Li-Bing.Song@sun.com	2010-10-11
      Postfix for BUG#55375
      Removed option file and changed result file.
[11 Oct 2010 3:38] Libing Song
Pushed into mysql-5.1-bugteam and merged into mysql-5.5-bugteam and mysql-trunk-merge.
[13 Oct 2010 7:35] Jon Stephens
Documented bugfix in the 5.5.7 changelog as follows:

      When the slave tried to execute a transaction larger than the slave's
      value for max_binlog_cache_size, it crashed. This was caused by 
      an assertion that the server should roll back only the statement 
      but not the entire transaction when the error ER_TRANS_CACHE_FULL 
      occurred. However, the slave SQL thread always rolled back the 
      entire transaction whenever any error occurred.

Set NM status, waiting for merges to 5.1/5.6.
[29 Oct 2010 15:09] Jon Stephens
Also documented in the 5.1.53 changelog. Waiting for merge to -trunk.
[9 Nov 2010 19:47] Bugs System
Pushed into mysql-5.5 5.5.7-rc (revid:sunanda.menon@sun.com-20101109182959-otkxq8vo2dcd13la) (version source revid:sunanda.menon@sun.com-20101109182959-otkxq8vo2dcd13la) (merge vers: 5.5.7-rc) (pib:21)
[13 Nov 2010 16:07] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113152450-2zzcm50e7i4j35v7) (merge vers: 5.6.1-m4) (pib:21)
[13 Nov 2010 16:37] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (pib:21)
[15 Nov 2010 13:20] Jon Stephens
Does not appear in a 5.6 release; no additional changelog entries required.

Closed.
[18 Nov 2010 15:53] Bugs System
Pushed into mysql-5.1 5.1.54 (revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (version source revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (merge vers: 5.1.54) (pib:21)