Bug #40278 Replication failure on RBR + MyISAM + SAVEPOINTs
Submitted: 23 Oct 2008 9:33 Modified: 8 Mar 2010 18:57
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S2 (Serious)
Version:5.1-rpl, 5.1-bzr OS:Any
Assigned to: Alfranio Junior CPU Architecture:Any

[23 Oct 2008 9:33] Philip Stoev
Description:
When executing a transactional workload containing MyISAM tables and SAVEPOINTs, replication will fail as follows:

Could not execute Update_rows event on table test.table10_myisam_int_autoinc; Can't find record in 'table10_myisam_int_autoinc', Error_code: 1032;
handler error HA_ERR_KEY_NOT_FOUND;
the event's master log master-bin.000001, end_log_pos 75732

Note that even if savepoints do not make sense in a pure-MyISAM setup, transactions containing both MyISAM and Innodb are a valid real-life workload  and will exhibit the same problem. The error is always in the MyISAM table.

How to repeat:
A test case will be uploaded shortly.
[23 Oct 2008 9:34] Philip Stoev
YY file

Attachment: bug40278.yy (application/octet-stream, text), 442 bytes.

[23 Oct 2008 9:34] Philip Stoev
ZZ file

Attachment: bug40278.zz (text/plain), 169 bytes.

[23 Oct 2008 9:40] Philip Stoev
To reproduce with the random query generator, please clone the mysql-test-extra-6.0 tree and then execute:

$ perl runall.pl \
  --basedir=/path/to/5.1-rpl \
  --grammar=bug40278.yy \
  --gendata=conf/bug40278.zz
  --rpl_mode=row \
  --mysqld=--innodb-lock-wait-timeout=1

This example uses both MyISAM and Innodb for greater plausibility. The Innodb tables however are not needed and you can get them out of the way by editing the ZZ file.

Concurrency is required for this bug to happen, running the workload in a single thread does not cause issues.
[23 Oct 2008 9:51] Valeriy Kravchuk
Can not repeat with main mysql-5.1 tree from bzr.
[23 Oct 2008 9:54] Valeriy Kravchuk
Sorry, was able to repeat with 3 concurrent threads even with 5.1 main tree:

openxs@suse:/home2/openxs/bzr/mysql-test-extra-6.0/mysql-test/gentest> perl runall.pl --basedir=/home2/openxs/dbs/5.1 --rpl_mode=row --gendata=/home2/openxs/bug40278.zz --grammar=/home2/openxs/bug40278.yy --mysqld=--innodb_lock_wait_timeout=1 --threads=3
# 16:58:32 Please see http://inside.mysql.com/wiki/Category:RandomQueryGenerator for more information on this test framework.
# 16:58:32 Starting
# 16:58:32 # runall.pl \
# 16:58:32 # --basedir=/home2/openxs/dbs/5.1 \
# 16:58:32 # --rpl_mode=row \
# 16:58:32 # --gendata=/home2/openxs/bug40278.zz \
# 16:58:32 # --grammar=/home2/openxs/bug40278.yy \
# 16:58:32 # --mysqld=--innodb_lock_wait_timeout=1 \
# 16:58:32 # --threads=3
# 16:58:32 Running perl mysql-test-run.pl --start-and-exit --mysqld="--innodb_lock_wait_timeout=1" --mysqld=--relay-log=master-relay-bin --mysqld=--loose-innodb --mysqld=--loose-falcon-debug-mask=2 --mysqld=--secure-file-priv= --mysqld=--max-allowed-packet=16Mb --master_port=19306 rpl_alter --slave_port=19308 .
Logging: mysql-test-run.pl --start-and-exit --mysqld=--innodb_lock_wait_timeout=1 --mysqld=--relay-log=master-relay-bin --mysqld=--loose-innodb --mysqld=--loose-falcon-debug-mask=2 --mysqld=--secure-file-priv= --mysqld=--max-allowed-packet=16Mb --master_port=19306 rpl_alter --slave_port=19308
MySQL Version 5.1.30
Using dynamic switching of binlog format
Using ndbcluster when necessary, mysqld supports it
Setting mysqld to support SSL connections
Binaries are debug compiled
Using MTR_BUILD_THREAD      = 0
Using MASTER_MYPORT         = 19306
Using MASTER_MYPORT1        = 19307
Using SLAVE_MYPORT          = 19308
Using SLAVE_MYPORT1         = 19309
Using SLAVE_MYPORT2         = 19310
Using NDBCLUSTER_PORT       = 9311
Using NDBCLUSTER_PORT_SLAVE = 9312
Using IM_PORT               = 9313
Using IM_MYSQLD1_PORT       = 9314
Using IM_MYSQLD2_PORT       = 9315
 - adding combinations
Killing Possible Leftover Processes
Removing Stale Files
Creating Directories
Installing Master Database
Installing Slave1 Database
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

Servers started, exiting
# 16:58:37 Establishing replication, mode row ...
# 16:58:37 Starting
# 16:58:37  gentest.pl \
# 16:58:37  --gendata=/home2/openxs/bug40278.zz \
# 16:58:37  --rpl_mode=row \
# 16:58:37  --threads=3 \
# 16:58:37  --queries=1000 \
# 16:58:37  --duration=3600 \
# 16:58:37  --dsn1=dbi:mysql:host=127.0.0.1:port=19306:user=root:database=test \
# 16:58:37  --grammar=/home2/openxs/bug40278.yy
# 16:58:37 Creating table table10_innodb_int_autoinc .
# 16:58:37 Creating table table10_myisam_int_autoinc .
# 16:58:37 Reporters: ErrorLog, Backtrace
# 16:58:37 Validators: FalconErrors, ErrorMessageCorruption, ReplicationSlaveStatus
# 16:58:37 Starting 3 processes, 1000 queries each, duration 3600 seconds.
# 16:58:41 Child process completed successfully.
# 16:58:41 Started periodic reporting process...
# 16:58:44 Slave thread has stopped with error: Could not execute Update_rows event on table test.table10_myisam_int_autoinc; Can't find record in 'table10_myisam_int_autoinc', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 288221
# 16:58:44 Slave thread has stopped with error: Could not execute Update_rows event on table test.table10_myisam_int_autoinc; Can't find record in 'table10_myisam_int_autoinc', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 288221
# 16:58:49 Killing periodic reporting process with pid 31881...
# 16:58:54 Killing child process with pid 31887...
# 16:58:54 Test completed with failure status 103.
[7 Apr 2009 16:26] Alfranio Junior
The savepoint plays no role in the bug as it is disabled in the configuration file. The bug seems to be a duplicate of BUG#40251.

query:
	insert | update | delete |
	xid_event ;

xid_event:
	START TRANSACTION | COMMIT | ROLLBACK ;
#	SAVEPOINT A | ROLLBACK TO SAVEPOINT A | RELEASE SAVEPOINT A ;
[16 Apr 2009 16:59] Alfranio Junior
To reproduce the problem do what follows:

1 - Start a MySQL with the binary log enabled.
2 - Set the row format.
3 - Start two concurrent connections (i.e. mysql), 
referenced in what follows as con_01 and con_02.
4 - Create the following table through one of the
connections:
 CREATE TABLE test (a int);
 INSERT INTO test VALUES(1);
5 - Execute the following sequence of commands:

con1: START TRANSACTION;
con2: START TRANSACTION;
con1: UPDATE test SET a=10;
con2: DELETE FROM test;
con2: COMMIT;
con1: COMMIT;

6 - Check the binary log.
[16 Apr 2009 17:02] Alfranio Junior
Binary Log (mysqlbinlog -v var/mysqld.1/data/master-bin.000001)

Attachment: sequence.txt (text/plain), 3.25 KiB.

[16 Apr 2009 17:13] Alfranio Junior
In the context of a transaction, the replication code writes to the binary log only when a commit or rollback is requested by the application. Although this
rule is fine for transactional engines, a non-transactional engine should write their changes to the binary log right after the execution of every statement as any lock acquired is release at this time thus making the changes available for concurrent transactions.
[16 Apr 2009 17:19] Alfranio Junior
Suggested fix, change the following code in log.cc:

    /*
      Should we write to the binlog cache or to the binlog on disk?
      Write to the binlog cache if:
      - it is already not empty (meaning we're in a transaction; note that the
     present event could be about a non-transactional table, but still we need
     to write to the binlog cache in that case to handle updates to mixed
     trans/non-trans table types the best possible in binlogging)
      - or if the event asks for it (cache_stmt == TRUE).
    */
    if (opt_using_transactions && thd)
    {
      if (thd->binlog_setup_trx_data())
        goto err;

      binlog_trx_data *const trx_data=
        (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
      IO_CACHE *trans_log= &trx_data->trans_log;
      my_off_t trans_log_pos= my_b_tell(trans_log);
      if (event_info->get_cache_stmt() || trans_log_pos != 0)
      {
        DBUG_PRINT("info", ("Using trans_log: cache: %d, trans_log_pos: %lu",
                            event_info->get_cache_stmt(),
                            (ulong) trans_log_pos));
        if (trans_log_pos == 0)
          thd->binlog_start_trans_and_stmt();
        file= trans_log;
      }
[13 May 2009 1:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/73889

2874 Alfranio Correia	2009-05-13
      BUG#40278 Replication failure on RBR + MyISAM + SAVEPOINTs
            
      This is a draft of a patch and should be only used to foster discussions
      on the final approach.
[15 Jun 2009 14:49] Alfranio Junior
Please, check the WL#2687.
[4 Nov 2009 1:35] Alfranio Junior
Pushed to mysql-5.1-rep+3

revision-id: alfranio.correia@sun.com-20091103190256-637o8qxlveikrt3i
[12 Nov 2009 23:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/90290

3120 Alfranio Correia	2009-11-12
      Post-fix for WL#2687 WL#5072 BUG#40278 BUG#47175
      
      Create a set of test cases to see if some DDL statements implicitly commit
      a transaction on the NDB and are written directly to the binary log without
      going through either the Statement- or Transactional-Cache.
[13 Nov 2009 0:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/90291

3120 Alfranio Correia	2009-11-13
      Post-fix for WL#2687 WL#5072 BUG#40278 BUG#47175
      
      Create a set of test cases to see if some DDL statements implicitly commit
      a transaction on the NDB and are written directly to the binary log without
      going through either the Statement- or Transactional-Cache.
[18 Jan 2010 12:05] Bugs System
Pushed into 6.0.14-alpha (revid:alik@ibmvm-20100118120357-hnzhgadkpzqfnvsc) (version source revid:alik@ibmvm-20100118115413-kd3klpine09yyktw) (merge vers: 6.0.14-alpha) (pib:16)
[18 Jan 2010 12:06] Bugs System
Pushed into mysql-next-mr (revid:alik@ibmvm-20100118120111-73dulkgc893it4r9) (version source revid:alik@ibmvm-20100118115335-0stecyzftqm7bqx6) (pib:16)
[23 Jan 2010 0:50] Paul DuBois
Noted in 6.0.14 changelog.

Using row-based replication, executing a transactional workload
containing MyISAM tables and SAVEPOINT statements caused replication
to fail. 

Setting report to NDI pending push to Celosia.
[15 Feb 2010 10:17] Jon Stephens
Discussed with Alfranio; no further merges expected. Closed.
[6 Mar 2010 11:05] Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@linux-rbsx-20100118220048-5vnyqi5ghsbgmdsd) (merge vers: 5.5.99-m3) (pib:16)
[8 Mar 2010 18:57] Jon Stephens
Already documented in the 5.5.3 changelog. Closed.