Bug #16559 Replication Problems with Non transactional tables inside an interrupted trans.
Submitted: 17 Jan 2006 11:34 Modified: 28 Feb 2006 2:39
Reporter: Sebastian Nohn Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.0.18 OS:Linux (Linux)
Assigned to: Guilhem Bichot CPU Architecture:Any

[17 Jan 2006 11:34] Sebastian Nohn
Description:
When running INSERTs on an MyISAM table inside a transacation that also updates an InnoDB table while that transaction gets interrupted (i.e. connection drop or something), the MySQL Slave goes out of sync.

Setup is two MySQL 5.0.18-standard with Master<->Master replication.

How to repeat:
This breaks replication:
============================================
CREATE TABLE LOGGING (
  GUID INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  ID_USER INTEGER UNSIGNED NULL,
  SERVER_ADDR VARCHAR(20) NULL,
  ID_SESSION VARCHAR(50) NULL,
  TIME INTEGER UNSIGNED NULL,
  CHANNEL VARCHAR(20) NULL,
  SECTION VARCHAR(50) DEFAULT NULL,
  LEVEL TINYINT UNSIGNED NULL,
  DESCRIPTION VARCHAR(255) NULL,
  PRIMARY KEY(GUID)
)
TYPE=MyISAM;

CREATE TABLE USER_SESSION (
  ID_SESSION varchar(64) NOT NULL default '',
  SESSION_DATA mediumtext NOT NULL,
  LAST_UPDATE timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY (ID_SESSION)
)
TYPE=InnoDB;

mysql dms_2_0 < $file

$file (to simulate interruption)
---------------------------------------------------------------------------------
SET AUTOCOMMIT=0;
BEGIN;
insert into logging (id_user, time, server_addr, id_session, channel, section, level, description) values (null, 1137485760, "192.168.200.125", "91b4880d02d6c59df2fb7172caa78d2a", "", "",4, "Enter doPreFlightCookieCheck");
update user_session set session_data = 'VudABlbmFibGVkIjtiOjE7czoyMToiAENvbXBvbmVudABhdHRyaWJ1dGVzIjthOjA6e319fQ==', last_update = now() where id_session = '91b4880d02d6c59df2fb7172caa78d2a';
---------------------------------------------------------------------------------
============================================

If you change the table handler of "logging" to InnoDB it does not break:

============================================
CREATE TABLE LOGGING (
  GUID INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  ID_USER INTEGER UNSIGNED NULL,
  SERVER_ADDR VARCHAR(20) NULL,
  ID_SESSION VARCHAR(50) NULL,
  TIME INTEGER UNSIGNED NULL,
  CHANNEL VARCHAR(20) NULL,
  SECTION VARCHAR(50) DEFAULT NULL,
  LEVEL TINYINT UNSIGNED NULL,
  DESCRIPTION VARCHAR(255) NULL,
  PRIMARY KEY(GUID)
)
TYPE=InnoDB;

CREATE TABLE USER_SESSION (
  ID_SESSION varchar(64) NOT NULL default '',
  SESSION_DATA mediumtext NOT NULL,
  LAST_UPDATE timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY (ID_SESSION)
)
TYPE=InnoDB; 

mysql dms_2_0 < $file

$file (to simulate interruption)
---------------------------------------------------------------------------------
SET AUTOCOMMIT=0;
BEGIN;
insert into logging (id_user, time, server_addr, id_session, channel, section, level, description) values (null, 1137485760, "192.168.200.125", "91b4880d02d6c59df2fb7172caa78d2a", "", "",4, "Enter doPreFlightCookieCheck");
update user_session set session_data = 'VudABlbmFibGVkIjtiOjE7czoyMToiAENvbXBvbmVudABhdHRyaWJ1dGVzIjthOjA6e319fQ==', last_update = now() where id_session = '91b4880d02d6c59df2fb7172caa78d2a';
---------------------------------------------------------------------------------

If you do a

ALTER TABLE logging ENGINE=myisam; 

afterwards and run

mysql dms_2_0 < $file

again, replication breaks again.
============================================

echo "show slave status\G; " | mysql
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 192.168.200.102
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: bladeA2-bin.000002
        Read_Master_Log_Pos: 907967
             Relay_Log_File: bladeB2-relay-bin.000002
              Relay_Log_Pos: 499005
      Relay_Master_Log_File: bladeA2-bin.000002
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 1053
                 Last_Error: Query partially completed on the master (error on master: 1053) and was aborted. There is a chance that your master is inconsistent at this point. If you are sure that your master is ok, run this query manually on the slave and then restart the slave with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; . Query: 'ROLLBACK'
               Skip_Counter: 0
        Exec_Master_Log_Pos: 498866
            Relay_Log_Space: 908106
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: NULL

# mysqlbinlog -j 498866 bladeA2-bin.000002 | less
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
# at 498866
#060113 16:35:54 server id 1001 end_log_pos 498950 Query thread_id=18 exec_time=0 error_code=0
use dms_2_0;
SET TIMESTAMP=1137166554;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.unique_checks=1;
SET @@session.sql_mode=0;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1;
SET @@session.character_set_client=31,@@session.collation_connection=31,@@session.collation_server=31;
SET @@session.time_zone='SYSTEM';
BEGIN;
# at 498950
#060113 16:35:54 server id 1001 end_log_pos 7803 Query thread_id=18 exec_time=0 error_code=0
SET TIMESTAMP=1137166554;
update user_session set session_data = '.....', last_update = now() where id_session = '91b4880d02d6c59df2fb7172caa78d2a';
# at 506753
#060113 16:35:54 server id 1001 end_log_pos 506840 Query thread_id=18 exec_time=0 error_code=1053
SET TIMESTAMP=1137166554;
ROLLBACK;
[17 Jan 2006 11:35] Sebastian Nohn
Also happens on 5.0.17-max
[17 Jan 2006 13:29] Hartmut Holzgraefe
This is expected behavior. As the MyISAM table is not transactional changes applied to it
can't be rolled back when the transaction fails or is manually rolled back using the ROLLBACK
command. Replication only replicates successfully completed transactions on the other hand,
so changes done to non-transactional tables within rolled back transactions will never be seen
by the slave.
[17 Jan 2006 13:35] Sebastian Nohn
Is it also expected that the slaves goes completely out of sync?
[17 Jan 2006 21:17] Guilhem Bichot
Hello Sebastian,
Sorry, after a second look your claims are valid.
First, in MySQL, changes to a non-transactional table, if done in a transaction, are replicated properly even though the transaction rolled back.
Second, normally in your case, the binlog should look like (output of mysqlbinlog):

INSERT into myisam;
BEGIN; # this is the place where you started mysqlbinlog
UPDATE of innodb;
ROLLBACK;
and that would be fine (could you please check if you have this output?)
[ in fact the BEGIN+UPDATE+ROLLBACK block is not necessary, but due to the presence of the MyISAM insertion *inside* the transaction, well in some more complicated cases it is necessary. ]
What's really wrong is that ROLLBACK has 1053 as its error code (this is what causes your slave to stop. The error code should be 0; it's a bug which is not present in MySQL 4.1 and was introduced in MySQL 5.0. We'll fix that one, and then your replication should be fine.
Until then, there is the possibly dangerous --slave-skip-errors=1053 which can be used on your slave but that is scary. If you build from source I can send you a patch.
Testcase for the replication team:
drop table if exists ti,tm;
create table ti (a int) engine=innodb;
create table tm (a int) engine=myisam;
insert into ti values(1);
reset master;
begin;
insert into tm values(2);
update ti set a=3 where a=10; # stay connected after that.
Run the above, then shutdown mysqld, then restart it and do mysqlbinlog: the "ROLLBACK" event will have error code 1053 (bad) in 5.0 and 0 (good) in 4.1.
[18 Jan 2006 14:09] Sebastian Nohn
Output is:

# mysqlbinlog /usr/local/mysql/data/bladeA2-bin.000001
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
# at 4
#060118 11:52:03 server id 1001 end_log_pos 98 Start: binlog v 4, server v 5.0.18-standard-log created 060118 11:52:03 at startup
ROLLBACK;
# at 98
#060118 11:52:03 server id 1001 end_log_pos 193 Query thread_id=1185 exec_time=0 error_code=0
use dms_2_0;
SET TIMESTAMP=1137581523;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.unique_checks=1;
SET @@session.sql_mode=0;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=31;
insert into tm values(2);
# at 193
#060118 11:52:05 server id 1001 end_log_pos 269 Query thread_id=1185 exec_time=21 error_code=0
SET TIMESTAMP=1137581525;
BEGIN;
# at 269
#060118 11:52:05 server id 1001 end_log_pos 99 Query thread_id=1185 exec_time=0 error_code=0
SET TIMESTAMP=1137581525;
update ti set a=3 where a=10;
# at 368
#060118 11:52:05 server id 1001 end_log_pos 447 Query thread_id=1185 exec_time=21 error_code=1053
SET TIMESTAMP=1137581525;
ROLLBACK;
# at 447
#060118 11:52:26 server id 1001 end_log_pos 466 Stop
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
[18 Feb 2006 16:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/2850
[18 Feb 2006 16:36] Guilhem Bichot
ChangeSet
  1.2061 06/02/18 17:19:16 guilhem@mysql.com +3 -0
  Fix for BUG#16559 "Replication Problems with Non transactional tables inside an interrupted trans.":
  problem was: when a connection disconnects having an open transaction affecting MyISAM and InnoDB, the ROLLBACK event
+stored in the binary log
  contained a non-zero error code (1053 because of the disconnection), so when slave applied the transaction, slave
+complained that its ROLLBACK succeeded
  (error_code=0) while master's had 1053, so slave stopped. But internally generated binlog events such as this ROLLBACK
  should always have 0 as error code, as is true in 4.1 and was accidentally broken in 5.0,
  so that there is no false alarm.
[18 Feb 2006 20:35] Guilhem Bichot
the fix is in 5.0.19 and 5.1.8
[28 Feb 2006 2:39] Paul DuBois
Noted in 5.0.19, 5.1.7 changelogs.

For a transaction that used <literal>MyISAM</literal> and
<literal>InnoDB</literal> tables, interruption of the
transaction due to a dropped connection on a master server
caused slaves to lose synchrony. (Bug #16559)