Bug #50124 Rpl failure on DROP table with concurrent txn/non-txn DML flow and SAVEPOINT
Submitted: 6 Jan 2010 23:46
Modified: 30 Jul 2010 3:01
Reporter: Elena Stepanova
Status: Closed
Category: MySQL Server: Locking
Severity: S3 (Non-critical)
Version: 5.5.3-m3, 5.6.99-m4
OS: Any
Assigned to: Jon Olav Hauglid
Triage: Triaged: D2 (Serious)

[6 Jan 2010 23:46] Elena Stepanova
Description:
When a transaction involves both transactional and non-transactional tables, it is written to the binlog in full upon COMMIT, even if ROLLBACK TO SAVEPOINT was executed in the middle. If a concurrent connection attempts to drop a transactional table that was locked after the SAVEPOINT was set, it can do so as soon as ROLLBACK TO SAVEPOINT is executed and the lock is released. The DML transaction is written to the binlog later, after the DROP TABLE, and makes the slave SQL thread abort with error 1146 (table does not exist).
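
The ordering problem described above can be illustrated with a toy model. This is only a sketch, not MySQL code: the `replay` function, the `TableMissing` exception, and the tuple-based event format are hypothetical names invented for the illustration. It assumes, as the description states, that DDL is logged immediately while the mixed transaction's events are logged in full at COMMIT.

```python
# Toy model (not MySQL code): DDL is logged as soon as it completes, while a
# transaction's events wait in a per-session cache until COMMIT.

class TableMissing(Exception):
    """Stands in for error 1146 (table does not exist)."""


def replay(binlog, tables):
    """Apply events in binlog order, the way a slave SQL thread would."""
    tables = set(tables)
    for kind, table in binlog:
        if kind == "DROP":
            tables.discard(table)
        elif kind == "INSERT" and table not in tables:
            raise TableMissing(f"1146: table '{table}' does not exist")
    return tables


binlog = []
trx_cache = []  # events of the open transaction on connection "master"

trx_cache.append(("INSERT", "t2"))
trx_cache.append(("INSERT", "log"))  # non-transactional table in the mix
trx_cache.append(("INSERT", "t"))    # executed after SAVEPOINT
# ROLLBACK TO SAVEPOINT: the cached events are kept, but the lock on t is
# released, so the DROP TABLE from connection "master1" completes first.
binlog.append(("DROP", "t"))
trx_cache.append(("INSERT", "t2"))
binlog.extend(trx_cache)             # COMMIT writes the whole transaction

try:
    replay(binlog, {"t", "t2", "log"})
    outcome = "ok"
except TableMissing as exc:
    outcome = str(exc)
print(outcome)  # 1146: table 't' does not exist
```

In this model the slave fails only because the DROP event precedes the INSERT into `t`; if the DROP were logged after the transaction, the replay would succeed.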

How to repeat:
# t/rpl_lock.test
# run as
# perl ./mysql-test-run.pl \
# --mysqld=--binlog_format=row --mysqld=--innodb rpl_lock
#-----------------

--source include/master-slave.inc
--source include/have_innodb.inc

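connection master;

USE test;
DROP TABLE IF EXISTS t, log, t2;
# t and t2 are transactional; log is non-transactional.
CREATE TABLE t (i INT) ENGINE = InnoDB;
CREATE TABLE t2 LIKE t;
CREATE TABLE log (i INT) ENGINE = MyISAM;
FLUSH LOGS;
# Mixed transaction: touches both InnoDB and MyISAM tables.
START TRANSACTION;
INSERT INTO t2 VALUES (1);
INSERT INTO log VALUES (1);
SAVEPOINT insert_statement;
# t is first locked after the savepoint.
INSERT INTO t VALUES (1);

connection master1;

USE test;
# Blocks until the lock on t is released.
send DROP TABLE t;

connection master;

# Releases the lock on t, letting the DROP TABLE proceed.
ROLLBACK TO SAVEPOINT insert_statement;
INSERT INTO t2 VALUES (2);
COMMIT;

connection master1;
reap;
FLUSH LOGS;
# DROP TABLE t appears before the transaction's events.
SHOW BINLOG EVENTS IN 'master-bin.000002';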
[8 Jan 2010 14:03] Philip Stoev
I asked Jon Olav Hauglid to take a look at this from an MDL perspective.
[14 Jan 2010 14:10] Luis Soares
See also: BUG#47327. Probably related.
[14 Jan 2010 18:39] Alfranio Correia
This is not related to BUG#47327 which is just a request for
optimization. Most likely, this bug is related to BUG#42643.

Note that BUG#42643 is about both the drop and truncate
statements.
[9 Apr 2010 8:09] Zhenxing He
See also bug#47327
[5 May 2010 14:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/107543

3013 Jon Olav Hauglid	2010-05-05
      Bug #50124 Rpl failure on DROP table with concurrent txn/non-txn
                 DML flow and SAVEPOINT
      
      The problem was that replication could break if a transaction involving
      both transactional and non-transactional tables was rolled back to a
      savepoint. It broke if a concurrent connection tried to drop a
      transactional table which was locked after the savepoint was set.
      This DROP TABLE completed when ROLLBACK TO SAVEPOINT was executed as the
      lock on the table was dropped by the transaction. When the slave later
      tried to apply the binlog, it would fail as the table would already
      have been dropped.
      
      The reason for the problem is that transactions involving both
      transactional and non-transactional tables are written fully to the
      binlog during ROLLBACK TO SAVEPOINT. At the same time, metadata locks
      acquired after a savepoint were released during ROLLBACK TO SAVEPOINT.
      This allowed a second connection to drop a table only used between
      SAVEPOINT and ROLLBACK TO SAVEPOINT, which caused the transaction binlog
      to refer to a non-existing table when it was written during ROLLBACK
      TO SAVEPOINT.
      
      This patch fixes the problem by not releasing metadata locks when
      ROLLBACK TO SAVEPOINT is executed if binlogging is enabled.
      
      Test case added to rpl.rpl_savepoint.test.
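
The locking rule described in this patch can be sketched as follows. This is a simplified illustration, not the server's implementation: `MDLContext`, `can_drop`, and `run` are hypothetical names, and the real metadata lock subsystem is considerably more involved. The sketch assumes only what the commit message states: locks acquired after a savepoint used to be released on ROLLBACK TO SAVEPOINT, and with the patch they are kept when binlogging is enabled.

```python
class MDLContext:
    """Hypothetical, simplified stand-in for a session's metadata locks."""

    def __init__(self, binlog_enabled):
        self.binlog_enabled = binlog_enabled
        self.locks = []        # table names in acquisition order
        self.savepoints = {}   # savepoint name -> number of locks held then

    def acquire(self, table):
        if table not in self.locks:
            self.locks.append(table)

    def savepoint(self, name):
        self.savepoints[name] = len(self.locks)

    def rollback_to_savepoint(self, name):
        mark = self.savepoints[name]
        if self.binlog_enabled:
            return             # patched rule: keep every lock until COMMIT
        del self.locks[mark:]  # old rule: release locks taken after savepoint

    def commit(self):
        self.locks.clear()
        self.savepoints.clear()


def can_drop(table, sessions):
    """A DROP TABLE may proceed only if no session holds a lock on the table."""
    return all(table not in s.locks for s in sessions)


def run(binlog_enabled):
    s = MDLContext(binlog_enabled)
    s.acquire("t2")
    s.acquire("log")
    s.savepoint("insert_statement")
    s.acquire("t")
    s.rollback_to_savepoint("insert_statement")
    droppable_before_commit = can_drop("t", [s])
    s.commit()
    return droppable_before_commit, can_drop("t", [s])


print(run(binlog_enabled=False))  # (True, True)
print(run(binlog_enabled=True))   # (False, True)
```

With binlogging enabled, the concurrent DROP TABLE in this model has to wait until COMMIT, so it can no longer complete ahead of the transaction that still references the table.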
[2 Jun 2010 10:48] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/109938

3089 Jon Olav Hauglid	2010-06-02
      Bug #50124 Rpl failure on DROP table with concurrent txn/non-txn
                 DML flow and SAVEPOINT
      
      The patch also makes sure that metadata locks taken inside a sub-statement
      are not released if the sub-statement does a ROLLBACK TO SAVEPOINT.
      This prevents items protected by these metadata locks from being
      altered/dropped before the sub-statement has completed.
[2 Jun 2010 11:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/109957

3091 Jon Olav Hauglid	2010-06-02
      Bug #50124 Rpl failure on DROP table with concurrent txn/non-txn
                 DML flow and SAVEPOINT
      
[25 Jun 2010 7:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/112165

3074 Jon Olav Hauglid	2010-06-25
      Bug #50124 Rpl failure on DROP table with concurrent txn/non-txn
                 DML flow and SAVEPOINT
      
[25 Jun 2010 7:50] Jon Olav Hauglid
Pushed to mysql-trunk-bugfixing (5.5.6) and merged to mysql-next-mr-bugfixing.
[23 Jul 2010 12:26] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100723121820-jryu2fuw3pc53q9w) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (merge vers: 5.5.5-m3) (pib:18)
[23 Jul 2010 12:33] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100723121929-90e9zemk3jkr2ocy) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (pib:18)
[30 Jul 2010 3:01] Paul Dubois
Noted in 5.5.6 changelog.

Replication could break if a transaction involving both transactional
and nontransactional tables was rolled back to a savepoint. It broke
if a concurrent connection tried to drop a transactional table which
was locked after the savepoint was set. This DROP TABLE completed
when ROLLBACK TO SAVEPOINT was executed because the lock on the table
was dropped by the transaction. When the slave later tried to apply
the binary log events, it would fail because the table had already
been dropped.