Bug #55888 ndb table binlogged as transactional table when ndb_use_transactions=0
Submitted: 10 Aug 2010 15:41 Modified: 12 Aug 2010 19:15
Reporter: Sven Sandberg Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Cluster: Replication Severity:S2 (Serious)
Version:mysql-5.1 OS:Any
Assigned to: CPU Architecture:Any
Tags: 5.1+
Triage: Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[10 Aug 2010 15:41] Sven Sandberg
Description:
Transactional and non-transactional updates are written to the binary log in different ways in order to guarantee that a replication slave does not go out of sync. Since NDB is a transactional storage engine, updates to NDB tables are logged as updates to other transactional tables.

When ndb_use_transactions=0, NDB is a non-transactional storage engine. Despite this, updates to NDB tables are logged as if NDB was transactional. This can cause slave to go out of sync.

How to repeat:
--source include/ndb_master-slave.inc
--source include/have_innodb.inc
SET ndb_use_transactions = 0;
CREATE TABLE t1 (a INT) ENGINE = NDB;
CREATE TABLE t2 (a INT) ENGINE = InnoDB;
BEGIN;
INSERT INTO t1 VALUES (1);
--connection master1
  INSERT INTO t2 SELECT * FROM t1;
--connection master
ROLLBACK;
SELECT * FROM t1;
SELECT * FROM t2;
--sync_slave_with_master
SELECT * FROM t1;
SELECT * FROM t2;

Suggested fix:
1. Treat ndb as non-transactional when ndb_use_transactions=0.
2. Extend existing tests for transactions in replication to cover the case when NDB is used as a non-transactional storage engine.
[10 Aug 2010 15:45] Sven Sandberg
The test case above is for mysql-5.1-telco-7.1.

In mysql-5.1, the test case does not show the bug. This may be due to BUG#43236 (I'm guessing).
[12 Aug 2010 18:30] Sveta Smirnova
Thank you for the report.

Which results do you get? I get:

SELECT * FROM t1;
a
1
SELECT * FROM t2;
a
1
SELECT * FROM t1;
a
1
SELECT * FROM t2;
a

But this means query on InnoDB table was not rolled back while NDB table works fine. Version 5.1-telco-7.0
[12 Aug 2010 18:31] Sveta Smirnova
Bug #55911 can be related.
[12 Aug 2010 19:03] Sven Sandberg
Sveta, I get the same results as you did. This is a manifestation of the bug. The bug in cluster replication causes the innodb table to go out of sync as follows:

1. It creates one NDB table and one INNODB table.
2. Inside a transaction, it inserts a row in the NDB table. It does not commit
   the transaction, but since NDB is non-transactional, the insert takes effect
   immediately. However, since replication thinks that NDB is transactional,
   the statement is not written to the binary log immediately, it is only
   written to an internal cache.
3. From another connection, it does an INSERT...SELECT that copies the row
   from the ndb table to the innodb table. This statement is committed
   immediately (autocommit=1), so it is written to the binary log immediately.
4. The transaction in the first connection is committed. This causes the master
   to flush the cache to the binary log.

So the binary log is out of order: the INSERT...SELECT (from ndb to innodb) is written first, and the transaction that populates the ndb table is written later. So the slave executes INSERT...SELECT when the ndb table is empty whereas the master executed INSERT...SELECT when the ndb table is populated.

(I don't know why I made the transaction a ROLLBACK. It works just as well with a COMMIT.)
[12 Aug 2010 19:15] Sveta Smirnova
Sven,

thank you for the feedback. Verified as described.