Bug #20677 Race condition between table discovery and insert
Submitted: 24 Jun 2006 17:28 Modified: 26 Dec 2006 16:33
Reporter: Kristian Nielsen Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Cluster: Replication Severity:S2 (Serious)
Version:mysql-5.1 OS:Linux (Linux)
Assigned to: CPU Architecture:Any
Tags: 5.1.12

[24 Jun 2006 17:28] Kristian Nielsen
Description:
The test case ndb_binlog_multi fails sporadically:

ndb_binlog_multi               [ fail ]

Errors are (from /dev/shm/v4/log/mysqltest-time) :
mysqltest: Result length mismatch
(the last lines may be the most important ones)
Below are the diffs between actual and expected results:
-------------------------------------------------------
*** r/ndb_binlog_multi.result   2006-06-23 01:15:03.000000000 +0300
--- r/ndb_binlog_multi.reject   2006-06-24 20:24:39.000000000 +0300
***************
*** 29,45 ****
  show binlog events from <binlog_start>;
  Log_name      Pos     Event_type      Server_id       End_log_pos     Info
  master-bin.000001     #       Query   #       #       use `test`; CREATE TABLE t2 (a INT PRIMARY KEY, b int) ENGINE = NDB
- master-bin.000001     #       Query   #       #       BEGIN
- master-bin.000001     #       Table_map       #       #       table_id: # (test.t2)
- master-bin.000001     #       Table_map       #       #       table_id: # (cluster.apply_status)
- master-bin.000001     #       Write_rows      #       #       table_id: #
- master-bin.000001     #       Write_rows      #       #       table_id: # flags: STMT_END_F
- master-bin.000001     #       Query   #       #       COMMIT
  master-bin.000001     #       Query   #       #       use `test`; DROP TABLE t2
  SELECT inserts,updates,deletes,schemaops FROM 
  cluster.binlog_index WHERE epoch=<the_epoch>;
  inserts       updates deletes schemaops
- 2     0       0       0
  reset master;
  reset master;
  CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE = NDB;
--- 29,38 ----
-------------------------------------------------------

It does not fail every time it is run, but it can be repeated reliably by running the test case in a loop

How to repeat:
(cd mysql-test && for i in `seq 1 300`; do echo XXX $i XXX; MTR_BUILD_THREAD=4 perl mysql-test-run.pl --tmpdir=/dev/shm/t4 --vardir=/dev/shm/v4 --timer --ps-protocol --mysqld=--binlog-format=row ndb_binlog_multi || exit 1; done)

It fails only fairly rarely (I have seen it requiring more than 10 iterations to fail), but it always fails sooner or later.
[24 Jun 2006 20:17] Kristian Nielsen
What happens here is that server1 is supposed to get NDB events from the operations (create, insert) on server2, and inject them into the local binlog.

But from the failure, occasionally the insert events do not arrive (or arrive too late perhaps), so the local binlog contains only the CREATE event (from server 2) and the local DROP event, not the INSERT events.

The comments in the test seem to imply that the DELETE on server1 should make sure that all events are recived first, but clearly that does not always happen. This could indicate a bug, or it could be that the events are in fact received, but are not yet injected to the local binlog due to local server1 thread scheduling.

I would like some cluster expert's opinion on whether this is a real bug or not, though somehow it does not seem right that a DELETE should cause cluster to wait for an acknowledge from binlog injection. And if no such wait for acknowledge, it seems that this is not a bug but just a race in the test case?

For now, I will try to make a patch that removes the race in the test case by explicitly waiting for the INSERT events to be injected before checking binlog.
[24 Jun 2006 21:12] Kristian Nielsen
I tried extending the test case to loop, waiting for the INSERT event. This shows that in the failure case, the INSERT events are not just delayed, they are in fact getting lost!

Found something interesting in the master1.err log. In the normal (success) case, it contains this:

060625  0:06:14 [Note] NDB Binlog: Node: 1, subscribe from node 5, Subscriber bitmask 020
060625  0:06:14 [Note] NDB Binlog: Node: 1, subscribe from node 4, Subscriber bitmask 030
060625  0:06:14 [Note] NDB Binlog: Node: 2, subscribe from node 5, Subscriber bitmask 020
060625  0:06:14 [Note] NDB Binlog: Node: 2, subscribe from node 4, Subscriber bitmask 030
060625  0:06:14 [Note] NDB Binlog: CREATE TABLE Event: REPL$test/t2
060625  0:06:14 [Note] NDB Binlog: logging ./test/t2

But in the failure case, it contains instead this:

060624 23:48:56 [Note] NDB Binlog: CREATE TABLE Event: REPL$test/t2
060624 23:48:56 [Note] NDB Binlog: logging ./test/t2
060624 23:48:56 [Note] NDB Binlog: Node: 1, subscribe from node 5, Subscriber bitmask 020
060624 23:48:56 [Note] NDB Binlog: Node: 1, subscribe from node 4, Subscriber bitmask 030
060624 23:48:56 [Note] NDB Binlog: Node: 2, subscribe from node 5, Subscriber bitmask 020
060624 23:48:56 [Note] NDB Binlog: Node: 2, subscribe from node 4, Subscriber bitmask 030

Note that in the failure case, the subscription seems to happen *after* table creation, and in the success case *before* the table creation. This was verified in several success and several failure runs.

So it looks like a late subscription causes the insert (but not create) events to get lost.
[25 Jun 2006 7:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8195
[25 Jun 2006 7:12] Kristian Nielsen
I have committed a patch that starts the test with a dummy table CREATE and DROP, to ensure that event subscription is set up properly. This fixes the test failues (I was able to run the test 300 times in a loop with no failures).

Tomas, I need your help to decide whether this is a valid fix (so it is just the testcase that is broken), or whether there is a real bug to be fixed.
[26 Jun 2006 17:19] Kristian Nielsen
Quoting Tomas:

"The real problem is that the insert is allowed before the discovery of the table has completed totally... if only occurs in this special case when you start up a mysqld at the same time as you do a create table..."

I will attach the master.err and master1.err log files from when the failure occurs.

For now, I have pushed to 5.1.12 the workaround with extra create/drop table at the start of the test. This fixes the test failure. The bug is therefore no longer a P1/Showstopper. It only occurs if one mysqld issues create table simultaneously with the start of another mysqld.
[26 Jun 2006 17:19] Kristian Nielsen
master error log

Attachment: master.err (application/octet-stream, text), 4.74 KiB.

[26 Jun 2006 17:20] Kristian Nielsen
master1 error log

Attachment: master1.err (application/octet-stream, text), 3.65 KiB.