MySQL Bugs: #41431: Conflict resolution keeps applied epoch information out of sync

Bug #41431	Conflict resolution keeps applied epoch information out of sync
Submitted:	12 Dec 2008 12:50	Modified:	1 Oct 2009 8:45
Reporter:	Geert Vanderkelen	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Cluster: Replication	Severity:	S3 (Non-critical)
Version:	mysql-5.1-telco-6.3	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	mysql-5.1.28-ndb-6.3.18

Description:
In a simple Master/Slave Cluster replication setup, when the slave has some conflict resolution entries in mysql.ndb_replication, the applied epoch information in the ndb_apply_status and ndb_binlog_index stay out of sync.

For example, to check if the Slave applied all it got from the master:

slave> select hex(MAX(orig_epoch)) from mysql.ndb_binlog_index; select hex(epoch) from mysql.ndb_apply_status;
+----------------------+
| hex(MAX(orig_epoch)) |
+----------------------+
| 24BB00000000F        | 
+----------------------+

+---------------+
| hex(epoch)    |
+---------------+
| 24BB00000000F | 
+---------------+

The above should match when there is no traffic.
Here is when some conflict resolution is put in:

slave> select hex(MAX(orig_epoch)) from mysql.ndb_binlog_index; select hex(epoch) from mysql.ndb_apply_status;
+----------------------+
| hex(MAX(orig_epoch)) |
+----------------------+
| 24B0A00000009        | 
+----------------------+

+---------------+
| hex(epoch)    |
+---------------+
| 24B0A00000010 | 
+---------------+

(see how to repeat)

How to repeat:
Normal Cluster Replication setup, Master/Slave.

0) Make sure replication works!

1) Create mysql.ndb_replication (see manual for that: http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-replication-conflict-resolution.html)

2) Add an entry for the test table (we haven't created it yet, and that's OK!):
  INSERT INTO mysql.ndb_replication
    VALUES ('test', 'mobile_user', 0, NULL, 'NDB$MAX(confres)');

3) Create the test table, with triggers (I do this on the slave, I don't like DDL being replicated):

CREATE TABLE mobile_user (
mid INT UNSIGNED NOT NULL AUTO_INCREMENT,
confres INT UNSIGNED NOT NULL,
serva INT UNSIGNED DEFAULT 0,
servb INT UNSIGNED DEFAULT 0,
PRIMARY KEY (mid)
) ENGINE=ndb;

CREATE TRIGGER trg_mobile_user_insert BEFORE INSERT ON mobile_user
FOR EACH ROW SET NEW.confres = UNIX_TIMESTAMP(UTC_TIMESTAMP());
CREATE TRIGGER trg_mobile_user_update BEFORE UPDATE ON mobile_user
FOR EACH ROW SET NEW.confres = UNIX_TIMESTAMP(UTC_TIMESTAMP());

4) Insert data on the master, and start updating some:

 INSERT INTO mobile_user (serva,servb) VALUES (1,2),(5,6),(10,90);

 while true; do ./bin/mysql test -e "UPDATE mobile_user SET serva = serva + 1 WHERE mid = 1"; done

5) Check if the Slave gets the updates, should work. Monitor also the epoch:

  slave> select hex(MAX(orig_epoch)) from mysql.ndb_binlog_index; select hex(epoch) from mysql.ndb_apply_status;

During traffic, step 5 will rarely show them the same. If you stop the traffic, kill the while loop, you'll get wrong information like this:

slave> select hex(MAX(orig_epoch)) from mysql.ndb_binlog_index; select hex(epoch) from mysql.ndb_apply_status;
+----------------------+
| hex(MAX(orig_epoch)) |
+----------------------+
| 24B0A00000009        | 
+----------------------+
1 row in set (0.01 sec)

+---------------+
| hex(epoch)    |
+---------------+
| 24B0A00000010 | 
+---------------+

To solve it, you could do the following:

 slave> DELETE FROM mysql.ndb_replication;
 slave> TRUNCATE mobile.user;

Once the entry gone in ndb_replication, it should report again fine.

Needs to be verified on latest, and if correct procedures.

from -> Triage Bug Meeting

Support issues shows correction was done in telco-6.3.21

Hi,

I'm not sure what it is that I am supposed to document. Was there an actual bug that was fixed? (No commit message attached to bug report.) Is there an error or omission in the documentation? If so, what's wrong or missing?

Thanks!