MySQL Bugs: #21842: Cluster fails to replicate to innodb or myisam with err 134 using TPC-B

Bug #21842	Cluster fails to replicate to innodb or myisam with err 134 using TPC-B
Submitted:	25 Aug 2006 18:21	Modified:	4 Sep 2007 12:55
Reporter:	Jonathan Miller
Status:	Closed
Category:	MySQL Server: Row Based Replication ( RBR )	Severity:	S2 (Serious)
Version:	5.1.12, 5.1.15	OS:	Linux (Linux 64bit )
Assigned to:	Rafal Somla	CPU Architecture:	Any

Description:
I created a master cluster and a slave cluster, then used load_tpcb.pl to create and load the TPCB database using cluster tables. On the slave I altered the TPCB tables to use MyISAM, and also tried InnoDB.

As soon as the TPCB test starts, the slave fails with:

Last_Errno: 134
Last_Error: Error in Write_rows event: error during transaction execution on table TPCB.account

Another Example:
Last_Errno: 134
Last_Error: Error in Write_rows event: error during transaction execution on table TPCB.trans

If I alter the table back to NDB and start the slave, the slave will move past and then error on the next table that is myisam or innodb.

MySQL describes this error code as:
MySQL error code 134: Record was already deleted (or record file crashed)

But the TPC-B test only does select, inserts and deletes. In addition the Last_error is in Write_rows

Additional error from the slave error log:

060825 17:52:54 [ERROR] Slave: Error in Write_rows event: row application failed, Error_code: 134
060825 17:52:54 [ERROR] Slave: Error in Write_rows event: error during transaction execution on table TPCB.branch, Error_code: 134
060825 17:52:54 [ERROR] Slave (additional info): Unknown error Error_code: 1105
060825 17:52:54 [Warning] Slave: Unknown error Error_code: 1105
060825 17:52:54 [Warning] Slave: Unknown error Error_code: 1105
060825 17:52:54 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master2.000001' position 3859
060825 17:53:28 [Note] Slave SQL thread initialized, starting replication in log 'master2.000001' at position 3859, relay log './ndb18-relay-bin.000003' position: 3998
060825 17:53:28 [ERROR] Slave: Error in Write_rows event: row application failed, Error_code: 134
060825 17:53:28 [ERROR] Slave: Error in Write_rows event: error during transaction execution on table TPCB.teller, Error_code: 134
060825 17:53:28 [ERROR] Slave (additional info): Unknown error Error_code: 1105
060825 17:53:28 [Warning] Slave: Unknown error Error_code: 1105
060825 17:53:28 [Warning] Slave: Unknown error Error_code: 1105
060825 17:53:28 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master2.000001' position 45
060825 18:00:46 [Note] Slave I/O thread killed while reading event
060825 18:00:46 [Note] Slave I/O thread exiting, read up to log 'master2.000001', position 5928457

 

How to repeat:
I have tried to recreate this using mysql-test, but the test passes and does not fail.

Create cluster replication as follows

Master
host1: MySQLD <- Test against this one
host2: ndbd, MySQLD <- For master replication
host3: ndb_mgmd, ndbd

host4: MySQLD <- For slave replication
host5: ndbd
host6: ndb_mgmd, ndbd

1) Start replication
2) On host 1 load the database
host1$>perl load_tpcb.pl --sock
3) Once the tables are created, loaded and replicated, log in to the slave mysqld and alter the tables in the TPCB database.
mysql> alter table account engine=myisam;
Query OK, 100000 rows affected (2.72 sec)
Records: 100000  Duplicates: 0  Warnings: 0

mysql> alter table branch engine=myisam;
Query OK, 10000 rows affected (1.36 sec)
Records: 10000  Duplicates: 0  Warnings: 0

mysql> alter table teller engine=myisam;
Query OK, 20000 rows affected (1.63 sec)
Records: 20000  Duplicates: 0  Warnings: 0

mysql> alter table history engine=myisam;
Query OK, 0 rows affected (1.27 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table trans engine=myisam;
Query OK, 0 rows affected (1.14 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table sync engine=myisam;
Query OK, 0 rows affected (1.37 sec)
Records: 0  Duplicates: 0  Warnings: 0

4) On host1 start the tpcb test
perl tpcb_driver.pl -ho host1 -u root --sock

5) Show slave status\G on the slave
Last_Errno: 134
Last_Error: Error in Write_rows event: error during transaction execution on table TPCB.trans

Suggested fix:
Not a suggested fix, but the mysql-test I tried:

-- source include/have_ndb.inc
-- source include/master-slave.inc

connection master;
--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings

CREATE TABLE t1 (id MEDIUMINT NOT NULL, b1 BIT(8), vc VARCHAR(255),
                 bc CHAR(255), d DECIMAL(10,4) DEFAULT 0,
                 f FLOAT DEFAULT 0, total BIGINT UNSIGNED,
                 y YEAR, t DATE,PRIMARY KEY(id))ENGINE=NDB;

sync_slave_with_master;
connection slave;
SHOW create TABLE t1;
ALTER TABLE t1 ENGINE=myisam;
SHOW create TABLE t1;
connection master;

--source include/rpl_multi_engine3.inc

DROP TABLE IF EXISTS t1;

- But the TPC-B test only does select, inserts and deletes. In addition the
Last_error is in Write_rows

+ But the TPC-B test only does select, inserts and updates, no deletes. In addition the
Last_error is in Write_rows

I think I have this problem too. Does anyone know of a way to arrange the tables to avoid it? I am replicating from NDB tables to MyISAM tables and getting the same error. Any other suggestions for working around the problem?

I haven't managed to find a workaround, so I have set up a cluster on the slave. I would very much rather use MyISAM on the slave. Is anyone trying to fix this bug? I will set up a test system to repeat this bug and give MySQL access to it, but I cannot do so for about 3 weeks as we are waiting for some new computers to arrive. Any info or workarounds will be gratefully received. Thanks, Jason

On the surface, it looks like this is related to BUG#22583 and BUG#22550.

The error code 134 is an internal error code produced by the storage engine: the code 1105 is a proper external code.

What makes this strange is that you get the same error with InnoDB.

It would be a great help if the result of SHOW CREATE TABLE was added (after doing the ALTER TABLE), just to see that the table definitions are proper.

Your wish is my command O guru :-)

Database changed
mysql> alter table account engine=myisam;
Query OK, 100000 rows affected (5.25 sec)
Records: 100000  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE account;
+---------+-----------------------------------------------------------------------------------------------------
| Table   | Create Table
+---------+-----------------------------------------------------------------------------------------------------
| account | CREATE TABLE `account` (
  `aid` int(11) NOT NULL DEFAULT '0',
  `bid` int(11) DEFAULT NULL,
  `balance` decimal(8,2) DEFAULT NULL,
  `filler` char(80) DEFAULT NULL,
  PRIMARY KEY (`aid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+---------+-----------------------------------------------------------------------------------------------------
1 row in set (0.00 sec)

mysql>

mysql> alter table branch engine=innodb;
Query OK, 10000 rows affected (1.97 sec)
Records: 10000  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE branch;
+--------+----------------------------------------------------------------------------------------------
| Table  | Create Table
+--------+----------------------------------------------------------------------------------------------
| branch | CREATE TABLE `branch` (
  `bid` int(11) NOT NULL DEFAULT '0',
  `balance` decimal(8,2) DEFAULT NULL,
  `filler` char(80) DEFAULT NULL,
  PRIMARY KEY (`bid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+--------+----------------------------------------------------------------------------------------------
1 row in set (0.00 sec)

mysql>

A little further testing today showed this:

On Master:
mysql> update account set balance = 3.00 where aid = 1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

On slave (MyISAM):

mysql> select balance from account where aid = 1;
ERROR 1030 (HY000): Got error 134 from storage engine
mysql> select balance from account where aid = 2;
+---------+
| balance |
+---------+
|    0.00 |
+---------+
1 row in set (0.00 sec)

On master:
mysql> select balance from branch where bid = 1;
+---------+
| balance |
+---------+
|    3.00 |
+---------+
1 row in set (0.00 sec)

On slave (InnoDB), I was not able to reproduce error 134 by hand:

mysql> select balance from branch where bid = 1;
+---------+
| balance |
+---------+
|    3.00 |
+---------+
1 row in set (0.00 sec)

So now I try to reset the row in the MyISAM table.

On master:

mysql> update account set balance = 0 where aid = 1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

On Slave: Slave bombs due to getting the 134 in trying to read the record:
  Slave_IO_Running: Yes
 Slave_SQL_Running: No

Last_Errno: 134
Last_Error: Error in Write_rows event: error during transaction execution on table TPCB.account

Hope this helps
/jeb

070125 14:53:03  mysqld started
070125 14:53:04  InnoDB: Started; log sequence number 0 46409
070125 14:53:04 [Note] Starting MySQL Cluster Binlog Thread
070125 14:53:04 [Note] /home/ndbdev/jmiller/builds/libexec/mysqld: ready for connections.
Version: '5.1.15-beta-log'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
070125 14:53:04 [Note] SCHEDULER: Loaded 0 events
070125 14:53:59 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './ndb12-relay-bin.000001' position: 4
070125 14:53:59 [Note] Slave I/O thread: connected to master 'rep@ndb09:3306',replication started in log 'FIRST' at position 4
070125 15:04:11 [ERROR] Slave: Error in Write_rows event: row application failed, Error_code: 134
070125 15:04:11 [ERROR] Slave: Error in Write_rows event: error during transaction execution on table TPCB.account, Error_code: 134
070125 15:04:11 [ERROR] Slave (additional info): Unknown error Error_code: 1105
070125 15:04:11 [Warning] Slave: Unknown error Error_code: 1105
070125 15:04:11 [Warning] Slave: Unknown error Error_code: 1105
070125 15:04:11 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'ndb09.000001' position 3724714

Here is an error trace of the same failure, but with patched code that prints slightly more precise error messages:
------------------------------------------------------------------------
070126 18:20:23 [Note] Slave SQL thread initialized, starting replication in log
                'FIRST' at position 0, relay log './ndb12-relay-bin.000001'
                position: 4
070126 18:20:23 [Note] Slave I/O thread: Connected to master 'rep@ndb09:3306',
                Reading log 'FIRST' from position 4
<...>
070126 18:49:24 [ERROR] Slave: replace_record: error in rnd_pos() method of
                MyISAM handler. Error_code: 134
070126 18:49:24 [ERROR] Slave SQL thread: Error detected when executing event at
                pos 3860317 in ./ndb12-relay-bin.000006.
070126 18:49:24 [Note] Slave SQL thread exiting, replication stopped in log
                'ndb09.000003' at position 3849944
------------------------------------------------------------------------

Note: perror 134: "MySQL error code 134: Record was already deleted (or record 
file crashed)"

Note: The offending event is a Write_rows event in a long sequence of such events.

The error happened inside the replace_record() function, which is called when a Write_rows event is executed. This function first tries to insert the record using ha_write_row. If that fails, get_dup_key() is called, followed by rnd_pos(), which tries to locate the conflicting row. This is where the error is detected.

I reproduced the problem in 5.1.15 tree.

Waiting for fix to BUG#22583 to be pushed to see if it solves the problem.

mysql-test/include/rpl_multi_engine3.inc

Attachment: rpl_multi_engine3.inc (application/octet-stream, text), 2.36 KiB.

mysql-test/extra/rpl_tests/rpl_ndb_2multi_eng.test

Attachment: rpl_ndb_2multi_eng.test (application/octet-stream, text), 10.12 KiB.

NOTE: 

rm ./r/rpl_ndb_2myisam.result
touch ./r/rpl_ndb_2myisam.result

replace current files with ones attached to bug report

./mysql-test-run.pl --force --do-test=rpl_ndb_2m --mysqld=--binlog-format=row --ndb-extra-test

Slave's binlog from the test run on recent rpl tree

Attachment: slave_tpcb.sql (application/octet-stream, text), 123.00 KiB.

Currently, in the telco tree, MyISAM still gets the same results:

070417 23:18:26 [Warning] Slave: Unknown error Error_code: 1105
070417 23:18:26 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 8383
070417 23:18:58 [Note] Slave: received end packet from server, apparent master shutdown:
070417 23:18:58 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'master-bin.000001' position 8876
070417 23:18:58 [ERROR] Slave I/O thread: error reconnecting to master 'root@127.0.0.1:9306': Error: 'Lost connection to MySQL server at 'reading initial communication packet', system error: 111'  errno: 2013  retry-time: 1  retries: 10
070417 23:18:58 [Note] /data1/mysql-5.1-telco/sql/mysqld: Normal shutdown

070417 23:18:58 [Note] Event Scheduler: Purging the queue. 0 events
070417 23:18:58 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
070417 23:18:58 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 8876
070417 23:18:58 [Note] Stopping Cluster Binlog
070417 23:18:58 [Note] Stopping Cluster Utility thread
070417 23:19:00 [Note] /data1/mysql-5.1-telco/sql/mysqld: Shutdown complete

So I am guessing that the patch for the bit field has not been applied to this tree as of yet.

But InnoDB gives very different results:

*** 372,378 ****
  --- Check Update on slave ---
  SELECT id,hex(b1),vc,bc,d,f,total,y,t FROM t1 WHERE id = 412;
  id    hex(b1) vc      bc      d       f       total   y       t
! 412   0       Testing MySQL databases is a cool       Must make it bug free for the customer 654321.4321      15.21   0       1965    2006-02-22
  --- Remove a record from t1 on master ---
  DELETE FROM t1 WHERE id = 42;
  --- Show current count on master for t1 ---
--- 372,378 ----
  --- Check Update on slave ---
  SELECT id,hex(b1),vc,bc,d,f,total,y,t FROM t1 WHERE id = 412;
  id    hex(b1) vc      bc      d       f       total   y       t
! 412   0       NULL    NULL    NULL    NULL    0       NULL    2006-02-22
  --- Remove a record from t1 on master ---
  DELETE FROM t1 WHERE id = 42;
  --- Show current count on master for t1 ---
***************
*** 382,388 ****
  --- Show current count on slave for t1 ---
  SELECT COUNT(*) FROM t1;
  COUNT(*)
! 4
  DELETE FROM t1;
  --- End test 5 key partition testing ---
  --- Do Cleanup ---
--- 382,388 ----
  --- Show current count on slave for t1 ---
  SELECT COUNT(*) FROM t1;
  COUNT(*)
! 5
  DELETE FROM t1;
  --- End test 5 key partition testing ---
  --- Do Cleanup ---

Seems that some of the data is getting messed up on the update and the delete does not go through at all.

Replace existing in /extra/rpl_tests/

Attachment: rpl_ndb_2multi_eng.test (application/octet-stream, text), 10.52 KiB.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/26170

ChangeSet@1.2575, 2007-05-05 13:35:44+02:00, rafal@quant.(none) +10 -0
  BUG#21842 (Cluster fails to replicate to innodb or myisam with err 
  134 using TPC-B):
  
  This is a preliminary patch in preparation for the bug fix. The main 
  change is to make unpack_row() function non-destructive. That is, if 
  a column is not present in the row it will be left as it is in 
  the record to which we unpack (table->record[0]). If a caller of 
  unpack_row() wants the missing columns to be initialized with default
  values, it must do it itself. Function prepare_record() is added for
  that purpose.
  
  Other changes in this changeset:
  
  - Change signature of unpack_row(): don't report errors and don't
    setup table's rw_set here.
  
  - In Rows_log_event and derived classes, don't pass arguments to
    the execution primitives (do_...() member functions) but use class
    members instead.
  
  - Factor-out code used for opening tables in a Rows event to
    a separate method open_and_lock_tables().
  
  - Change the way errors are reported when filling fields with default
    values. Now user can see correct error number in SHOW SLAVE STATUS
    output.
  
  - The changes seem to fix rpl_ndb_extraCol test. Before it produced
    results different than the same test run on other storage engines.
    Now the results are identical. The result file is updated
    accordingly.
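
The non-destructive unpack described in this changeset can be modelled outside the server. The Python sketch below is an illustration under stated assumptions, not the server's C++ code: records are plain dicts, and although `unpack_row` and `prepare_record` borrow their names from the commit message, the signatures used here are hypothetical. The column names and defaults are taken from the `account` table shown earlier in this report.

```python
# Toy model of a non-destructive unpack_row(); NOT the server implementation.
# Records are dicts and `present` lists the columns carried by the row image.

def prepare_record(defaults):
    """Return a fresh record initialised with the table's default values."""
    return dict(defaults)

def unpack_row(record, row_data, present):
    """Overwrite only the columns present in row_data; leave the rest as found."""
    for col in present:
        record[col] = row_data[col]
    return record

# Defaults of the `account` table from the SHOW CREATE TABLE output.
defaults = {"aid": 0, "bid": None, "balance": None, "filler": None}
row_data = {"aid": 1, "balance": 3.0}
present = ["aid", "balance"]

# A caller that wants defaults for the missing columns must ask for them itself.
rec = unpack_row(prepare_record(defaults), row_data, present)

# A caller that unpacks over an existing row keeps the untouched columns.
existing = {"aid": 1, "bid": 7, "balance": 0.0, "filler": "x"}
merged = unpack_row(existing, row_data, present)
```

In this model the decision between "fill with defaults" and "keep what is there" moves from the unpack step to its caller, which is the split of responsibilities the changeset describes.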

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/27099

ChangeSet@1.2577, 2007-05-21 21:23:19+02:00, rafal@quant.(none) +9 -0
  BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 
  using TPC-B):
  
  This patch implements solution b described in the bug report with some
  modifications. Main modifications are:
  
  - make replace_record() function a method of Rows_log_event as an 
    instance of this class contains most of the data needed by the function.
  - make similar modifications to find_and_fetch_row() function. Also in
    this case row data is unpacked inside the function.
  - make modified versions of rpl_ndb_2xxx tests so that they work in
    the current tree and test that modified code correctly handles 
    ndb->other replication.

Hi,

I was watching the push build today (5.1-telco) after pushing test changes and noticed that
rpl_ndb_mix_innodb.test has been failing for a while now. The test fails with "could not
sync with master ('select master_pos_wait('master-bin.000001', 228163)' returned NULL)"

Looking at it, the slave did not sync because it failed with:

070607 22:13:52 [ERROR] Slave: Error in Update_rows event: row application failed,
Error_code: 0
070607 22:13:52 [ERROR] Slave: Error in Update_rows event: error during transaction
execution on table tpcb.branch, Error_code: 1105
070607 22:13:52 [Warning] Slave: Got error 4350 'Transaction already aborted' from NDB
Error_code: 1296
070607 22:13:52 [Warning] Slave: Unknown error Error_code: 1105
070607 22:13:52 [Warning] Slave: Unknown error Error_code: 1105

Also added to http://bugs.mysql.com/bug.php?id=27979

This is the patch from 5 May but updated to reflect the current rpl tree. As before, this is only preparation for the real bug fix which will appear here later.

<http://lists.mysql.com/commits/30211>

A new version of the patch fixing the bug can be found here <http://lists.mysql.com/commits/30319>. It addresses concerns raised during first review. Note that the patch should be applied over the preliminary patch <http://lists.mysql.com/commits/30211>.

A separate patch with tests should follow shortly.

Currently, writing tests which test replication ndb->xxx is not possible because of BUG#29569. Any test which would check that this patch works correctly must be postponed until that bug is fixed.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30441

ChangeSet@1.2530, 2007-07-06 16:58:18+02:00, rafal@quant.(none) +8 -0
  BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 
  using TPC-B):
     
  Problem: A RBR event can contain incomplete row data (only key value and
  fields which have been changed). In that case, when the row is unpacked
  into record and written to a table, the missing fields get incorrect NULL
  values leading to master-slave inconsistency.
     
  Solution: Use values found in slave's table for columns which are not given
  in the rows event. The code for writing a single row uses the following 
  algorithm: 
  
  1. unpack row_data into table->record[0],
  2. try to insert record,
  3. if duplicate record found, fetch it into table->record[1],
  4. unpack row_data into table->record[1],
  5. write table->record[1] into the table.
    
  Where row_data is the row as stored in the data area of a rows event. 
  Thus:
    
  a) unpacking of row_data happens at the time when row is written into 
     a table,
    
  b) when unpacking (in step 4), only columns present in row_data are 
     overwritten - all other columns remain as they were found in the table.
     
  Since all data needed for the above algorithm is stored inside 
  Rows_log_event class, functions which locate and write rows are turned 
  into methods of that class.
  
  replace_record()     -> Rows_log_event::write_row()
  find_and_fetch_row() -> Rows_log_event::find_and_fetch_row()
    
  Both methods take row data from event's data buffer - the row being 
  processed is pointed by m_curr_row. They unpack the data as needed into 
  table's record buffers record[0] or record[1]. When row is unpacked, 
  m_curr_row_end is set to point at next row in the data buffer.
  
  Other changes introduced in this changeset:
  
  - Change signature of unpack_row(): don't report errors and don't
    setup table's rw_set here. Errors can happen only when setting default 
    values in prepare_record() function and are detected there.
   
  - In Rows_log_event and derived classes, don't pass arguments to
    the execution primitives (do_...() member functions) but use class
    members instead.
  
  - The changes seem to fix rpl_ndb_extraCol test. Before it produced
    results different than the same test run on other storage engines.
    Now the results are identical. The result file is updated
    accordingly.
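
The five steps listed in this changeset can be followed in a small model. The Python sketch below is a simplification under assumptions, not the server's code: the table is a dict keyed by primary key, `record0` and `record1` stand in for table->record[0] and table->record[1], and the helper names and return values are hypothetical.

```python
# Toy model of the five-step write algorithm; NOT the server implementation.

class DuplicateKey(Exception):
    """Raised by insert() when the primary key already exists."""

def unpack_row(record, row_data):
    """Overwrite only the columns present in row_data."""
    record.update(row_data)
    return record

def insert(table, record, key):
    if record[key] in table:
        raise DuplicateKey(record[key])
    table[record[key]] = dict(record)

def write_row(table, row_data, defaults, key):
    record0 = unpack_row(dict(defaults), row_data)   # 1. unpack into record[0]
    try:
        insert(table, record0, key)                  # 2. try to insert
        return "inserted"
    except DuplicateKey:
        record1 = dict(table[record0[key]])          # 3. fetch duplicate into record[1]
    unpack_row(record1, row_data)                    # 4. unpack into record[1]
    table[record1[key]] = record1                    # 5. write record[1]
    return "replaced"

defaults = {"aid": 0, "bid": None, "balance": None, "filler": None}
table = {1: {"aid": 1, "bid": 7, "balance": 0.0, "filler": "x"}}

# Incomplete row image: only the key and the changed column are present.
first = write_row(table, {"aid": 1, "balance": 3.0}, defaults, "aid")
# A row with a new key is simply inserted, with defaults for missing columns.
second = write_row(table, {"aid": 2, "balance": 5.0}, defaults, "aid")
```

After the first call the existing row keeps its `bid` and `filler` values and only `balance` changes. Writing `record0` over the existing row instead would have replaced them with NULL, which is the inconsistency described under "Problem".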

A new version of the patch has been committed: <http://lists.mysql.com/commits/30441>. By popular demand, the two patches are now merged into one. A previous commit error that caused a lot of spurious changes to appear in the pre-patch is now fixed. Patch comments have been improved. Also, any changes or refactoring that were not essential have been removed.

Patches applied and initially reviewed. Waiting for additional assistance in reproducing the problem and verifying solution.

The latest patch has been reviewed and some suggestions have been mailed,
in particular the suggestion to extend the bitmap library with a function that the current patch would benefit from.

I will add 2 more patches for this bug. The first will fix a problem I found with initializing write_set inside Rows_log events. The second will add some test cases which verify that replication works correctly after the fix.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/32767

ChangeSet@1.2531, 2007-08-20 19:11:30+02:00, rafal@quant.(none) +1 -0
  BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 using TPC-B):
  
  This patch fixes the way write_set is initialized inside Rows_log_events 
  when there are extra columns on slave. Previously the extra columns were
  included in the write_set which is wrong. Now they are not included, as is
  the case in the original source tree.
  
  To correctly handle master/slave record width differences, the m_cols 
  bitmap sent in Rows_log_event should have correct width equal to the 
  number of columns on master. This was not the case because the width
  of the bitmap was rounded to the nearest multiple of 8. The patch fixes this
  by removing width rounding.
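
The effect of the width rounding can be shown with a little arithmetic. The Python sketch below is only an illustration: it assumes the rounding went up to the next multiple of 8, and the helper names are hypothetical. How the server actually consumes the bitmap width is not stated in this report.

```python
# Illustration of why a bitmap width rounded to a multiple of 8 loses the
# master's real column count. Assumes rounding up; names are hypothetical.

def rounded_width(n_columns):
    """Bitmap width in bits after rounding up to a whole number of bytes."""
    return ((n_columns + 7) // 8) * 8

def claimed_columns(bitmap_width, slave_columns):
    """Slave column indexes that fall inside the bitmap's width."""
    return list(range(min(bitmap_width, slave_columns)))

master_columns = 4   # e.g. aid, bid, balance, filler
slave_columns = 6    # the slave has two extra columns

# With rounding, the bitmap appears to cover the slave's extra columns too.
with_rounding = claimed_columns(rounded_width(master_columns), slave_columns)

# With the exact width, only the master's columns are covered.
without_rounding = claimed_columns(master_columns, slave_columns)
```

With these numbers the rounded bitmap is 8 bits wide, so nothing in its width distinguishes a 4-column master from an 8-column one, and the slave's two extra columns fall inside it.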

A patch with test cases is coming soon.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/32792

ChangeSet@1.2532, 2007-08-21 09:45:58+02:00, rafal@quant.(none) +6 -0
  BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 
  using TPC-B):
  
  This patch introduces test rpl_ndb_2other which tests basic replication 
  from master using ndb tables to slave storing the same tables using 
  (possibly) different engine (myisam,innodb).
  
  Test is based on existing tests rpl_ndb_2myisam and rpl_ndb_2innodb. 
  However, these tests don't work for various reasons and are currently 
  disabled (see BUG#19227).
  
  The new test differs from the ones it is based on as follows:
  
  1. Single test tests replication with different storage engines on slave 
  (myisam, innodb, ndb).
  
  2. Include file extra/rpl_tests/rpl_ndb_2multi_eng.test containing 
  original tests is replaced by extra/rpl_tests/rpl_ndb_2multi_basic.test 
  which doesn't contain tests using partitioned tables as these don't work 
  currently. Instead, it tests replication to a slave which has more 
  columns than master.
  
  3. Include file include/rpl_multi_engine3.inc is replaced with 
  include/rpl_multi_engine2.inc. The latter differs by performing slightly 
  different operations (updating more than one row in the table) and 
  clearing table with "TRUNCATE TABLE" statement instead of "DELETE FROM" 
  as replication of "DELETE" doesn't work well in this setting.
  
  4. Slave must use option --log-slave-updates=0 as otherwise execution of 
  replication events generated by ndb fails if table uses a different 
  storage engine on slave (see BUG#29569).

Results of rpl_ndb_2other test

Attachment: rpl_ndb_2other.log (application/octet-stream, text), 8.02 KiB.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33023

ChangeSet@1.2569, 2007-08-24 15:05:54+02:00, rafal@quant.(none) +7 -0
  BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 
  using TPC-B):
   
  Problem: A RBR event can contain incomplete row data (only key value and
  fields which have been changed). In that case, when the row is unpacked
  into record and written to a table, the missing fields get incorrect NULL
  values leading to master-slave inconsistency.
   
  Solution: Use values found in slave's table for columns which are not given
  in the rows event. The code for writing a single row uses the following 
  algorithm: 
  
  1. unpack row_data into table->record[0],
  2. try to insert record,
  3. if duplicate record found, fetch it into table->record[1],
  4. unpack row_data into table->record[1],
  5. write table->record[1] into the table.
  
  Where row_data is the row as stored in the data area of a rows event. 
  Thus:
  
  a) unpacking of row_data happens at the time when row is written into 
   a table,
  
  b) when unpacking (in step 4), only columns present in row_data are 
   overwritten - all other columns remain as they were found in the table.
   
  Since all data needed for the above algorithm is stored inside 
  Rows_log_event class, functions which locate and write rows are turned 
  into methods of that class.
  
  replace_record()     -> Rows_log_event::write_row()
  find_and_fetch_row() -> Rows_log_event::find_and_fetch_row()
  
  Both methods take row data from event's data buffer - the row being 
  processed is pointed by m_curr_row. They unpack the data as needed into 
  table's record buffers record[0] or record[1]. When row is unpacked, 
  m_curr_row_end is set to point at next row in the data buffer.
  
  Other changes introduced in this changeset:
  
  - Change signature of unpack_row(): don't report errors and don't
  setup table's rw_set here. Errors can happen only when setting default 
  values in prepare_record() function and are detected there.
   
  - In Rows_log_event and derived classes, don't pass arguments to
  the execution primitives (do_...() member functions) but use class
  members instead.
  
  - Move old row handling code into log_event_old.cc to be used by 
  *_rows_log_event_old classes.

The last patch contains the same changes as already introduced by previous patches (<http://lists.mysql.com/commits/30441> from 6 Jul and <http://lists.mysql.com/commits/32767> from 20 Aug) but this time applied against
a fresh 5.1-target-5.1.22 tree. A separate patch with changes requested by reviewers will follow, plus another patch with the test case.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33046

ChangeSet@1.2570, 2007-08-24 19:58:22+02:00, rafal@quant.(none) +6 -0
  BUG#21842: 
  
  This patch contains changes needed to support replication for a table
  which has extra columns on master as introduced by WL#3228 (before only
  extra slave-side columns were supported). It also contains some 
  improvements suggested by reviewers.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33098

ChangeSet@1.2571, 2007-08-25 13:16:43+02:00, rafal@quant.(none) +6 -0
  BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 
  using TPC-B):
    
  (This is an adaptation to 5.1-target-5.1.22 of a patch prepared earlier
   for the 5.1-new-rpl tree.)
  
  This patch introduces test rpl_ndb_2other which tests basic replication 
  from master using ndb tables to slave storing the same tables using 
  (possibly) different engine (myisam,innodb).
    
  Test is based on existing tests rpl_ndb_2myisam and rpl_ndb_2innodb. 
  However, these tests don't work for various reasons and are currently 
  disabled (see BUG#19227).
    
  The new test differs from the ones it is based on as follows:
    
  1. Single test tests replication with different storage engines on slave 
  (myisam, innodb, ndb).
    
  2. Include file extra/rpl_tests/rpl_ndb_2multi_eng.test containing 
  original tests is replaced by extra/rpl_tests/rpl_ndb_2multi_basic.test 
  which doesn't contain tests using partitioned tables as these don't work 
  currently. Instead, it tests replication to a slave which has more or
  fewer columns than the master.
    
  3. Include file include/rpl_multi_engine3.inc is replaced with 
  include/rpl_multi_engine2.inc. The latter differs by performing slightly
  different operations (updating more than one row in the table) and 
  clearing table with "TRUNCATE TABLE" statement instead of "DELETE FROM" 
  as replication of "DELETE" doesn't work well in this setting.
    
  4. Slave must use option --log-slave-updates=0 as otherwise execution of 
  replication events generated by ndb fails if table uses a different 
  storage engine on slave (see BUG#29569).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33118

ChangeSet@1.2569, 2007-08-26 14:31:10+02:00, rafal@quant.(none) +13 -0
  BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 
  using TPC-B):
   
  Problem: A RBR event can contain incomplete row data (only key value and
  fields which have been changed). In that case, when the row is unpacked
  into record and written to a table, the missing fields get incorrect NULL
  values leading to master-slave inconsistency.
   
  Solution: Use values found in slave's table for columns which are not given
  in the rows event. The code for writing a single row uses the following 
  algorithm: 
  
  1. unpack row_data into table->record[0],
  2. try to insert record,
  3. if duplicate record found, fetch it into table->record[0],
  4. unpack row_data into table->record[0],
  5. write table->record[0] into the table.
  
  Where row_data is the row as stored in the data area of a rows event. 
  Thus:
  
  a) unpacking of row_data happens at the time when row is written into 
   a table,
  
  b) when unpacking (in step 4), only columns present in row_data are 
   overwritten - all other columns remain as they were found in the table.
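
The five steps above can be sketched with a toy in-memory table. This is only an illustration of the control flow, not the server code: the names Table, unpack_row and write_row are hypothetical stand-ins, rows are modelled as dictionaries, and the sketch assumes row_data always carries the key column.

```python
class Table:
    """Hypothetical stand-in for a slave table with a single key column."""

    def __init__(self, key, defaults):
        self.key = key            # name of the primary key column
        self.defaults = defaults  # column name -> default value
        self.rows = {}            # key value -> stored row (dict)

    def insert(self, record):
        """Insert a row; return False on a duplicate key, like a failed write."""
        pk = record[self.key]
        if pk in self.rows:
            return False
        self.rows[pk] = dict(record)
        return True


def unpack_row(row_data, record):
    """Overwrite only the columns present in row_data; leave the others alone."""
    for column, value in row_data.items():
        record[column] = value


def write_row(table, row_data):
    record = dict(table.defaults)                  # record buffer with defaults
    unpack_row(row_data, record)                   # step 1
    if table.insert(record):                       # step 2
        return record
    record = dict(table.rows[record[table.key]])   # step 3: fetch the duplicate
    unpack_row(row_data, record)                   # step 4: overlay event columns
    table.rows[record[table.key]] = record         # step 5: write the merged row
    return record
```

In this sketch, writing {"id": 1, "balance": 25} over an existing row with id 1 changes only balance; every other column keeps the value already stored on the slave instead of being reset to NULL.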
   
  Since all data needed for the above algorithm is stored inside 
  Rows_log_event class, functions which locate and write rows are turned 
  into methods of that class.
  
  replace_record()     -> Rows_log_event::write_row()
  find_and_fetch_row() -> Rows_log_event::find_row()
  
  Both methods take row data from the event's data buffer - the row being
  processed is pointed to by m_curr_row. They unpack the data as needed into
  the table's record buffers record[0] or record[1]. When a row is unpacked,
  m_curr_row_end is set to point at the next row in the data buffer.
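
The cursor handling described here can be pictured as follows. This is a rough sketch under a loud assumption: each row in the buffer is given a made-up 2-byte little-endian length prefix, which is NOT the real rows event layout. Only the roles of the two cursors (named after m_curr_row and m_curr_row_end) are taken from the description above; iter_rows is a hypothetical helper.

```python
import struct


def iter_rows(buf):
    """Yield each row payload from a buffer of length-prefixed rows.

    The length-prefixed format is an assumption made for illustration only.
    """
    curr_row = 0                       # plays the role of m_curr_row
    while curr_row < len(buf):
        (length,) = struct.unpack_from("<H", buf, curr_row)
        curr_row_end = curr_row + 2 + length   # plays the role of m_curr_row_end
        yield buf[curr_row + 2:curr_row_end]   # "unpack" the current row
        curr_row = curr_row_end        # the next row starts where this one ended
```

The point of the sketch is that unpacking a row is what establishes where the next row begins, so the caller never needs to pass row boundaries as arguments.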
  
  Other changes introduced in this changeset:
  
  - Change the signature of unpack_row(): don't report errors and don't
  set up the table's rw_set here. Errors can happen only when setting default
  values in the prepare_record() function and are detected there.
   
  - In Rows_log_event and derived classes, don't pass arguments to
  the execution primitives (do_...() member functions) but use class
  members instead.
  
  - Move old row handling code into log_event_old.cc to be used by 
  *_rows_log_event_old classes.
  
  Also, a new test rpl_ndb_2other is added which tests basic replication 
  from master using ndb tables to slave storing the same tables using 
  (possibly) different engine (myisam,innodb).
    
  The test is based on the existing tests rpl_ndb_2myisam and rpl_ndb_2innodb.
  However, these tests don't work for various reasons and are currently
  disabled (see BUG#19227).
    
  The new test differs from the ones it is based on as follows:
    
  1. Single test tests replication with different storage engines on slave 
  (myisam, innodb, ndb).
    
  2. Include file extra/rpl_tests/rpl_ndb_2multi_eng.test containing 
  original tests is replaced by extra/rpl_tests/rpl_ndb_2multi_basic.test 
  which doesn't contain tests using partitioned tables as these don't work 
  currently. Instead, it tests replication to a slave which has more or
  fewer columns than the master.
    
  3. Include file include/rpl_multi_engine3.inc is replaced with 
  include/rpl_multi_engine2.inc. The latter differs by performing slightly
  different operations (updating more than one row in the table) and 
  clearing table with "TRUNCATE TABLE" statement instead of "DELETE FROM" 
  as replication of "DELETE" doesn't work well in this setting.
    
  4. Slave must use option --log-slave-updates=0 as otherwise execution of 
  replication events generated by ndb fails if table uses a different 
  storage engine on slave (see BUG#29569).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33166

ChangeSet@1.2570, 2007-08-27 20:22:04+02:00, rafal@quant.(none) +1 -0
  BUG#21842: There was an inconsistency in the use of table->record[0] and 
  table->record[1] buffers inside Rows_log_event::find_row() function. 
  The patch fixes this.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33195

ChangeSet@1.2571, 2007-08-28 09:20:51+02:00, rafal@quant.(none) +3 -0
  BUG#21842: Exclude Rows_log_event members used in event application if 
  not compiled as a replication server - a fix from rpl clone now applied
  to 5.1.22 tree.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33196

ChangeSet@1.2571, 2007-08-28 10:14:45+02:00, rafal@quant.(none) +2 -0
  BUG#21842: Exclude Rows_log_event members used in event application if 
  not compiled as a replication server - a fix from rpl clone now applied
  to 5.1.22 tree.

Pushed into 5.1-target-5.1.22 and 5.1-new-rpl trees.

The problems mentioned by Antony are related to big/little-endian issues in replication. These are reported in BUG#29549. The suspicious code from rpl_utility.cc comes from WL#3328 and is now reported as BUG#30790.

This patch was never concerned with endianness issues. It only solves the problem of setting default/existing values for columns which are not present in Write_rows events. Its correctness was confirmed by reviewers and by the rpl_ndb_2other test, which passes unless run on a big-endian machine, where the other problems manifest themselves.

Note that the endianness problems are detected only now because the replication code has only now become mature enough to attempt NDB -> non-NDB replication. Before, such a setup caused the slave to crash hopelessly, which was the original reason for reporting this bug.

Pushed into 5.1.23-beta

Pushed into 5.1.24-rc

Pushed into 6.0.5-alpha

Bugfix documented in the 5.1.23 and 6.0.5 changelogs.