Bug #29549 Endians: rpl_ndb_myisam2ndb,rpl_ndb_innodb2ndb and rpl_ndb_mix_innodb failed on
Submitted: 4 Jul 2007 12:44 Modified: 28 Nov 2007 10:01
Reporter: justin he
Status: Closed
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S1 (Critical)
Version:5.1.21-beta OS:Solaris
Assigned to: Mats Kindahl CPU Architecture:Any
Tags: mysql-5.1-new-rpl

[4 Jul 2007 12:44] justin he
Description:
Seen on an up-to-date mysql-5.1-new-rpl tree, on a Solaris SPARC machine.

How to repeat:
1) Run the test case rpl_ndb_myisam2ndb on Solaris.
2) The SQL thread crashes because an assertion fails in sql/log_event.cc, Rows_log_event::do_apply_event(RELAY_LOG_INFO const *rli),
in the while loop:
DBUG_ASSERT(row_end <= (const char*)m_rows_end) //failed
[7 Jul 2007 10:55] Sveta Smirnova
Thank you for the report.

Verified as described.
[10 Jul 2007 7:21] Guangbao Ni
Bug#28123 is a duplicate of this.
[10 Jul 2007 8:39] Andrei Elkin
The problem stems from the fact that field::unpack deals with both "local" (read from the table) and "remote" (read from a replication event) instances. In the latter case endianness matters, and combining storage engines that make different assumptions about endianness leads to the bug.
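
One way such a byte-order mismatch could trip the assertion from the description is sketched below. This is only an illustration under stated assumptions: the row layout (a 2-byte length prefix followed by the payload) and the helper `read_length_prefixed` are hypothetical and are not the actual binlog row format or the MySQL unpack code.

```python
import struct

def read_length_prefixed(buf, offset, byte_order):
    """Read a 2-byte length prefix, then step over that many payload bytes.

    Returns (payload, new_offset). Hypothetical layout, for illustration only.
    """
    (length,) = struct.unpack_from(byte_order + "H", buf, offset)
    start = offset + 2
    return buf[start:start + length], start + length

# A writer on a big-endian host stores the length 3 as the bytes 00 03.
row = struct.pack(">H", 3) + b"abc"
rows_end = len(row)

_, end_ok = read_length_prefixed(row, 0, ">")   # reader agrees with the writer
_, end_bad = read_length_prefixed(row, 0, "<")  # reader assumes little-endian

print(end_ok <= rows_end)    # True
print(end_bad <= rows_end)   # False: the length was read as 0x0300 = 768
```

In this sketch the second check plays the role of `row_end <= m_rows_end`: a length read in the wrong byte order moves the end-of-row position past the end of the buffer.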
[10 Jul 2007 11:41] Lars Thalmann
Perhaps related to WL#3228.
[24 Aug 2007 8:42] Tomas Ulin
proposed patch

Attachment: tmp2.patch (text/x-patch), 822 bytes.

[27 Aug 2007 0:39] Guangbao Ni
After patching, rpl_ndb_mix_innodb still fails:
./mysql-test-run.pl --do-test=rpl_ndb_mix_innodb
=====================================

Attachment: rpl_ndb_mix_innodb.tar.gz (application/x-gzip, text), 23.68 KiB.

[27 Aug 2007 0:39] Guangbao Ni
mysql-test-run: WARNING: Forcing kill of process 11629
rpl_ndb.rpl_ndb_mix_innodb     [ fail ]

Errors are (from /export/home/ngb/target-5.1.22/mysql-test/var/log/mysqltest-time) :
mysqltest: In included file "./extra/rpl_tests/rpl_ndb_apply_status.test": At line 282: command "diff_files" failed with error 1
(the last lines may be the most important ones)

Aborting: rpl_ndb.rpl_ndb_mix_innodb failed in default mode. To continue, re-run with '--force'.
Stopping All Servers
mysql-test-run: WARNING: Forcing kill of process 11657
[27 Aug 2007 0:45] Guangbao Ni
after patching, rpl_ndb_innodb2ndb fails, log attached

Attachment: rpl_ndb_innodb2ndb.tar.gz (application/x-gzip, text), 2.68 KiB.

[27 Aug 2007 13:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33151

ChangeSet@1.2577, 2007-08-27 15:40:49+02:00, tomas@whalegate.ndb.mysql.com +2 -0
  Bug#29549 rpl_ndb_myisam2ndb,rpl_ndb_innodb2ndb failed on Solaris for pack_length issue
[28 Aug 2007 1:15] Guangbao Ni
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

mysql-test-run: WARNING: Forcing kill of process 22526
mysql-test-run: WARNING: Forcing kill of process 22534
rpl_ndb.rpl_ndb_mix_innodb     [ fail ]

Errors are (from /export/home/ngb/target_solaris/mysql-test/var/log/mysqltest-time) :
mysqltest: In included file "./extra/rpl_tests/rpl_ndb_apply_status.test": At line 282: command "diff_files" failed with error 1
(the last lines may be the most important ones)

Aborting: rpl_ndb.rpl_ndb_mix_innodb failed in default mode. To continue, re-run with '--force'.
Stopping All Servers
mysql-test-run: WARNING: Forcing kill of process 22550

Judging from the files in diff_files, it seems that the datetime and some other types don't match. So I think it is related to Bug#30024, Bug#30133 and Bug#30134.
[28 Aug 2007 1:20] Guangbao Ni
the two different result files

Attachment: diff_files.tar.gz (application/x-gzip, text), 6.27 KiB.

[28 Aug 2007 5:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33192

ChangeSet@1.2582, 2007-08-28 07:42:43+02:00, tomas@whalegate.ndb.mysql.com +2 -0
  Bug#29549 rpl_ndb_myisam2ndb,rpl_ndb_innodb2ndb failed on Solaris for pack_length issue
[28 Aug 2007 10:08] Tomas Ulin
It seems endianness is an even bigger problem than first discovered...
[29 Aug 2007 3:14] Guangbao Ni
The following is the smallest test case for rpl_ndb_mix_innodb.test
connection master;
--disable_warnings
DROP DATABASE IF EXISTS tpcb;
--enable_warnings
CREATE DATABASE tpcb;

CREATE TABLE tpcb.history (id MEDIUMINT NOT NULL AUTO_INCREMENT,aid INT,
                           tid INT, bid INT,  amount DECIMAL(10,2),
                           tdate DATETIME, teller CHAR(20), uuidf LONGBLOB,
                           filler CHAR(80),PRIMARY KEY (id));

# Switch tables on slave to use NDB
--sync_slave_with_master
USE tpcb;
ALTER TABLE history ENGINE NDB;

# Load DB tpcb and run some transactions
connection master;
USE tpcb;

INSERT INTO tpcb.history VALUES(NULL,77,6,10,'1.00', NOW(), USER(),
                             UUID(),'completed trans');
--sync_slave_with_master

--echo *** DUMP MASTER & SLAVE FOR COMPARE ********

--exec $MYSQL_DUMP -n -t --compact --order-by-primary --skip-extended-insert tpcb history > $MYSQLTEST_VARDIR/tmp/master_apply_status.sql

--exec $MYSQL_DUMP_SLAVE -n -t --compact --order-by-primary --skip-extended-insert tpcb history > $MYSQLTEST_VARDIR/tmp/slave_apply_status.sql

--echo ****** Do dumps compare ************

diff_files $MYSQLTEST_VARDIR/tmp/master_apply_status.sql $MYSQLTEST_VARDIR/tmp/slave_apply_status.sql;

## Note: The files should only get removed if the above diff succeeds.

--exec rm $MYSQLTEST_VARDIR/tmp/master_apply_status.sql
--exec rm $MYSQLTEST_VARDIR/tmp/slave_apply_status.sql

# End of 5.1 Test

The conclusions are as follows:
1. If the master and the slave use the same engine, there is no problem.
2. If NOW(), USER() and UUID() are replaced by concrete values, there is no problem.

So the NOW(), USER() and UUID() functions affect the result between master (engine=innodb) and slave (engine=ndb).
[29 Aug 2007 7:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33272

ChangeSet@1.2589, 2007-08-29 09:44:37+02:00, tomas@whalegate.ndb.mysql.com +2 -0
  Bug#29549 rpl_ndb_myisam2ndb,rpl_ndb_innodb2ndb failed on Solaris for pack_length issue
  - reverting patch as there were unknown side effects that we do not have time to follow up on just now
[29 Aug 2007 11:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33294

ChangeSet@1.2556, 2007-08-29 13:24:24+02:00, mats@kindahl-laptop.dnsalias.net +1 -0
  BUG#29549 (rpl_ndb_circular.test and rpl_ndb_log.test fail):
  
  The fields slave_running, io_thd, and sql_thread are guarded by an
  associated run_lock. A read of these fields was not guarded inside
  terminate_slave_threads(), which caused an assertion to fire and
  can potentially cause a segmentation fault in some rare cases.
  
  This patch moves the reading of the above variables to occur inside
  terminate_slave_thread() guarded by the associated run_lock.
[4 Sep 2007 11:27] Lars Thalmann
See also BUG#24231.
[4 Sep 2007 12:29] Rafal Somla
Note that the last patch belongs to BUG#29968 and landed here by mistake.
[4 Sep 2007 12:45] Rafal Somla
As noted by Tomas [29 Aug 9:50] this problem is not specific to blobs. Here
is a simple test which shows problems in replicating other types of fields:
-------------------------------------------------------------------------
--source include/have_ndb.inc
--source include/have_innodb.inc
--source include/have_binlog_format_mixed_or_row.inc
--source include/master-slave.inc

connection master;
SET storage_engine=ndb;

connection slave;
SET storage_engine=myisam;

connection master;
CREATE TABLE t1 (a INT, b FLOAT);
SHOW CREATE TABLE t1;

sync_slave_with_master;
SHOW CREATE TABLE t1;

connection master;
insert into t1 values (1,3.333);
select * from t1;

sync_slave_with_master;
select * from t1;
---------------------------------------------------------------------------

If this test is run on a big-endian machine (Solaris SPARC), wrong values are
inserted into t1 on the slave because the MyISAM engine assumes values are
written little-endian but receives them saved big-endian.

Here is a simpler version of the test which doesn't require a replication
setup:
---------------------------------------------------------------------------
CREATE TABLE t1 (a INT, b FLOAT);
SHOW CREATE TABLE t1;

#794 11:12:25 server id 1  end_log_pos 394 
# Position  Timestamp   Type   Master ID        Size      Master Pos    Flags 
#      15f 79 21 dd 46   13   01 00 00 00   2b 00 00 00   8a 01 00 00   00 00
#      172 16 00 00 00 00 00 00 00  04 74 65 73 74 00 02 74 |.........test..t|
#      182 31 00 02 03 04 01 04 03  04 74 65 73 74 00 02 74 |1.......|
#       Table_map: `test`.`t1` mapped to number 22
#794 11:12:25 server id 1  end_log_pos 453 
# Position  Timestamp   Type   Master ID        Size      Master Pos    Flags 
#      18a 79 21 dd 46   17   01 00 00 00   3b 00 00 00   c5 01 00 00   00 00
#      19d 10 00 00 00 00 00 00 00  05 1f e0 00 00 00 01 00 |................|
#      1ad 00 00 00 00 00 00 28 00  00 00 00 00 00 00 00 00 |................|
#      1bd 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |........|
#       Write_rows: table id 16
#794 11:12:25 server id 1  end_log_pos 491 
# Position  Timestamp   Type   Master ID        Size      Master Pos    Flags 
#      1c5 79 21 dd 46   17   01 00 00 00   26 00 00 00   eb 01 00 00   10 00
#      1d8 16 00 00 00 00 00 01 00  02 03 fc 00 00 00 01 40 |................|
#      1e8 55 4f df  |UO.|
#       Write_rows: table id 22 flags: STMT_END_F

BINLOG '
eSHdRhMBAAAAKwAAAIoBAAAAABYAAAAAAAAABHRlc3QAAnQxAAIDBAEEAw==
eSHdRhcBAAAAOwAAAMUBAAAAABAAAAAAAAAABR/gAAAAAQAAAAAAAAAoAAAAAAAAAAAAAAAAAAAA
AAA=
eSHdRhcBAAAAJgAAAOsBAAAQABYAAAAAAAEAAgP8AAAAAUBVT98=
';

select * from t1;
---------------------------------------------------------------------------

The binlog entry is taken from the first test, which was run on Solaris. One
can see that the values in the last Write_rows event are written big-endian
(e.g. the bytes "00 00 00 01" starting at pos 1e3, encoding the integer value 1).
This test can be run on any machine, as MyISAM uses the same byte order on
every platform and will therefore always misinterpret these binlog entries.
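
The misinterpretation can be checked by hand. The following is a minimal sketch in Python (not part of the test suite); the byte strings are copied from the last Write_rows event in the dump above, and the variable names are chosen just for this example.

```python
import struct

# Payload bytes of the last Write_rows event, copied from the dump above:
# the INT column starting at position 1e3 and the FLOAT column after it.
int_bytes = bytes.fromhex("00000001")
float_bytes = bytes.fromhex("40554fdf")

# Read as big-endian, the way the values were written on the Solaris host.
(int_be,) = struct.unpack(">i", int_bytes)
(float_be,) = struct.unpack(">f", float_bytes)

# Read as little-endian, the way a reader expecting low-byte-first would.
(int_le,) = struct.unpack("<i", int_bytes)
(float_le,) = struct.unpack("<f", float_bytes)

print(int_be, round(float_be, 3))   # 1 3.333
print(int_le, float_le)             # the values a little-endian reader sees
```

Read big-endian, the bytes give back the inserted row (1, 3.333). Read little-endian, the integer becomes 16777216 and the float becomes a very large negative number, which matches the kind of wrong values seen on the slave.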

For the same reasons, the new test rpl_ndb_2other (introduced in 5.1.22) fails with wrong values if run on a big-endian architecture.
[4 Sep 2007 17:12] Bugs System
Pushed into 5.1.23-beta
[6 Sep 2007 20:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33863

ChangeSet@1.2571, 2007-09-06 22:05:03+02:00, mats@kindahl-laptop.dnsalias.net +4 -0
  BUG#29549 (Endians: test failures on Solaris):
  
  Refactoring code to add parameter to pack() and unpack() functions with
  purpose of indicating if data should be packed in little-endian or
  native order. Using new functions to always pack data for binary log
  in little-endian order.
  
  Eliminating several versions of virtual pack() and unpack() functions
  in favor for one single virtual function which is overridden in subclasses.
[7 Sep 2007 9:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33894

ChangeSet@1.2571, 2007-09-07 11:54:08+02:00, mats@kindahl-laptop.dnsalias.net +12 -0
  BUG#29549 (Endians: test failures on Solaris):
  
  Refactoring code to add parameter to pack() and unpack() functions with
  purpose of indicating if data should be packed in little-endian or
  native order. Using new functions to always pack data for binary log
  in little-endian order.
  
  Eliminating several versions of virtual pack() and unpack() functions
  in favor for one single virtual function which is overridden in subclasses.
[14 Sep 2007 16:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34287

ChangeSet@1.2571, 2007-09-14 18:51:59+02:00, mats@kindahl-laptop.dnsalias.net +20 -0
  BUG#29549 (Endians: test failures on Solaris):
  
  Refactoring code to add parameter to pack() and unpack() functions with
  purpose of indicating if data should be packed in little-endian or
  native order. Using new functions to always pack data for binary log
  in little-endian order. The purpose of this refactoring is to allow
  proper implementation of endian-agnostic pack() and unpack() functions.
  
  Eliminating several versions of virtual pack() and unpack() functions
  in favor for one single virtual function which is overridden in
  subclasses.
  
  Implementing pack() and unpack() functions for some field types that
  packed data in native format regardless of the value of the
  st_table_share::db_low_byte_first flag.
[2 Oct 2007 15:37] Chuck Bell
Patch approved.
[4 Oct 2007 19:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34928

ChangeSet@1.2571, 2007-10-04 21:41:18+02:00, mats@kindahl-laptop.dnsalias.net +20 -0
  BUG#29549 (Endians: test failures on Solaris):
  
  Refactoring code to add parameter to pack() and unpack() functions with
  purpose of indicating if data should be packed in little-endian or
  native order. Using new functions to always pack data for binary log
  in little-endian order. The purpose of this refactoring is to allow
  proper implementation of endian-agnostic pack() and unpack() functions.
  
  Eliminating several versions of virtual pack() and unpack() functions
  in favor for one single virtual function which is overridden in
  subclasses.
  
  Implementing pack() and unpack() functions for some field types that
  packed data in native format regardless of the value of the
  st_table_share::db_low_byte_first flag.
  
  The field types that were packed in native format regardless are:
  Field_real, Field_decimal, Field_tiny, Field_short, Field_medium,
  Field_long, and Field_longlong.
  
  Before the patch, row-based logging wrote the rows incorrectly on
  big-endian machines where the storage engine defined its own
  low_byte_first() to be FALSE on big-endian machines (the default
  is TRUE), while little-endian machines wrote the fields in correct
  order. The only known storage engine that does this is NDB. In effect,
  this means that row-based replication from or to a big-endian
  machine where the table was using NDB as storage engine failed if the
  other engine was either non-NDB or on a little-endian machine.
  
  With this patch, row-based logging is now always done in little-endian
  order, while ORDER BY uses the native order if the storage engine
  defines low_byte_first() to return FALSE for big-endian machines.
[5 Oct 2007 16:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34993

ChangeSet@1.2571, 2007-10-05 18:16:11+02:00, mats@kindahl-laptop.dnsalias.net +20 -0
  BUG#29549 (Endians: test failures on Solaris):
  
  Refactoring code to add parameter to pack() and unpack() functions with
  purpose of indicating if data should be packed in little-endian or
  native order. Using new functions to always pack data for binary log
  in little-endian order. The purpose of this refactoring is to allow
  proper implementation of endian-agnostic pack() and unpack() functions.
  
  Eliminating several versions of virtual pack() and unpack() functions
  in favor for one single virtual function which is overridden in
  subclasses.
  
  Implementing pack() and unpack() functions for some field types that
  packed data in native format regardless of the value of the
  st_table_share::db_low_byte_first flag.
  
  The field types that were packed in native format regardless are:
  Field_real, Field_decimal, Field_tiny, Field_short, Field_medium,
  Field_long, and Field_longlong.
  
  Before the patch, row-based logging wrote the rows incorrectly on
  big-endian machines where the storage engine defined its own
  low_byte_first() to be FALSE on big-endian machines (the default
  is TRUE), while little-endian machines wrote the fields in correct
  order. The only known storage engine that does this is NDB. In effect,
  this means that row-based replication from or to a big-endian
  machine where the table was using NDB as storage engine failed if the
  other engine was either non-NDB or on a little-endian machine.
  
  With this patch, row-based logging is now always done in little-endian
  order, while ORDER BY uses the native order if the storage engine
  defines low_byte_first() to return FALSE for big-endian machines.
[9 Oct 2007 14:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/35199

ChangeSet@1.2572, 2007-10-09 16:08:00+02:00, mats@kindahl-laptop.dnsalias.net +3 -0
  BUG#29549 (Endians: rpl_ndb_myisam2ndb,rpl_ndb_innodb2ndb and rpl_ndb_mix_innodb
  failed on):
  
  Adding Field::max_data_length() to give the maximum number of bytes that
  Field::pack() will write.
[11 Oct 2007 16:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/35386

ChangeSet@1.2571, 2007-10-11 18:18:05+02:00, mats@kindahl-laptop.dnsalias.net +20 -0
  BUG#29549 (Endians: test failures on Solaris):
  
  Refactoring code to add parameter to pack() and unpack() functions with
  purpose of indicating if data should be packed in little-endian or
  native order. Using new functions to always pack data for binary log
  in little-endian order. The purpose of this refactoring is to allow
  proper implementation of endian-agnostic pack() and unpack() functions.
  
  Eliminating several versions of virtual pack() and unpack() functions
  in favor for one single virtual function which is overridden in
  subclasses.
  
  Implementing pack() and unpack() functions for some field types that
  packed data in native format regardless of the value of the
  st_table_share::db_low_byte_first flag.
  
  The field types that were packed in native format regardless are:
  Field_real, Field_decimal, Field_tiny, Field_short, Field_medium,
  Field_long, Field_longlong, and Field_blob.
  
  Before the patch, row-based logging wrote the rows incorrectly on
  big-endian machines where the storage engine defined its own
  low_byte_first() to be FALSE on big-endian machines (the default
  is TRUE), while little-endian machines wrote the fields in correct
  order. The only known storage engine that does this is NDB. In effect,
  this means that row-based replication from or to a big-endian
  machine where the table was using NDB as storage engine failed if the
  other engine was either non-NDB or on a little-endian machine.
  
  With this patch, row-based logging is now always done in little-endian
  order, while ORDER BY uses the native order if the storage engine
  defines low_byte_first() to return FALSE for big-endian machines.
  
  In addition, the max_data_length() function available in Field_blob
  was generalized to the entire Field hierarchy to give the maximum
  number of bytes that Field::pack() will write.
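
The effect of the extra parameter described above can be sketched as follows. This is a minimal illustration in Python, not the MySQL C++ code: `pack_int32`, `unpack_int32`, and the `low_byte_first` argument are hypothetical names chosen for this example.

```python
import struct
import sys

def pack_int32(value, low_byte_first):
    """Pack a 32-bit integer. Hypothetical helper, not the MySQL Field API."""
    fmt = "<i" if low_byte_first else "=i"   # "=" means the host's native order
    return struct.pack(fmt, value)

def unpack_int32(data, low_byte_first):
    """Inverse of pack_int32; the flag must match the one used for packing."""
    fmt = "<i" if low_byte_first else "=i"
    return struct.unpack(fmt, data)[0]

# Binary log path: always little-endian, so the bytes do not depend on the host.
logged = pack_int32(1, low_byte_first=True)
print(logged.hex())                  # 01000000 on every host
print(unpack_int32(logged, True))    # 1

# Native path: the bytes depend on the host that packed them.
native = pack_int32(1, low_byte_first=False)
print(sys.byteorder, native.hex())
```

Because the logged bytes are the same on every host, a reader on any architecture and with any storage engine can decode them with the same rule.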
[12 Oct 2007 16:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/35486

ChangeSet@1.2573, 2007-10-12 18:22:31+02:00, mats@kindahl-laptop.dnsalias.net +1 -0
  BUG#29549 (Endians: rpl_ndb_myisam2ndb,rpl_ndb_innodb2ndb and rpl_ndb_mix_innodb failed on):
  Post-merge fixes. Setting write bit before calling Field::store() since the function asserts that
  the write bit has been set.
[27 Nov 2007 10:51] Bugs System
Pushed into 5.1.23-rc
[27 Nov 2007 10:54] Bugs System
Pushed into 6.0.4-alpha
[28 Nov 2007 10:01] Jon Stephens
Documented bugfix in 5.1.23 and 6.0.4 changelogs.

Mats, thank you for good comments/summary of issue.