Bug #21842 | Cluster fails to replicate to innodb or myisam with err 134 using TPC-B | ||
---|---|---|---|
Submitted: | 25 Aug 2006 18:21 | Modified: | 4 Sep 2007 12:55 |
Reporter: | Jonathan Miller | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Row Based Replication ( RBR ) | Severity: | S2 (Serious) |
Version: | 5.1.12, 5.1.15 | OS: | Linux (Linux 64bit ) |
Assigned to: | Rafal Somla | CPU Architecture: | Any |
[25 Aug 2006 18:21]
Jonathan Miller
[25 Aug 2006 18:23]
Jonathan Miller
- But the TPC-B test only does select, inserts and deletes. In addition the Last_error is in Write_rows
+ But the TPC-B test only does select, inserts and updates, no deletes. In addition the Last_error is in Write_rows
[1 Oct 2006 23:27]
Jason Downing
I think I have this problem too. Does anyone know of any way of arranging tables to avoid this problem? I am replicating from ndb tables to MyISAM tables, and getting this same error. Any other suggestions to work around the problem?
[2 Oct 2006 4:40]
Jason Downing
I haven't managed to find a work around, so I have set up a cluster on the slave. I would very much rather use MyISAM on the slave. Is anyone trying to fix this bug? I will set up test system to repeat this bug and give access to MySQL, but I cannot do it for about 3 weeks as we are waiting for some new computers to arrive. Any info/work arounds will be gratefully received. Thanks, Jason
[4 Oct 2006 7:01]
Mats Kindahl
On the surface, it looks like this is related to BUG#22583 and BUG#22550. The error code 134 is an internal error code produced by the storage engine: the code 1105 is a proper external code. What makes this strange is that you get the same error with InnoDB. It would be a great help if the result of SHOW CREATE TABLE was added (after doing the ALTER TABLE), just to see that the table definitions are proper.
[20 Oct 2006 13:53]
Jonathan Miller
Your wish is my command O guru :-)

Database changed
mysql> alter table account engine=myisam;
Query OK, 100000 rows affected (5.25 sec)
Records: 100000 Duplicates: 0 Warnings: 0

mysql> SHOW CREATE TABLE account;
+---------+-----------------------------------------------------------------------------------------------------
| Table | Create Table
+---------+-----------------------------------------------------------------------------------------------------
| account | CREATE TABLE `account` (
  `aid` int(11) NOT NULL DEFAULT '0',
  `bid` int(11) DEFAULT NULL,
  `balance` decimal(8,2) DEFAULT NULL,
  `filler` char(80) DEFAULT NULL,
  PRIMARY KEY (`aid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+---------+-----------------------------------------------------------------------------------------------------
1 row in set (0.00 sec)

mysql>
[20 Oct 2006 13:54]
Jonathan Miller
mysql> alter table branch engine=innodb;
Query OK, 10000 rows affected (1.97 sec)
Records: 10000 Duplicates: 0 Warnings: 0

mysql> SHOW CREATE TABLE branch;
+--------+----------------------------------------------------------------------------------------------
| Table | Create Table
+--------+----------------------------------------------------------------------------------------------
| branch | CREATE TABLE `branch` (
  `bid` int(11) NOT NULL DEFAULT '0',
  `balance` decimal(8,2) DEFAULT NULL,
  `filler` char(80) DEFAULT NULL,
  PRIMARY KEY (`bid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+--------+----------------------------------------------------------------------------------------------
1 row in set (0.00 sec)

mysql>
[20 Oct 2006 14:03]
Jonathan Miller
A little further testing today showed this:

On Master:
mysql> update account set balance = 3.00 where aid = 1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0

On slave (MyISAM):
mysql> select balance from account where aid = 1;
ERROR 1030 (HY000): Got error 134 from storage engine
mysql> select balance from account where aid = 2;
+---------+
| balance |
+---------+
| 0.00 |
+---------+
1 row in set (0.00 sec)

On master:
mysql> select balance from branch where bid = 1;
+---------+
| balance |
+---------+
| 3.00 |
+---------+
1 row in set (0.00 sec)

On Slave (InnoDB): not able to repeat the 134 by hand:
mysql> select balance from branch where bid = 1;
+---------+
| balance |
+---------+
| 3.00 |
+---------+
1 row in set (0.00 sec)

So now I try to set back the one in the MyISAM table.

On master:
mysql> update account set balance = 0 where aid = 1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0

On Slave: the slave bombs due to getting the 134 when trying to read the record:
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 134
Last_Error: Error in Write_rows event: error during transaction execution on table TPCB.account

Hope this helps
/jeb
[13 Nov 2006 13:20]
Jonathan Miller
How to repeat:
I have tried to recreate this using mysql-test, but the test passes and does not fail.

Create cluster replication as follows:

Master
host1: MySQLD <- Test against this one
host2: ndbd, MySQLD <- For master replication
host3: ndb_mgmd, ndbd
host4: MySQLD <- For slave replication
host5: ndbd
host6: ndb_mgmd, ndbd

1) Start replication

2) On host 1 load the database
host1$>perl load_tpcb.pl --sock

3) Once tables are created, loaded and replicated, log in to the slave mysqld and alter the tables in the TPCB database.
mysql> alter table account engine=myisam;
Query OK, 100000 rows affected (2.72 sec)
Records: 100000 Duplicates: 0 Warnings: 0
mysql> alter table branch engine=myisam;
Query OK, 10000 rows affected (1.36 sec)
Records: 10000 Duplicates: 0 Warnings: 0
mysql> alter table teller engine=myisam;
Query OK, 20000 rows affected (1.63 sec)
Records: 20000 Duplicates: 0 Warnings: 0
mysql> alter table history engine=myisam;
Query OK, 0 rows affected (1.27 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table trans engine=myisam;
Query OK, 0 rows affected (1.14 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table sync engine=myisam;
Query OK, 0 rows affected (1.37 sec)
Records: 0 Duplicates: 0 Warnings: 0

4) On host1 start the tpcb test
perl tpcb_driver.pl -ho host1 -u root --sock

5) Show slave status\G on the slave
Last_Errno: 134
Last_Error: Error in Write_rows event: error during transaction execution on table TPCB.trans
[25 Jan 2007 14:05]
Jonathan Miller
070125 14:53:03 mysqld started
070125 14:53:04 InnoDB: Started; log sequence number 0 46409
070125 14:53:04 [Note] Starting MySQL Cluster Binlog Thread
070125 14:53:04 [Note] /home/ndbdev/jmiller/builds/libexec/mysqld: ready for connections.
Version: '5.1.15-beta-log' socket: '/tmp/mysql.sock' port: 3306 Source distribution
070125 14:53:04 [Note] SCHEDULER: Loaded 0 events
070125 14:53:59 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './ndb12-relay-bin.000001' position: 4
070125 14:53:59 [Note] Slave I/O thread: connected to master 'rep@ndb09:3306',replication started in log 'FIRST' at position 4
070125 15:04:11 [ERROR] Slave: Error in Write_rows event: row application failed, Error_code: 134
070125 15:04:11 [ERROR] Slave: Error in Write_rows event: error during transaction execution on table TPCB.account, Error_code: 134
070125 15:04:11 [ERROR] Slave (additional info): Unknown error Error_code: 1105
070125 15:04:11 [Warning] Slave: Unknown error Error_code: 1105
070125 15:04:11 [Warning] Slave: Unknown error Error_code: 1105
070125 15:04:11 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'ndb09.000001' position 3724714
[13 Feb 2007 8:57]
Rafal Somla
Here is an error trace of the same failure, but with patched code which prints slightly more precise error messages:

------------------------------------------------------------------------
070126 18:20:23 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './ndb12-relay-bin.000001' position: 4
070126 18:20:23 [Note] Slave I/O thread: Connected to master 'rep@ndb09:3306', Reading log 'FIRST' from position 4
<...>
070126 18:49:24 [ERROR] Slave: replace_record: error in rnd_pos() method of MyISAM handler. Error_code: 134
070126 18:49:24 [ERROR] Slave SQL thread: Error detected when executing event at pos 3860317 in ./ndb12-relay-bin.000006.
070126 18:49:24 [Note] Slave SQL thread exiting, replication stopped in log 'ndb09.000003' at position 3849944
------------------------------------------------------------------------

Note: perror 134: "MySQL error code 134: Record was already deleted (or record file crashed)"

Note: The offending event is a Write_rows event in a long sequence of such events. The error happened inside the replace_record() function, which is called when a Write_rows event is executed. This function first tries to insert the record using ha_write_row. If that fails, get_dup_key() is called and then rnd_pos(), which tries to locate the conflicting row. This is where the error is detected.
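The call sequence described in the note above can be pictured with a toy model. This is only a sketch in Python, not the server's C++ code: `ToyTable`, `DuplicateKey` and the dictionary rows are hypothetical stand-ins, and only the names replace_record, write_row and rnd_pos are borrowed from the comment.

```python
class DuplicateKey(Exception):
    """Raised by the toy table when an insert collides on the primary key."""

    def __init__(self, position):
        super().__init__(f"duplicate key at position {position}")
        self.position = position


class ToyTable:
    """In-memory stand-in for a storage engine table (not the real handler API)."""

    def __init__(self):
        self.rows = []       # row storage; list index plays the role of a row position
        self.pk_index = {}   # primary key value -> row position

    def write_row(self, row):
        pk = row["aid"]
        if pk in self.pk_index:
            # Comparable to ha_write_row failing and get_dup_key() naming the conflict.
            raise DuplicateKey(self.pk_index[pk])
        self.pk_index[pk] = len(self.rows)
        self.rows.append(dict(row))

    def rnd_pos(self, position):
        row = self.rows[position]
        if row is None:
            # Loosely mirrors error 134: "Record was already deleted".
            raise LookupError(134)
        return row

    def update_row(self, position, new_row):
        self.rows[position] = dict(new_row)


def replace_record(table, row):
    """Try to insert; on a key conflict, locate the old row and overwrite it."""
    try:
        table.write_row(row)
        return "inserted"
    except DuplicateKey as dup:
        table.rnd_pos(dup.position)          # locate the conflicting row
        table.update_row(dup.position, row)  # replace it with the new image
        return "replaced"


table = ToyTable()
first = replace_record(table, {"aid": 1, "balance": 0})
second = replace_record(table, {"aid": 1, "balance": 3})
```

In this model the second call takes the duplicate-key path, which is the path on which the slave reported error 134.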
[13 Feb 2007 19:09]
Rafal Somla
I reproduced the problem in 5.1.15 tree.
[22 Mar 2007 13:12]
Rafal Somla
Waiting for fix to BUG#22583 to be pushed to see if it solves the problem.
[29 Mar 2007 16:20]
Jonathan Miller
mysql-test/include/rpl_multi_engine3.inc
Attachment: rpl_multi_engine3.inc (application/octet-stream, text), 2.36 KiB.
[29 Mar 2007 16:21]
Jonathan Miller
mysql-test/extra/rpl_tests/rpl_ndb_2multi_eng.test
Attachment: rpl_ndb_2multi_eng.test (application/octet-stream, text), 10.12 KiB.
[29 Mar 2007 16:23]
Jonathan Miller
NOTE:
rm ./r/rpl_ndb_2myisam.result
touch ./r/rpl_ndb_2myisam.result
replace current files with ones attached to bug report
./mysql-test-run.pl --force --do-test=rpl_ndb_2m --mysqld=--binlog-format=row --ndb-extra-test
[29 Mar 2007 18:08]
Rafal Somla
Slave's binlog from the test run on recent rpl tree
Attachment: slave_tpcb.sql (application/octet-stream, text), 123.00 KiB.
[17 Apr 2007 20:26]
Jonathan Miller
Currently in the telco tree, MyISAM still gets the same results:

070417 23:18:26 [Warning] Slave: Unknown error Error_code: 1105
070417 23:18:26 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 8383
070417 23:18:58 [Note] Slave: received end packet from server, apparent master shutdown:
070417 23:18:58 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'master-bin.000001' position 8876
070417 23:18:58 [ERROR] Slave I/O thread: error reconnecting to master 'root@127.0.0.1:9306': Error: 'Lost connection to MySQL server at 'reading initial communication packet', system error: 111' errno: 2013 retry-time: 1 retries: 10
070417 23:18:58 [Note] /data1/mysql-5.1-telco/sql/mysqld: Normal shutdown
070417 23:18:58 [Note] Event Scheduler: Purging the queue. 0 events
070417 23:18:58 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
070417 23:18:58 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 8876
070417 23:18:58 [Note] Stopping Cluster Binlog
070417 23:18:58 [Note] Stopping Cluster Utility thread
070417 23:19:00 [Note] /data1/mysql-5.1-telco/sql/mysqld: Shutdown complete

So I am guessing that the patch for the bit field has not been applied to this tree as of yet.

But InnoDB has much different results:

*** 372,378 ****
--- Check Update on slave ---
SELECT id,hex(b1),vc,bc,d,f,total,y,t FROM t1 WHERE id = 412;
id hex(b1) vc bc d f total y t
! 412 0 Testing MySQL databases is a cool Must make it bug free for the customer 654321.4321 15.21 0 1965 2006-02-22
--- Remove a record from t1 on master ---
DELETE FROM t1 WHERE id = 42;
--- Show current count on master for t1 ---
--- 372,378 ----
--- Check Update on slave ---
SELECT id,hex(b1),vc,bc,d,f,total,y,t FROM t1 WHERE id = 412;
id hex(b1) vc bc d f total y t
! 412 0 NULL NULL NULL NULL 0 NULL 2006-02-22
--- Remove a record from t1 on master ---
DELETE FROM t1 WHERE id = 42;
--- Show current count on master for t1 ---
***************
*** 382,388 ****
--- Show current count on slave for t1 ---
SELECT COUNT(*) FROM t1;
COUNT(*)
! 4
DELETE FROM t1;
--- End test 5 key partition testing ---
--- Do Cleanup ---
--- 382,388 ----
--- Show current count on slave for t1 ---
SELECT COUNT(*) FROM t1;
COUNT(*)
! 5
DELETE FROM t1;
--- End test 5 key partition testing ---
--- Do Cleanup ---

It seems that some of the data is getting messed up on the update, and the delete does not go through at all.
[18 Apr 2007 19:06]
Jonathan Miller
Replace existing in /extra/rpl_tests/
Attachment: rpl_ndb_2multi_eng.test (application/octet-stream, text), 10.52 KiB.
[5 May 2007 11:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/26170

ChangeSet@1.2575, 2007-05-05 13:35:44+02:00, rafal@quant.(none) +10 -0
BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 using TPC-B):

This is a preliminary patch in preparation for the bug fix. The main change is to make the unpack_row() function non-destructive. That is, if a column is not present in the row, it will be left as it is in the record to which we unpack (table->record[0]). If a caller of unpack_row() wants the missing columns to be initialized with default values, it must do it itself. Function prepare_record() is added for that purpose.

Other changes in this changeset:
- Change signature of unpack_row(): don't report errors and don't setup table's rw_set here.
- In Rows_log_event and derived classes, don't pass arguments to the execution primitives (do_...() member functions) but use class members instead.
- Factor out the code used for opening tables in a Rows event into a separate method open_and_lock_tables().
- Change the way errors are reported when filling fields with default values. Now the user can see the correct error number in SHOW SLAVE STATUS output.
- The changes seem to fix the rpl_ndb_extraCol test. Before, it produced results different from the same test run on other storage engines. Now the results are identical. The result file is updated accordingly.
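To illustrate what "non-destructive" means for unpack_row(), here is a minimal sketch in Python. It is a model of the idea only: records are plain dictionaries, and the function bodies are assumptions rather than the server's implementation. The column names and defaults follow the `account` table shown earlier in this report.

```python
# Column defaults as shown by SHOW CREATE TABLE account earlier in this report.
ACCOUNT_DEFAULTS = {"aid": 0, "bid": None, "balance": None, "filler": None}


def prepare_record(defaults):
    """Return a fresh record with every column set to its default value."""
    return dict(defaults)


def unpack_row(record, row_data):
    """Overwrite only the columns present in row_data; leave the rest untouched."""
    for column, value in row_data.items():
        record[column] = value
    return record


# A partial row image: only the key and the changed column are present.
partial_row = {"aid": 1, "balance": 3.00}

# Case 1: the caller wants defaults for missing columns, so it prepares first.
fresh = unpack_row(prepare_record(ACCOUNT_DEFAULTS), partial_row)

# Case 2: unpacking over a record fetched from the table keeps its other columns.
existing = {"aid": 1, "bid": 7, "balance": 0.00, "filler": "abc"}
merged = unpack_row(existing, partial_row)
```

The point of the split is that the decision about missing columns moves to the caller: defaults when a new row is built, existing values when an old row is being overwritten.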
[21 May 2007 19:24]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27099 ChangeSet@1.2577, 2007-05-21 21:23:19+02:00, rafal@quant.(none) +9 -0 BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 using TPC-B): This patch implements solution b described in the bug report with some modifications. Main modifications are: - make replace_record() function a method of Rows_log_event as an instance of this class contains most of the data needed by the function. - make similar modifications to find_and_fetch_row() function. Also in this case row data is unpacked inside the function. - make modified versions of rpl_ndb_2xxx tests so that they work in the current tree and test that modified code correctly handles ndb->other replication.
[8 Jun 2007 3:37]
Jonathan Miller
Hi, I was watching push build today (5.1-telco) because I was pushing test changes, and I noticed that rpl_ndb_mix_innodb.test has been failing for a while now. The test fails with "could not sync with master ('select master_pos_wait('master-bin.000001', 228163)' returned NULL)".

Looking at it, the slave did not sync because it failed with:

070607 22:13:52 [ERROR] Slave: Error in Update_rows event: row application failed, Error_code: 0
070607 22:13:52 [ERROR] Slave: Error in Update_rows event: error during transaction execution on table tpcb.branch, Error_code: 1105
070607 22:13:52 [Warning] Slave: Got error 4350 'Transaction already aborted' from NDB Error_code: 1296
070607 22:13:52 [Warning] Slave: Unknown error Error_code: 1105
070607 22:13:52 [Warning] Slave: Unknown error Error_code: 1105

Also added to http://bugs.mysql.com/bug.php?id=27979
[4 Jul 2007 15:27]
Rafal Somla
This is the patch from 5 May but updated to reflect the current rpl tree. As before, this is only preparation for the real bug fix which will appear here later. <http://lists.mysql.com/commits/30211>
[4 Jul 2007 18:28]
Rafal Somla
A new version of the patch fixing the bug can be found here <http://lists.mysql.com/commits/30319>. It addresses concerns raised during first review. Note that the patch should be applied over the preliminary patch <http://lists.mysql.com/commits/30211>. A separate patch with tests should follow shortly.
[5 Jul 2007 10:12]
Rafal Somla
Currently, writing tests which test replication ndb->xxx is not possible because of BUG#29569. Any test which would check that this patch works correctly must be postponed until that bug is fixed.
[6 Jul 2007 14:59]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/30441

ChangeSet@1.2530, 2007-07-06 16:58:18+02:00, rafal@quant.(none) +8 -0
BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 using TPC-B):

Problem: A RBR event can contain incomplete row data (only key value and fields which have been changed). In that case, when the row is unpacked into record and written to a table, the missing fields get incorrect NULL values leading to master-slave inconsistency.

Solution: Use values found in slave's table for columns which are not given in the rows event. The code for writing a single row uses the following algorithm:
1. unpack row_data into table->record[0],
2. try to insert record,
3. if duplicate record found, fetch it into table->record[1],
4. unpack row_data into table->record[1],
5. write table->record[1] into the table.

Where row_data is the row as stored in the data area of a rows event. Thus:
a) unpacking of row_data happens at the time when row is written into a table,
b) when unpacking (in step 4), only columns present in row_data are overwritten - all other columns remain as they were found in the table.

Since all data needed for the above algorithm is stored inside Rows_log_event class, functions which locate and write rows are turned into methods of that class.
replace_record() -> Rows_log_event::write_row()
find_and_fetch_row() -> Rows_log_event::find_and_fetch_row()

Both methods take row data from event's data buffer - the row being processed is pointed by m_curr_row. They unpack the data as needed into table's record buffers record[0] or record[1]. When row is unpacked, m_curr_row_end is set to point at next row in the data buffer.

Other changes introduced in this changeset:
- Change signature of unpack_row(): don't report errors and don't setup table's rw_set here. Errors can happen only when setting default values in prepare_record() function and are detected there.
- In Rows_log_event and derived classes, don't pass arguments to the execution primitives (do_...() member functions) but use class members instead.
- The changes seem to fix rpl_ndb_extraCol test. Before it produced results different than the same test run on other storage engines. Now the results are identical. The result file is updated accordingly.
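The five-step algorithm from this commit message can be modelled in a few lines. The following Python sketch is an illustration under stated assumptions, not the patch itself: the table is a dictionary keyed on `aid`, `record0` and `record1` stand in for table->record[0] and table->record[1], and `naive_write_row` is a hypothetical rendering of the old behaviour described under "Problem".

```python
DEFAULTS = {"aid": 0, "bid": None, "balance": None, "filler": None}


def naive_write_row(table, row_data, key="aid"):
    """Old behaviour (as described under 'Problem'): missing columns become NULL."""
    record = dict(DEFAULTS)
    record.update(row_data)
    table[record[key]] = record
    return record


def write_row(table, row_data, key="aid"):
    """Model of the five steps; unlisted columns keep the slave's existing values."""
    record0 = dict(DEFAULTS)
    record0.update(row_data)          # 1. unpack row_data into record[0]
    pk = record0[key]
    if pk not in table:               # 2. try to insert the record
        table[pk] = record0
        return record0
    record1 = dict(table[pk])         # 3. duplicate found: fetch it into record[1]
    record1.update(row_data)          # 4. unpack row_data into record[1]
    table[pk] = record1               # 5. write record[1] into the table
    return record1


row_data = {"aid": 1, "balance": 3.00}   # partial row: key plus changed column

old_table = {1: {"aid": 1, "bid": 7, "balance": 0.00, "filler": "abc"}}
new_table = {1: {"aid": 1, "bid": 7, "balance": 0.00, "filler": "abc"}}

lost = naive_write_row(old_table, row_data)
kept = write_row(new_table, row_data)
```

With the naive version `bid` and `filler` are reset to NULL on the slave, while the five-step version keeps them, which matches point b) above.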
[6 Jul 2007 15:04]
Rafal Somla
A new version of the patch has been committed: <http://lists.mysql.com/commits/30441>. Due to popular demand, the two patches are now merged into one. A previous commit error, which caused a lot of spurious changes to be present in the pre-patch, is now fixed. Patch comments have been improved. Also, any changes or refactoring which are not essential have been removed.
[6 Jul 2007 21:11]
Chuck Bell
Patches applied and initially reviewed. Waiting for additional assistance in reproducing the problem and verifying solution.
[10 Jul 2007 17:45]
Andrei Elkin
The latest patch has been reviewed and some suggestions have been mailed. In particular, there is a suggestion to extend the bitmap library with a function the current patch would benefit from.
[20 Aug 2007 17:10]
Rafal Somla
I will add 2 more patches for this bug. The first will fix a problem I found with initializing write_set inside Rows_log events. The second will add some test cases which verify that replication works correctly after the fix.
[20 Aug 2007 17:12]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/32767

ChangeSet@1.2531, 2007-08-20 19:11:30+02:00, rafal@quant.(none) +1 -0
BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 using TPC-B):

This patch fixes the way write_set is initialized inside Rows_log_events when there are extra columns on slave. Previously the extra columns were included in the write_set, which is wrong. Now they are not included, as is the case in the original source tree.

To correctly handle master/slave record width differences, the m_cols bitmap sent in Rows_log_event should have the correct width, equal to the number of columns on master. This was not the case because the width of the bitmap was rounded to the nearest multiple of 8. The patch fixes this by removing width rounding.
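A small arithmetic sketch of why the rounded width matters, written in Python. The helper names are hypothetical, the rounding is assumed to go upward to the next multiple of 8, and the column counts are made up for illustration; the actual m_cols handling in the server is not reproduced here.

```python
def rounded_width(master_columns):
    """Bitmap width when rounded up to a multiple of 8 (assumed old behaviour)."""
    return (master_columns + 7) // 8 * 8


def extra_slave_columns(slave_columns, master_bitmap_width):
    """Number of slave columns that lie beyond the width reported by the master."""
    return max(slave_columns - master_bitmap_width, 0)


master_columns = 5   # hypothetical table width on the master
slave_columns = 7    # the slave has two extra columns

# With rounding, the bitmap claims 8 columns, so no slave column looks "extra".
with_rounding = extra_slave_columns(slave_columns, rounded_width(master_columns))

# With the exact width, the two extra slave columns are identified correctly.
without_rounding = extra_slave_columns(slave_columns, master_columns)
```

Under these assumptions, only the exact width lets the slave tell which of its columns have no counterpart on the master and must stay out of the write_set.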
[20 Aug 2007 17:39]
Rafal Somla
A patch with test cases is coming soon.
[21 Aug 2007 7:46]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/32792

ChangeSet@1.2532, 2007-08-21 09:45:58+02:00, rafal@quant.(none) +6 -0
BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 using TPC-B):

This patch introduces test rpl_ndb_2other, which tests basic replication from a master using ndb tables to a slave storing the same tables using a (possibly) different engine (myisam, innodb). The test is based on existing tests rpl_ndb_2myisam and rpl_ndb_2innodb. However, these tests don't work for various reasons and are currently disabled (see BUG#19227).

The new test differs from the ones it is based on as follows:
1. Single test tests replication with different storage engines on slave (myisam, innodb, ndb).
2. Include file extra/rpl_tests/rpl_ndb_2multi_eng.test containing original tests is replaced by extra/rpl_tests/rpl_ndb_2multi_basic.test, which doesn't contain tests using partitioned tables as these don't work currently. Instead, it tests replication to a slave which has more columns than master.
3. Include file include/rpl_multi_engine3.inc is replaced with include/rpl_multi_engine2.inc. The latter differs by performing slightly different operations (updating more than one row in the table) and clearing the table with a "TRUNCATE TABLE" statement instead of "DELETE FROM", as replication of "DELETE" doesn't work well in this setting.
4. Slave must use option --log-slave-updates=0, as otherwise execution of replication events generated by ndb fails if the table uses a different storage engine on slave (see BUG#29569).
[21 Aug 2007 7:59]
Rafal Somla
Results of rpl_ndb_2other test
Attachment: rpl_ndb_2other.log (application/octet-stream, text), 8.02 KiB.
[24 Aug 2007 13:06]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/33023

ChangeSet@1.2569, 2007-08-24 15:05:54+02:00, rafal@quant.(none) +7 -0
BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 using TPC-B):

Problem: A RBR event can contain incomplete row data (only key value and fields which have been changed). In that case, when the row is unpacked into record and written to a table, the missing fields get incorrect NULL values leading to master-slave inconsistency.

Solution: Use values found in slave's table for columns which are not given in the rows event. The code for writing a single row uses the following algorithm:
1. unpack row_data into table->record[0],
2. try to insert record,
3. if duplicate record found, fetch it into table->record[1],
4. unpack row_data into table->record[1],
5. write table->record[1] into the table.

Where row_data is the row as stored in the data area of a rows event. Thus:
a) unpacking of row_data happens at the time when row is written into a table,
b) when unpacking (in step 4), only columns present in row_data are overwritten - all other columns remain as they were found in the table.

Since all data needed for the above algorithm is stored inside Rows_log_event class, functions which locate and write rows are turned into methods of that class.
replace_record() -> Rows_log_event::write_row()
find_and_fetch_row() -> Rows_log_event::find_and_fetch_row()

Both methods take row data from event's data buffer - the row being processed is pointed by m_curr_row. They unpack the data as needed into table's record buffers record[0] or record[1]. When row is unpacked, m_curr_row_end is set to point at next row in the data buffer.

Other changes introduced in this changeset:
- Change signature of unpack_row(): don't report errors and don't setup table's rw_set here. Errors can happen only when setting default values in prepare_record() function and are detected there.
- In Rows_log_event and derived classes, don't pass arguments to the execution primitives (do_...() member functions) but use class members instead.
- Move old row handling code into log_event_old.cc to be used by *_rows_log_event_old classes.
[24 Aug 2007 13:13]
Rafal Somla
The last patch contains the same changes as already introduced by previous patches (<http://lists.mysql.com/commits/30441> from 6 Jul and <http://lists.mysql.com/commits/32767> from 20 Aug), but this time applied against a fresh 5.1-target-5.1.22 tree. A separate patch with changes requested by reviewers will follow, plus another patch with the test case.
[24 Aug 2007 17:59]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/33046 ChangeSet@1.2570, 2007-08-24 19:58:22+02:00, rafal@quant.(none) +6 -0 BUG#21842: This patch contains changes needed to support replication for a table which has extra columns on master as introduced by WL#3228 (before only extra slave-side columns were supported). It also contains some improvements suggested by reviewers.
[25 Aug 2007 11:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/33098

ChangeSet@1.2571, 2007-08-25 13:16:43+02:00, rafal@quant.(none) +6 -0
BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 using TPC-B):

(This is an adaptation to 5.1-target-5.1.22 of a patch prepared earlier for the 5.1-new-rpl tree.)

This patch introduces test rpl_ndb_2other, which tests basic replication from a master using ndb tables to a slave storing the same tables using a (possibly) different engine (myisam, innodb). The test is based on existing tests rpl_ndb_2myisam and rpl_ndb_2innodb. However, these tests don't work for various reasons and are currently disabled (see BUG#19227).

The new test differs from the ones it is based on as follows:
1. Single test tests replication with different storage engines on slave (myisam, innodb, ndb).
2. Include file extra/rpl_tests/rpl_ndb_2multi_eng.test containing original tests is replaced by extra/rpl_tests/rpl_ndb_2multi_basic.test, which doesn't contain tests using partitioned tables as these don't work currently. Instead, it tests replication to a slave which has more or fewer columns than master.
3. Include file include/rpl_multi_engine3.inc is replaced with include/rpl_multi_engine2.inc. The latter differs by performing slightly different operations (updating more than one row in the table) and clearing the table with a "TRUNCATE TABLE" statement instead of "DELETE FROM", as replication of "DELETE" doesn't work well in this setting.
4. Slave must use option --log-slave-updates=0, as otherwise execution of replication events generated by ndb fails if the table uses a different storage engine on slave (see BUG#29569).
[26 Aug 2007 12:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/33118

ChangeSet@1.2569, 2007-08-26 14:31:10+02:00, rafal@quant.(none) +13 -0
BUG#21842 (Cluster fails to replicate to innodb or myisam with err 134 using TPC-B):

Problem: A RBR event can contain incomplete row data (only key value and fields which have been changed). In that case, when the row is unpacked into record and written to a table, the missing fields get incorrect NULL values leading to master-slave inconsistency.

Solution: Use values found in slave's table for columns which are not given in the rows event. The code for writing a single row uses the following algorithm:
1. unpack row_data into table->record[0],
2. try to insert record,
3. if duplicate record found, fetch it into table->record[0],
4. unpack row_data into table->record[0],
5. write table->record[0] into the table.

Where row_data is the row as stored in the data area of a rows event. Thus:
a) unpacking of row_data happens at the time when row is written into a table,
b) when unpacking (in step 4), only columns present in row_data are overwritten - all other columns remain as they were found in the table.

Since all data needed for the above algorithm is stored inside Rows_log_event class, functions which locate and write rows are turned into methods of that class.
replace_record() -> Rows_log_event::write_row()
find_and_fetch_row() -> Rows_log_event::find_row()

Both methods take row data from event's data buffer - the row being processed is pointed by m_curr_row. They unpack the data as needed into table's record buffers record[0] or record[1]. When row is unpacked, m_curr_row_end is set to point at next row in the data buffer.

Other changes introduced in this changeset:
- Change signature of unpack_row(): don't report errors and don't setup table's rw_set here. Errors can happen only when setting default values in prepare_record() function and are detected there.
- In Rows_log_event and derived classes, don't pass arguments to the execution primitives (do_...() member functions) but use class members instead.
- Move old row handling code into log_event_old.cc to be used by *_rows_log_event_old classes.

Also, a new test rpl_ndb_2other is added, which tests basic replication from a master using ndb tables to a slave storing the same tables using a (possibly) different engine (myisam, innodb). The test is based on existing tests rpl_ndb_2myisam and rpl_ndb_2innodb. However, these tests don't work for various reasons and are currently disabled (see BUG#19227).

The new test differs from the ones it is based on as follows:
1. Single test tests replication with different storage engines on slave (myisam, innodb, ndb).
2. Include file extra/rpl_tests/rpl_ndb_2multi_eng.test containing original tests is replaced by extra/rpl_tests/rpl_ndb_2multi_basic.test, which doesn't contain tests using partitioned tables as these don't work currently. Instead, it tests replication to a slave which has more or fewer columns than master.
3. Include file include/rpl_multi_engine3.inc is replaced with include/rpl_multi_engine2.inc. The latter differs by performing slightly different operations (updating more than one row in the table) and clearing the table with a "TRUNCATE TABLE" statement instead of "DELETE FROM", as replication of "DELETE" doesn't work well in this setting.
4. Slave must use option --log-slave-updates=0, as otherwise execution of replication events generated by ndb fails if the table uses a different storage engine on slave (see BUG#29569).
[27 Aug 2007 18:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/33166 ChangeSet@1.2570, 2007-08-27 20:22:04+02:00, rafal@quant.(none) +1 -0 BUG#21842: There was an inconsistency in the use of table->record[0] and table->record[1] buffers inside Rows_log_event::find_row() function. The patch fixes this.
[28 Aug 2007 7:21]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/33195 ChangeSet@1.2571, 2007-08-28 09:20:51+02:00, rafal@quant.(none) +3 -0 BUG#21842: Exclude Rows_log_event members used in event application if not compiled as a replication server - a fix from rpl clone now applied to 5.1.22 tree.
[28 Aug 2007 8:16]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/33196 ChangeSet@1.2571, 2007-08-28 10:14:45+02:00, rafal@quant.(none) +2 -0 BUG#21842: Exclude Rows_log_event members used in event application if not compiled as a replication server - a fix from rpl clone now applied to 5.1.22 tree.
[29 Aug 2007 7:11]
Rafal Somla
Pushed into 5.1-target-5.1.22 and 5.1-new-rpl trees.
[4 Sep 2007 12:55]
Rafal Somla
The problems mentioned by Antony are related to big/little-endian issues in replication. These are reported in BUG#29549. The suspicious code from rpl_utility.cc comes from WL#3328 and is now reported as BUG#30790. This patch was never concerned with endianness issues. It only solves the problem of setting default/existing values for columns which are not present in Write_rows events. Its correctness was confirmed by reviewers and by the rpl_ndb_2other test, which passes unless run on a big-endian machine, where the other problems manifest themselves. Note that the endianness problems are detected only now because the replication code has only now become mature enough to attempt NDB -> non-NDB replication. Before, such a setup caused the slave to crash hopelessly, which was the original reason for reporting this bug.
[4 Sep 2007 17:12]
Bugs System
Pushed into 5.1.23-beta
[5 Feb 2008 13:04]
Bugs System
Pushed into 5.1.24-rc
[5 Feb 2008 13:08]
Bugs System
Pushed into 6.0.5-alpha
[6 Mar 2008 5:25]
Jon Stephens
Bugfix documented in the 5.1.23 and 6.0.5 changelogs.