Bug #40258 | Replication failure on Innodb + RBR + UPDATE + ORDER BY | ||
---|---|---|---|
Submitted: | 22 Oct 2008 15:39 | Modified: | 13 Mar 2009 11:05 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Row Based Replication ( RBR ) | Severity: | S2 (Serious) |
Version: | 5.1-rpl, 5.1-bzr | OS: | Any |
Assigned to: | Guangbao Ni | CPU Architecture: | Any |
[22 Oct 2008 15:39]
Philip Stoev
[22 Oct 2008 15:52]
Philip Stoev
YY file
Attachment: bug40258.yy (application/octet-stream, text), 405 bytes.
[22 Oct 2008 15:52]
Philip Stoev
ZZ file
Attachment: bug40258.zz (text/plain), 106 bytes.
[22 Oct 2008 15:54]
Philip Stoev
To reproduce with the random query generator, clone the mysql-test-extra-6.0 tree and execute: $ cd mysql-test-extra-6.0/mysql-test/gentest/ $ perl runall.pl \ --basedir=/path/to/5.1-rpl \ --grammar=conf/rpl_diff.yy \ --gendata=conf/rpl_diff.zz \ --rpl_mode=row \ --engine=Innodb \ --mysqld=--innodb-lock-wait-timeout=1 \ --queries=2000 This will start the concurrent workload and will exit with an error once the slave thread stops. This test uses char(1) columns, however attempts to insert and compare words from the english language, which are slightly longer.
[22 Oct 2008 16:26]
Valeriy Kravchuk
Thank you for a bug report. Verified with main 5.1 tree from bzr. Master and slave diverge: openxs@suse:/home2/openxs/bzr/mysql-test-extra-6.0/mysql-test/gentest> perl runall.pl --basedir=/home2/openxs/dbs/5.1 --rpl_mode=row --gendata=/home2/openxs/bug40258.zz --grammar=/home2/openxs/bug40258.yy --mysqld=--innodb_lock_wait_timeout=1 --engine=InnoDB --queries=2000 # 23:28:23 Please see http://inside.mysql.com/wiki/Category:RandomQueryGenerator for more information on this test framework. # 23:28:23 Starting # 23:28:23 # runall.pl \ # 23:28:23 # --basedir=/home2/openxs/dbs/5.1 \ # 23:28:23 # --rpl_mode=row \ # 23:28:23 # --gendata=/home2/openxs/bug40258.zz \ # 23:28:23 # --grammar=/home2/openxs/bug40258.yy \ # 23:28:23 # --mysqld=--innodb_lock_wait_timeout=1 \ # 23:28:23 # --engine=InnoDB \ # 23:28:23 # --queries=2000 # 23:28:23 Running perl mysql-test-run.pl --start-and-exit --mysqld=--innodb --mysqld="--innodb_lock_wait_timeout=1" --mysqld=--default-storage-engine=InnoDB --mysqld=--relay-log=master-relay-bin --mysqld=--loose-innodb --mysqld=--loose-falcon-debug-mask=2 --mysqld=--secure-file-priv= --mysqld=--max-allowed-packet=16Mb --master_port=19306 rpl_alter --slave_port=19308 . Logging: mysql-test-run.pl --start-and-exit --mysqld=--innodb --mysqld=--innodb_lock_wait_timeout=1 --mysqld=--default-storage-engine=InnoDB --mysqld=--relay-log=master-relay-bin --mysqld=--loose-innodb --mysqld=--loose-falcon-debug-mask=2 --mysqld=--secure-file-priv= --mysqld=--max-allowed-packet=16Mb --master_port=19306 rpl_alter --slave_port=19308 MySQL Version 5.1.30 Using dynamic switching of binlog format Using default engine 'InnoDB' Using ndbcluster when necessary, mysqld supports it Setting mysqld to support SSL connections Binaries are debug compiled Using MTR_BUILD_THREAD = 0 Using MASTER_MYPORT = 19306 Using MASTER_MYPORT1 = 19307 Using SLAVE_MYPORT = 19308 Using SLAVE_MYPORT1 = 19309 Using SLAVE_MYPORT2 = 19310 Using NDBCLUSTER_PORT = 9311 Using NDBCLUSTER_PORT_SLAVE = 9312 Using IM_PORT = 9313 Using IM_MYSQLD1_PORT = 9314 Using IM_MYSQLD2_PORT = 9315 - adding combinations Killing Possible Leftover Processes Removing Stale Files Creating Directories Installing Master Database Installing Slave1 Database ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- Servers started, exiting # 23:28:33 Establishing replication, mode row ... # 23:28:34 Starting # 23:28:34 gentest.pl \ # 23:28:34 --gendata=/home2/openxs/bug40258.zz \ # 23:28:34 --engine=InnoDB \ # 23:28:34 --rpl_mode=row \ # 23:28:34 --threads=10 \ # 23:28:34 --queries=2000 \ # 23:28:34 --duration=3600 \ # 23:28:34 --dsn1=dbi:mysql:host=127.0.0.1:port=19306:user=root:database=test \ # 23:28:34 --grammar=/home2/openxs/bug40258.yy # 23:28:34 Creating table table10_innodb_int_autoinc . # 23:28:34 Reporters: ErrorLog, Backtrace # 23:28:34 Validators: FalconErrors, ErrorMessageCorruption, ReplicationSlaveStatus # 23:28:34 Starting 10 processes, 2000 queries each, duration 3600 seconds. # 23:28:45 Started periodic reporting process... # 23:29:13 Child process completed successfully. # 23:29:43 Query: UPDATE `table10_innodb_int_autoinc` SET `char` = 'all' WHERE `char_key` < 'no' ORDER BY `char` failed: 1032 Can't find record in 'table10_innodb_int_autoinc' # 23:29:55 Query: DELETE FROM `table10_innodb_int_autoinc` WHERE `char` < 'something' LIMIT 1 failed: 1205 Lock wait timeout exceeded; try restarting transaction # 23:29:55 Query: DELETE FROM `table10_innodb_int_autoinc` WHERE `char` < "you're" LIMIT 1 failed: 1205 Lock wait timeout exceeded; try restarting transaction # 23:29:56 Child process completed successfully. # 23:30:02 Child process completed successfully. # 23:30:06 Child process completed successfully. # 23:30:07 Child process completed successfully. # 23:30:08 Child process completed successfully. # 23:30:09 Child process completed successfully. # 23:30:11 Child process completed successfully. # 23:30:11 Child process completed successfully. # 23:30:13 Child process completed successfully. # 23:30:18 Killing periodic reporting process with pid 4670... # 23:30:23 Test completed successfully. # 23:30:23 Waiting for slave to catch up..., file master-bin.000001, pos 882726 . # 23:30:23 Dumping server on port 19306... # 23:30:23 Dumping server on port 19308... # 23:30:23 Comparing SQL dumps... --- /tmp/server_4614_0.dump 2008-09-25 23:30:23.000000000 +0300 +++ /tmp/server_4614_1.dump 2008-09-25 23:30:23.000000000 +0300 @@ -691,6 +691,7 @@ INSERT INTO `table10_innodb_int_autoinc` VALUES ('m',996,'o'); INSERT INTO `table10_innodb_int_autoinc` VALUES ('m',998,'o'); INSERT INTO `table10_innodb_int_autoinc` VALUES (NULL,10,NULL); +INSERT INTO `table10_innodb_int_autoinc` VALUES (NULL,1623,'w'); INSERT INTO `table10_innodb_int_autoinc` VALUES (NULL,2567,'t'); INSERT INTO `table10_innodb_int_autoinc` VALUES (NULL,2568,'s'); INSERT INTO `table10_innodb_int_autoinc` VALUES (NULL,2571,'h');
[10 Mar 2009 2:20]
Guangbao Ni
I don't know why it alway firstly outputs "failed: 1054 Unknown column '%s' in 'where clause'.Further errors of this kind will be suppressed." error messages. I use the yy anz zz files attached in the bug. run it just as followings: [gni@work-server gentest]$ perl runall.pl --basedir=/home/gni/mysql/bzr/bugteam-5.1/ --grammar=conf/bug40258.yy --gendata=conf/bug40258.zz --rpl_mode=row --mysqld=--innodb-lock-wait-timeout=1 --queries=2000 # 23:12:45 Please see http://inside.mysql.com/wiki/Category:RandomQueryGenerator for more information on this test framework. # 23:12:45 Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. # 23:12:45 Starting # 23:12:45 # runall.pl \ # 23:12:45 # --basedir=/home/gni/mysql/bzr/bugteam-5.1/ \ # 23:12:45 # --grammar=conf/bug40258.yy \ # 23:12:45 # --gendata=conf/bug40258.zz \ # 23:12:45 # --rpl_mode=row \ # 23:12:45 # --mysqld=--innodb-lock-wait-timeout=1 \ # 23:12:45 # --queries=2000 # 23:12:46 Running perl mysql-test-run.pl --start-and-exit --mysqld="--innodb-lock-wait-timeout=1" --skip-ndb --mysqld=--relay-log=slave-relay-bin --mysqld=--loose-innodb --mysqld=--loose-falcon-debug-mask=2 --mysqld=--secure-file-priv= --mysqld=--max-allowed-packet=16Mb --mysqld=--loose-innodb-status-file=1 --master_port=19306 rpl_alter --slave_port=19308 2>&1 . ======================================================= WARNING: Using mysql-test-run.pl version 1! ======================================================= Logging: lib/v1/mysql-test-run.pl --start-and-exit --mysqld=--innodb-lock-wait-timeout=1 --skip-ndb --mysqld=--relay-log=slave-relay-bin --mysqld=--loose-innodb --mysqld=--loose-falcon-debug-mask=2 --mysqld=--secure-file-priv= --mysqld=--max-allowed-packet=16Mb --mysqld=--loose-innodb-status-file=1 --master_port=19306 rpl_alter --slave_port=19308 090309 23:12:48 [Warning] Forcing shutdown of 2 plugins MySQL Version 5.1.33 Using dynamic switching of binlog format Skipping ndbcluster Setting mysqld to support SSL connections Binaries are debug compiled Using MTR_BUILD_THREAD = 0 Using MASTER_MYPORT = 19306 Using MASTER_MYPORT1 = 19307 Using SLAVE_MYPORT = 19308 Using SLAVE_MYPORT1 = 19309 Using SLAVE_MYPORT2 = 19310 Using IM_PORT = 9313 Using IM_MYSQLD1_PORT = 9314 Using IM_MYSQLD2_PORT = 9315 - adding combinations Killing Possible Leftover Processes Removing Stale Files Creating Directories Installing Master Database Installing Slave1 Database ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- Servers started, exiting # 23:12:55 Establishing replication, mode row ... # 23:12:56 Starting # 23:12:56 gentest.pl \ # 23:12:56 --gendata=conf/bug40258.zz \ # 23:12:56 --rpl_mode=row \ # 23:12:56 --threads=10 \ # 23:12:56 --queries=2000 \ # 23:12:56 --duration=3600 \ # 23:12:56 --dsn1=dbi:mysql:host=127.0.0.1:port=19306:user=root:database=test \ # 23:12:56 --grammar=conf/bug40258.yy # 23:12:57 Creating table table10_int_autoinc . # 23:12:57 Reporters: ErrorLog, Backtrace # 23:12:57 Validators: ErrorMessageCorruption, ReplicationSlaveStatus # 23:12:57 Starting 10 processes, 2000 queries each, duration 3600 seconds. # 23:12:57 Query: UPDATE `table10_int_autoinc` SET `char` = 'been' WHERE `char` > e ORDER BY `char` failed: 1054 Unknown column '%s' in 'where clause'.Further errors of this kind will be suppressed. # 23:12:57 Query: UPDATE `table10_int_autoinc` SET `char` = c WHERE `char_key` = 'my' ORDER BY `char_key` failed: 1054 Unknown column '%s' in 'field list'. Further errors of this kind will be suppressed. # 23:12:57 Query: UPDATE `table10_int_autoinc` SET `char` = 'he' WHERE `char` < f ORDER BY `char_key` failed: 1054 Unknown column '%s' in 'where clause'. Further errors of this kind will be suppressed. # 23:12:57 Query: UPDATE `table10_int_autoinc` SET `char` = k WHERE `char` > 'been' ORDER BY `char` failed: 1054 Unknown column '%s' in 'field list'. Further errors of this kind will be suppressed. # 23:12:58 Started periodic reporting process... # 23:12:58 Query: DELETE FROM `table10_int_autoinc` WHERE `char_key` = o LIMIT 1 failed: 1054 Unknown column '%s' in 'where clause'. Further errors of this kind will be suppressed. # 23:12:58 Query: INSERT INTO `table10_int_autoinc` ( `char_key` ) VALUES (g ) failed: 1054 Unknown column '%s' in 'field list'. Further errors of this kind will be suppressed. # 23:12:58 Query: INSERT INTO `table10_int_autoinc` ( `char` ) VALUES ( d )failed: 1054 Unknown column '%s' in 'field list'. Further errors of this kind will be suppressed. # 23:12:58 Query: UPDATE `table10_int_autoinc` SET `char_key` = 'would' WHERE `char_key` = l ORDER BY `char` failed: 1054 Unknown column '%s' in 'whereclause'. Further errors of this kind will be suppressed. # 23:12:59 Query: UPDATE `table10_int_autoinc` SET `char` = j WHERE `char` = m ORDER BY `char` failed: 1054 Unknown column '%s' in 'where clause'. Further errors of this kind will be suppressed.
[10 Mar 2009 7:14]
Philip Stoev
Guangbao Ni, this test uses random queries. Therefore, messages such as "unknown column", "column specified twice", and transaction errors are to be expected due to the random nature of the test. Please ignore all those errors and focus on the replication-related errors, that is, on messages that the slave has stopped or that the master and the slave have diverged.
[10 Mar 2009 7:55]
Guangbao Ni
Hello Philip Stoev, Besides above error messages, many "1213 deadlock found when trying to get lock; try restarting transaction" error messages are outputed. But at last, the test case can be excuted succesfully from my test. So I want to know whether you still get the problem with newest source codes of 5.1/6.0-bugteam tree. Thanks, /Guangbao
[13 Mar 2009 11:05]
Philip Stoev
This bug is no longer repeatable on 5.1-bugteam and 6.0-main.