Bug #45694 | Deadlock in replicated statement is not retried | ||
---|---|---|---|
Submitted: | 24 Jun 2009 7:04 | Modified: | 16 Sep 2009 10:11 |
Reporter: | Jesper Wisborg Krogh | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.0.77, 5.0 5.1, azalea bzr | OS: | Linux |
Assigned to: | Alfranio Tavares Correia Junior | CPU Architecture: | Any |
Tags: | deadlock, innodb, replication |
[24 Jun 2009 7:04]
Jesper Wisborg Krogh
[24 Jun 2009 7:05]
Jesper Wisborg Krogh
Test case
Attachment: replication_deadlock.sql (text/x-sql), 1.19 KiB.
[24 Jun 2009 7:48]
Sveta Smirnova
Thank you for the report. What do you mean by node1 and node2? Are they master/slave connections or what?
[24 Jun 2009 9:24]
Jesper Wisborg Krogh
Sorry, yes node1 is the master and node2 is the slave.
[13 Jul 2009 4:57]
Susanne Ebrecht
Many thanks for writing a bug report. I am not able to repeat this with actual source tree (MySQL 5.0.85). I will look if it is repeatable with actual release.
[13 Jul 2009 7:11]
Susanne Ebrecht
I am not able to repeat this with MySQL 5.0.83. Please, try this actual version and give us feedback if all is fine with it. I used your test for reproducing.
[22 Jul 2009 0:27]
Jesper Wisborg Krogh
Hi, I just tried to install 5.0.83 on our test server and I can still replicate the problem. I used the "Linux (x86)" tar ball with the attached my.cnf. The following grants and replication setup was done: Node 1: > GRANT REPLICATION SLAVE ON *.* TO 'repl'@'node2' IDENTIFIED BY 'some_password'; > RESET MASTER; > CHANGE MASTER TO MASTER_HOST='node2', MASTER_PORT=3507, MASTER_USER='repl', MASTER_LOG_FILE='node2-bin.000001', MASTER_LOG_POS=98, MASTER_PASSWORD='some_password'; Node 2: > GRANT REPLICATION SLAVE ON *.* TO 'repl'@'node1' IDENTIFIED BY 'some_password'; > RESET MASTER; > CHANGE MASTER TO MASTER_HOST='node1', MASTER_PORT=3507, MASTER_USER='repl', MASTER_LOG_FILE='node1-bin.000001', MASTER_LOG_POS=98, MASTER_PASSWORD='some_password'; The steps were: 1) Run the SQL from the replication_deadlock.sql file to set up the schema and initial data. 2) Run the update on the master (node 1 in the replication_deadlock.sql file) 3) Wait (important!) for the statement in 2) to finish on the master before proceeding. 4) While the replicated statement is running on the slave, run the statements for node 2 from the replication_deadlock.sql file. This is the same procedure as original described, but I thought it would be worth restating them. I hope that helps.
[22 Jul 2009 0:28]
Jesper Wisborg Krogh
The my.cnf file used for the test setup
Attachment: my.cnf (application/octet-stream, text), 85 bytes.
[23 Jul 2009 1:34]
Jesper Wisborg Krogh
An idea - could this be related to bug #24989? Before 5.0.48 a deadlock inside a trigger would cause an error rather than a deadlock warning. Could it be that for that reason the deadlock is still considered a fatal error, which will prevent the retry?
[24 Jul 2009 6:14]
Sveta Smirnova
Thank you for the feedback. This is actually not a bug: if deadlock occurs transaction should be rolled back. Since version 5.1 you can use binary log format ROW to avoid such deadlocks. See also http://dev.mysql.com/doc/refman/5.1/en/replication-formats.html
[24 Jul 2009 7:02]
Jesper Wisborg Krogh
I don't agree that this is not a bug. According to the documentation (http://dev.mysql.com/doc/refman/5.0/en/replication-options-slave.html#sysvar_slave_transac...) when deadlock occurs in a replicated statement for an InnoDB table, then the transaction will be retried automatically when the slave_transaction_retries variable is greater than 0. If you try a simpler test case (see attached file), then the transaction will indeed retry as expected with just an warning in the error log: 090724 16:46:49 [ERROR] Slave: Error 'Deadlock found when trying to get lock; try restarting transaction' on query. Default database: 'test'. Query: 'UPDAT E deadlock SET value = value + 1 WHERE id = 2', Error_code: 1213 The error code is also the same as in the original test case - the difference being whether there is an automatic retry or not. Regarding row based replication in 5.1 then that is not an option here as in the production case it is essential that the trigger actually runs on the slave.
[24 Jul 2009 7:03]
Jesper Wisborg Krogh
Test case where the transaction is retried
Attachment: replication_deadlock_2.sql (text/x-sql), 877 bytes.
[24 Jul 2009 7:36]
Sveta Smirnova
Thank you for the feedback. Verified as described.
[24 Jul 2009 7:37]
Sveta Smirnova
test case for the testsuite
Attachment: rpl_bug45694.test (application/octet-stream, text), 1.23 KiB.
[24 Jul 2009 7:38]
Sveta Smirnova
option file
Attachment: rpl_bug45694-slave.opt (application/octet-stream, text), 40 bytes.
[7 Aug 2009 18:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/80391 3059 Alfranio Correia 2009-08-07 BUG#45694 Deadlock in replicated statement is not retried If the SQL Thread fails to execute an event due to a temporary error (e.g. ER_LOCK_DEADLOCK) and the option "--slave_transaction_retries" is set the SQL Thread should not be aborted and the transaction should be restarted from the beginning and re-executed. Unfortunately, a wrong interpretation of the THD::is_fatal_error was preventing this behavior. In a nutshell, "this variable is set to TRUE if an execution of a compound statement cannot continue. In particular, it is used to disable access to the CONTINUE or EXIT handlers of stored routines. So even temporary errors may have this variable set. To fix the bug, we have done what follows: DBUG_ENTER("has_temporary_error"); - if (thd->is_fatal_error) - DBUG_RETURN(0); - DBUG_EXECUTE_IF("all_errors_are_temporary_errors", if (thd->main_da.is_error()) {
[13 Aug 2009 18:05]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/80772 3059 Alfranio Correia 2009-08-13 BUG#45694 Deadlock in replicated statement is not retried If the SQL Thread fails to execute an event due to a temporary error (e.g. ER_LOCK_DEADLOCK) and the option "--slave_transaction_retries" is set the SQL Thread should not be aborted and the transaction should be restarted from the beginning and re-executed. Unfortunately, a wrong interpretation of the THD::is_fatal_error was preventing this behavior. In a nutshell, "this variable is set to TRUE if an execution of a compound statement cannot continue. In particular, it is used to disable access to the CONTINUE or EXIT handlers of stored routines. So even temporary errors may have this variable set. To fix the bug, we have done what follows: DBUG_ENTER("has_temporary_error"); - if (thd->is_fatal_error) - DBUG_RETURN(0); - DBUG_EXECUTE_IF("all_errors_are_temporary_errors", if (thd->main_da.is_error()) {
[19 Aug 2009 15:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/81098 3059 Alfranio Correia 2009-08-19 BUG#45694 Deadlock in replicated statement is not retried If the SQL Thread fails to execute an event due to a temporary error (e.g. ER_LOCK_DEADLOCK) and the option "--slave_transaction_retries" is set the SQL Thread should not be aborted and the transaction should be restarted from the beginning and re-executed. Unfortunately, a wrong interpretation of the THD::is_fatal_error was preventing this behavior. In a nutshell, "this variable is set to TRUE if an execution of a compound statement cannot continue. In particular, it is used to disable access to the CONTINUE or EXIT handlers of stored routines. So even temporary errors may have this variable set. To fix the bug, we have done what follows: DBUG_ENTER("has_temporary_error"); - if (thd->is_fatal_error) - DBUG_RETURN(0); - DBUG_EXECUTE_IF("all_errors_are_temporary_errors", if (thd->main_da.is_error()) {
[24 Aug 2009 10:49]
Alfranio Tavares Correia Junior
Pushed to 5.1-bugteam.
[2 Sep 2009 16:42]
Bugs System
Pushed into 5.1.39 (revid:joro@sun.com-20090902154533-8actmfcsjfqovgsb) (version source revid:alfranio.correia@sun.com-20090824103744-fk8627t4c63eqznz) (merge vers: 5.1.39) (pib:11)
[9 Sep 2009 8:56]
Jon Stephens
Documented bugfix in the 5.1.39 changelog as follows: When slave_transaction_retries is set, a statement that replicates, but is then rolled back due to a deadlock on the slave, should be retried. However, in certain cases, replication was stopped with error 1213 (Deadlock found when trying to get lock; try restarting transaction) instead, even when this variable was set. Set status to NDI, waiting for additional pushes.
[14 Sep 2009 16:03]
Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090914155317-m1g9wodmndzdj4l1) (version source revid:alik@sun.com-20090914155317-m1g9wodmndzdj4l1) (merge vers: 5.4.4-alpha) (pib:11)
[16 Sep 2009 10:11]
Jon Stephens
Also documented in the 5.4.4 changelog. Closed.
[1 Oct 2009 5:59]
Bugs System
Pushed into 5.1.39-ndb-6.3.28 (revid:jonas@mysql.com-20091001055605-ap2kiaarr7p40mmv) (version source revid:jonas@mysql.com-20091001055605-ap2kiaarr7p40mmv) (merge vers: 5.1.39-ndb-6.3.28) (pib:11)
[1 Oct 2009 7:25]
Bugs System
Pushed into 5.1.39-ndb-7.0.9 (revid:jonas@mysql.com-20091001072547-kv17uu06hfjhgjay) (version source revid:jonas@mysql.com-20091001071652-irejtnumzbpsbgk2) (merge vers: 5.1.39-ndb-7.0.9) (pib:11)
[1 Oct 2009 13:25]
Bugs System
Pushed into 5.1.39-ndb-7.1.0 (revid:jonas@mysql.com-20091001123013-g9ob2tsyctpw6zs0) (version source revid:jonas@mysql.com-20091001123013-g9ob2tsyctpw6zs0) (merge vers: 5.1.39-ndb-7.1.0) (pib:11)
[5 Oct 2009 10:50]
Bugs System
Pushed into 5.1.39-ndb-6.2.19 (revid:jonas@mysql.com-20091005103850-dwij2dojwpvf5hi6) (version source revid:jonas@mysql.com-20090930185117-bhud4ek1y0hsj1nv) (merge vers: 5.1.39-ndb-6.2.19) (pib:11)
[7 Oct 2009 1:47]
Paul DuBois
The 5.4 fix has been pushed into 5.4.3.
[14 May 2010 21:34]
Kevin Fries
Has a patch for this been ported into the 5.0 branch of mysql? I believe we're seeing this. Thanks
[25 Apr 2012 6:56]
Valeriy Kravchuk
Bug is NOT fixed in 5.0.x, even with 5.0.97. Sveta's original test case still ends up with error on slave: [openxs@chief mysql-test]$ ./mtr rpl_bug45694 Logging: ./mtr rpl_bug45694 MySQL Version 5.0.97 Skipping ndbcluster Setting mysqld to support SSL connections Binaries are debug compiled Using MTR_BUILD_THREAD = 0 Using MASTER_MYPORT = 9306 Using MASTER_MYPORT1 = 9307 Using SLAVE_MYPORT = 9308 Using SLAVE_MYPORT1 = 9309 Using SLAVE_MYPORT2 = 9310 Using IM_PORT = 9313 Using IM_MYSQLD1_PORT = 9314 Using IM_MYSQLD2_PORT = 9315 Killing Possible Leftover Processes Removing Stale Files Creating Directories Installing Master Database Installing Slave1 Database ======================================================= Starting Tests in the 'main' suite TEST RESULT TIME (ms) ------------------------------------------------------- rpl_bug45694 [ fail ] --- /home/openxs/dbs/5.0/mysql-test/r/rpl_bug45694.result 2012-04-25 09:54:11.110985973 +0300 +++ /home/openxs/dbs/5.0/mysql-test/r/rpl_bug45694.reject 2012-04-25 09:54:34.764717800 +0300 @@ -0,0 +1,91 @@ +stop slave; +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9; +reset master; +reset slave; +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9; +start slave; +DROP TABLE IF EXISTS deadlock1; +Warnings: +Note 1051 Unknown table 'deadlock1' +CREATE TABLE deadlock1 ( +id tinyint unsigned NOT NULL +) ENGINE=InnoDB; +DROP TABLE IF EXISTS deadlock2; +Warnings: +Note 1051 Unknown table 'deadlock2' +CREATE TABLE deadlock2 ( +id tinyint unsigned NOT NULL auto_increment, +somekey tinyint unsigned NOT NULL DEFAULT 0, +PRIMARY KEY (id), +INDEX (somekey) +) ENGINE=InnoDB; +CREATE TRIGGER aftupd_deadlock1 AFTER UPDATE ON deadlock1 FOR EACH ROW BEGIN +DECLARE v_key, v_sleep tinyint unsigned; +SELECT somekey INTO v_key +FROM deadlock2 +WHERE id = 1; +SELECT sleep(3) INTO v_sleep; +IF (v_key = 1) THEN +UPDATE deadlock2 SET somekey = somekey + 1 WHERE id = 1; +END IF; +END; +// +INSERT INTO deadlock1 VALUES +(0); +INSERT INTO deadlock2 VALUES +(1, 1); +UPDATE deadlock1 SET id = id + 1; +START TRANSACTION; +SELECT somekey +FROM deadlock2 +WHERE somekey = 1 +LOCK IN SHARE MODE; +somekey +1 +INSERT INTO deadlock1 VALUES (0); +INSERT INTO deadlock1 VALUES (0); +INSERT INTO deadlock1 VALUES (0); +SELECT sleep(3); +sleep(3) +0 +SELECT * +FROM deadlock2 +WHERE id = 1 +FOR UPDATE; +id somekey +1 1 +COMMIT; +show slave status; +Slave_IO_State Waiting for master to send event +Master_Host 127.0.0.1 +Master_User root +Master_Port 9306 +Connect_Retry 1 +Master_Log_File master-bin.000001 +Read_Master_Log_Pos 1597 +Relay_Log_File slave-relay-bin.000003 +Relay_Log_Pos 1545 +Relay_Master_Log_File master-bin.000001 +Slave_IO_Running Yes +Slave_SQL_Running No +Replicate_Do_DB +Replicate_Ignore_DB +Replicate_Do_Table +Replicate_Ignore_Table +Replicate_Wild_Do_Table +Replicate_Wild_Ignore_Table +Last_Errno 1213 +Last_Error Error 'Deadlock found when trying to get lock; try restarting transaction' on query. Default database: 'test'. Query: 'UPDATE deadlock1 SET id = id + 1' +Skip_Counter 0 +Exec_Master_Log_Pos 1407 +Relay_Log_Space 1735 +Until_Condition None +Until_Log_File +Until_Log_Pos 0 +Master_SSL_Allowed No +Master_SSL_CA_File +Master_SSL_CA_Path +Master_SSL_Cert +Master_SSL_Cipher +Master_SSL_Key +Seconds_Behind_Master NULL ...