Bug #20697 | slave fails to rollback replicated transaction hang over innodb_lock_wait_timeou | ||
---|---|---|---|
Submitted: | 26 Jun 2006 13:27 | Modified: | 21 Nov 2006 19:56 |
Reporter: | Sebastian Nohn | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.0.18-standard,5.0.22 | OS: | Linux (Linux) |
Assigned to: | Andrei Elkin | CPU Architecture: | Any |
[26 Jun 2006 13:27]
Sebastian Nohn
[27 Jun 2006 7:06]
Sebastian Nohn
No workaround possible.
[3 Jul 2006 12:41]
Andrei Elkin
Sebastian, thanks for your report. I read through logs and diagnosis could be violation of transaction atomicity unless some other, maybe slave side, connection concurrently inserted a record with id_advertisement==056553 and date_created==2006-06-26 15:10:04 so that we see the second slave's complaint 060626 15:10:07 [ERROR] Slave: Error 'Duplicate entry '2056553' Can you make sure that this does not happen? While you don't exclude this possibility it makes me me wondering if reported reported claim is correct: The insert has been commited to the slave: root@localhost [dms_2_0]> select date_created from advertisement where id_advertisement = 2056553; +---------------------+ | date_created | +---------------------+ | 2006-06-26 15:10:04 |
[3 Jul 2006 12:52]
Sebastian Nohn
I am sure that noone and nothing else performed an INSERT with the same primary key. The bug was reproduced on a separate machine not included into any development- or production-enviroment. My claim about the insert is exactly the problem. There is NO other entity doing that INSERT, that INSERT happens via replication as you can see from the relay.log. It happens in a transaction that is not properly rolled back and repeated in a LOCK (and maybe other) situation.
[19 Jul 2006 15:49]
MySQL Verification Team
testcase for 4.1.15->5.0.18 for csc issue. Will test 5.0-bk next.
Attachment: testcase.sql (text/x-delimtext), 1.15 KiB.
[20 Jul 2006 8:08]
MySQL Verification Team
Andrei, See attached .sql file which has repeatable testcase. Verified against 5.0.22 master --> 5.0.22 slave.
[17 Aug 2006 13:58]
Andrei Elkin
Shane, thanks for the script which I run to confess that obviously slave indeed does not rollback replicated transaction.
[22 Sep 2006 8:54]
Sebastian Nohn
Any news on this issue?
[5 Oct 2006 16:59]
Andrei Elkin
ETA upon the end of analysis by tomorrow.
[6 Oct 2006 6:44]
Andrei Elkin
Sebastian, a bit late note, sorry. i wonder if increasing innodb_lock_wait_timeout can help for you customer.
[6 Oct 2006 17:55]
Craig Howland
I am encountering the same problem on ubuntu. We have two slaves of the same master, one on ubuntu: ubuntu 2.6.15-23-amd64 x86_64 mysql Ver 14.12 Distrib 5.0.22, for pc-linux-gnu (x86_64) using readline 5.1 and one on redhat: redhat 2.6.9-42.0.2.ELsmp x86_64 mysql Ver 14.12 Distrib 5.0.24a, for unknown-linux-gnu (x86_64) using readline 5.0 On the redhat machine, we have nothing else running besides replication. On the ubuntu machine, we are running some heavy duty reports that are locking some of the innodb tables into which replication is trying to insert. One of those reports is failing frequently with the error: ERROR 1213 (40001) at line 826: Deadlock found when trying to get lock; try restarting transaction Relication on the ubuntu machine is periodically failing with: Duplicate entry 'some number X' for key 1 When I go to investigate, I find that 'some number X' already exists in the table, and its attribute fields are identical to what replication is trying to insert. When I use the: set sql_slave_skip_counter = 1; technique, I hit the rest of the insert statements in that transaction, and they also each error out with the duplicate key error, and their data is also identical to what is already in the table. It looks like the entire transaction is being replayed (although I did not verify that every single statement in the transaction hit the error -- if one or two of them at the end of the transaction did not get the error, I would not have noticed). I have not yet tied the replication errors to the same time as the deadlock errors because I have not been looking for that, but the deadlock errors are happening more frequently (6/day) than the replication errors (1/2 days) The same transaction gets replicated to the redhat slave with no error. There is no process on either slave which modifies data in the tables in question. show variables like 'innodb_lock_wait_timeout'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | innodb_lock_wait_timeout | 50 | +--------------------------+-------+ To what value do you suggest changing this variable? In order to eliminate these two errors, we are currently modifying the reports to hit MyISAM copies of the InnoDB tables. A set of triggers on the InnoDB tables will insert the primary key of the changed rows in another InnoDB table, and a periodic process will read the list of changed primary keys and apply the changes to the MyISAM table.
[6 Oct 2006 21:35]
Andrei Elkin
Craig, I just submitted a patch for this problem. It is in that slave sql thread waits over innodb_lock_wait_timeout, and after that the replicated transaction failed to replay. You must have hit the same problem if your slave's error log containsthe line Slave: Error 'Lock wait timeout exceeded; try restarting transaction' on query. It would be safe for your data to raise the timeout value meanwhile the patch is under reviewing. The increment depends on your slave's local transactions duration and their rate. I hope my patch will be approved soon. Thanks for your report.
[9 Oct 2006 20:56]
Craig Howland
I increased the innodb_lock_wait_timeout to 100 and it hasn't had a noticeable impact. I'm going to increase it to 300 with which I think I'll have better luck. I had previously set innodb_support_xa to 0. Could this be related? Would it be worth setting this back to 1?
[10 Oct 2006 11:00]
Andrei Elkin
The patch appeared to go to closed bug#16228. Graig, i don't think this issue, requiring a record on the exceeded timeout, has any relation to (except from 14.5.4 `InnoDB' Startup Options and System Variables) * `innodb_support_xa' When set to `ON' or 1 (the default), this variable enables `InnoDB' support for two-phase commit in XA transactions. Enabling `innodb_support_xa' causes an extra disk flush for transaction preparation. If you don't care about using XA, you can disable this variable by setting it to `OFF' or 0 to reduce the number of disk flushes and get better `InnoDB' performance.
[11 Oct 2006 8:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/13450 ChangeSet@1.2297, 2006-10-11 11:07:20+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +1 -0 bug#20697 - results in sync with the test
[30 Oct 2006 15:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/14566 ChangeSet@1.2323, 2006-10-30 17:09:28+02:00, aelkin@dsl-hkigw8-feaef900-46.dhcp.inet.fi +2 -0 BUG#20697 slave fails to rollback replicated transaction hang over innodb_lock_wait_timeou post-merge fixes in test/result
[20 Nov 2006 15:04]
Lars Thalmann
Pushed into 5.0.32. Pushed into 5.1.14.
[21 Nov 2006 19:56]
Paul DuBois
Noted in 5.0.32, 5.1.14 changelogs. On slave servers, transactions that exceeded the lock wait timeout failed to roll back properly.