Bug #33072 | Falcon table deadlocks when concurrent multi-query delete trans. w/trig are run | |
---|---|---|---
Submitted: | 7 Dec 2007 19:34 | Modified: | 2 May 2008 23:25 |
Reporter: | Omer Barnir (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0.4 | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
[7 Dec 2007 19:34]
Omer Barnir
[7 Dec 2007 19:35]
Omer Barnir
test files for running the test
Attachment: files.tar (application/x-tar, text), 10.00 KiB.
[7 Dec 2007 20:16]
Kevin Lewis
Omer, what is your value for falcon_lock_timeout? This timeout was added to solve Bug#29452 (Falcon: two-way deadlock with unique index and trigger). InnoDB uses the same kind of timeout to avoid a deadlock, but whereas the default value for innodb_lock_wait_timeout is 50, the default value for falcon_lock_timeout is 0 (which means no timeout). Please try this again with falcon_lock_timeout=50.
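Kevin's suggestion can be sketched as follows (a hypothetical session for illustration; at this point in the bug's history the variable was still named falcon_lock_timeout and was measured in milliseconds):

```sql
-- Check the current value; 0 means "wait forever", which permits deadlocks
SHOW VARIABLES LIKE 'falcon_lock_timeout';

-- Then restart the server with the timeout enabled, e.g.:
--   mysqld ... --falcon-lock-timeout=50
-- so that lock waiters time out instead of deadlocking
```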
[8 Dec 2007 2:27]
Omer Barnir
I reran the test with 'falcon_lock_timeout=50'. This does make a difference in the sense that there are no deadlocks, but it seems that the 'concurrent' threads are processed in a serial manner. If you run the above command with 10 concurrent threads for 300 seconds and monitor the processlist, you will see that the threads are all locked, with *one* being updated. The 'updated' status changes from one thread to another, but the threads themselves do not change. After about 114-140 seconds the threads are released and others appear instead. If you run the same test with an InnoDB table, you will see that *all* the threads are in a status of 'updated' (not just one), and that the threads constantly change (old ones disappear and new ones appear).
[14 Dec 2007 5:33]
Kevin Lewis
This deadlock does not happen when --falcon-lock-timeout is set to something other than zero. Care should be taken that this timeout is not too short; I suggest 500 milliseconds. Omer opened Bug#33191 to track a follow-on performance issue: Falcon threads seem to serialize with this test.
[15 Dec 2007 4:33]
Kevin Lewis
OK, per Omer's advice, this is being set to Documenting. The normal way out of this kind of deadlock (the way InnoDB does it, and the way Falcon should do it) is to set --falcon-lock-timeout=500, or some value other than the default of zero.
[22 Dec 2007 21:05]
Kevin Lewis
Philip, please specify the steps you used to get this current deadlock. It is not clear what you were doing differently from Omer's original test; Omer reported that the original test succeeded without a deadlock. Also note that the default for innodb_lock_wait_timeout is 50 *seconds*, while falcon_lock_timeout is in milliseconds, so 50 is too small. Can you try this again with falcon_lock_timeout=10000?
[23 Dec 2007 9:16]
Philip Stoev
Kevin, yes, you are right: the timeout *does* help in the sense that there is no classical deadlock -- I was not careful enough in my reading of SHOW PROCESSLIST. That said, here are a few more observations:

* As Omer observed, statements become serialized, running at one statement per falcon_lock_timeout milliseconds. The stored procedure in question runs 100 such statements and completes in 100 * falcon_lock_timeout * number_of_threads (50) msec. No CPU activity takes place during the falcon_lock_timeout period.
* SELECT statements on the same table, started in a separate connection, take anywhere from 10 seconds to 2 minutes to run. It appears to me this means that they are affected by the timeout (which was 10 seconds at the time of observation), that writes block reads, and that writes starve reads (CPU usage is very low all the time).
* No InnoDB tables of any kind are involved.
* Some queries will take up to twice their falcon_lock_timeout value -- if the value is 10 seconds, the query may get cleaned up at exactly 20 seconds.

I am unable to say whether the queries from the test case form a legitimate real-life workload (Omer has opened a separate bug for that), but I humbly suggest that the manual list all situations where a timeout applies vs. deadlock detection using a graph -- if the timeout applies when triggers and partitions are involved, the user should be aware of that, since it is not immediately intuitive.
[23 Dec 2007 17:39]
Philip Stoev
Please note that the only participants in the test case are two Falcon tables, one stored procedure, and one trigger. They are all located in the file t/del_lock_init.test from the archive that Omer attached to this bug. There are no MyISAM and no InnoDB tables involved.
[23 Dec 2007 17:40]
Jeffrey Pugh
Pasting Philip's and Kevin's latest comments:

KEVIN: Philip, there was an allusion in the bug comment that you are seeing this timeout where there are "Falcon transactions not touching on another engine". This means that you are doing a different test than what is documented in 33072. As I asked in the bug report, please add a detailed step-by-step 'how to reproduce', since I am very confused as to what you are doing here. Thanks for the observations about your testing. They go a long way in understanding the problem as a serialization of threads behind the wait lock. It is great to open a new bug, or add to Omer's 33191, which is essentially the same thing. But if your test sequence does not involve a MyISAM table and triggers like Omer's, then it may be a different bug. Once again, I really don't know what test you are running if "Falcon transactions are not touching on another engine".

PHILIP: What I just did is take Omer's files from the bug report and run them with this command line:

./mysql-test-run.pl --mysqld=--falcon-lock-timeout=10000 --stress --stress-init-file=del_lock_init.txt --stress-test-file=del_lock_run.txt --stress-test-duration=6000 --stress-threads=5

which is essentially his initial scenario with --falcon-lock-timeout=10000 added. What I observe is the lock pattern that Omer observed when he used --falcon-lock-timeout: transactions eventually get resolved (and commit), but only after each one has reached the 10th second of its lifetime. Please note that Omer's example contains only two tables and one trigger. Both tables are Falcon -- there are no InnoDB and no MyISAM tables involved. I hope this helps -- please let me know if I can do anything else -- for example, set up the scenario on a machine where you can log in and observe. Philip
[27 Dec 2007 5:23]
Kevin Lewis
The solution to the pure deadlock described in this bug is to change the default lock timeout from zero (no timeout) to 50 seconds, like the setting in InnoDB. As part of this fix, Bug#33474 is also done, which renames falcon_lock_timeout to falcon_lock_wait_timeout and changes the setting from milliseconds to seconds. This makes the setting more similar to innodb_lock_wait_timeout. Also, since the name has changed, existing scripts will not be left inadvertently referring to milliseconds instead of seconds. All engine startup scripts will have to change. It is important to make this change before Beta.
[11 Feb 2008 19:55]
Kevin Lewis
Patch is in mysql-6.0-falcon-team and mysql-6.0-release version 6.0.4
[12 Mar 2008 23:02]
Bugs System
Pushed into 6.0.4-alpha
[2 May 2008 23:25]
Paul DuBois
Noted in 6.0.4 changelog. The falcon_lock_timeout system variable, which had a value in milliseconds, has been replaced with falcon_lock_wait_timeout, which has a value in seconds. The default value of falcon_lock_wait_timeout is 50 seconds. This has been done for better name and unit consistency with the innodb_lock_wait_timeout system variable. Uses of the old variable should be converted to use the new variable.
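The changelog entry implies that an existing setting must be converted by dividing by 1000 when upgrading. A before/after sketch (the option values here are illustrative, not taken from any particular deployment):

```sql
-- Before 6.0.4 (milliseconds):
--   [mysqld]
--   falcon_lock_timeout=50000
--
-- 6.0.4 and later (seconds):
--   [mysqld]
--   falcon_lock_wait_timeout=50

-- Verify after upgrading; the new default is 50 (seconds)
SHOW VARIABLES LIKE 'falcon_lock_wait_timeout';
```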
[5 Jun 2009 17:39]
Zhang Lu
Well, I think you guys missed something here. When "where id = name_const('local_var', 1234)" happens, the optimizer does not use the index, so all of those statements become table scans and run slowly.
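Zhang Lu's observation can be illustrated with a hypothetical EXPLAIN comparison (the table and column names are invented; NAME_CONST() is the wrapper MySQL substitutes for stored-procedure local variables):

```sql
-- A literal constant can be matched against an index on id
EXPLAIN SELECT * FROM t1 WHERE id = 1234;

-- As reported in this comment, the NAME_CONST() wrapper may defeat
-- index selection, degrading the query to a full table scan
EXPLAIN SELECT * FROM t1 WHERE id = NAME_CONST('local_var', 1234);
```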