Bug #7823 | FLUSH TABLES WITH READ LOCK + INSERT DELAYED = deadlock | ||
---|---|---|---|
Submitted: | 12 Jan 2005 2:49 | Modified: | 28 Apr 2005 16:07 |
Reporter: | Christopher L. Cousins | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 4.0.23 | OS: | Linux (Red Hat Enterprise Linux) |
Assigned to: | Ingo Strüwing | CPU Architecture: | Any |
[12 Jan 2005 2:49]
Christopher L. Cousins
[12 Jan 2005 2:52]
Christopher L. Cousins
generate 'INSERT DELAYED ...' SQL file
Attachment: id.py (application/octet-stream, text), 435 bytes.
[12 Jan 2005 2:52]
Christopher L. Cousins
generate 'FLUSH TABLES WITH READ LOCK' SQL file
Attachment: ftwrl.py (application/octet-stream, text), 143 bytes.
[4 Feb 2005 14:37]
Ingo Strüwing
My test script. We cannot test concurrency in the normal MySQL test suite.
Attachment: bug7823-1.sh (application/x-sh, text), 5.51 KiB.
[4 Feb 2005 15:23]
Ingo Strüwing
Thank you for your bug report! What you found was just the tip of an iceberg. At first I was not able to reproduce this on Linux 2.6.9 or 2.6.10. So I "down"-graded my PC to Linux 2.4.27. This was a success. Your analyze is very good. And your fix did avoid one possible path to the described deadlock. But it is much more complicated than necessary. A pair of lock/unlock around mysql_lock_tables() is sufficient to avoid this deadlock. The latter fixes also a situation where one thread holds the global read lock while another one flushes the tables. Nevertheless, digging into this part of the code for the first time, I became curious, if this would be the only possible locking problem. To provoke more deadlocks, I inserted some sleeps into critical paths. And behold, this did indeed produce more of them. The situation became so complicated that I needed to compile an un-optimized mysqld for reliable debugging. I even excluded as much as possible (e.g. innodb) to get less threads and faster turn-around times. Since I have a notebook and a desktop, I ran all tests on two machines. One with 2.4 and one with 2.6. This rewarded me with different deadlock on both machines. So I jumped from one screen to the other, staring at debugger windows, figuring out why the threads got locked. At the beginning, this process was hard, since I did not have any idea, how this code works. After some successes in deadlock detection by shifting the sleeps around, I found even spin loops. Since these were even less obvious than the deadlocks, I decided to learn, how this code does really work. So I flooded it with DBUG_PRINTs with the tag "thrlock" so that I could select only my own trace messages. After some time I got somewhat familiar with this piece of code. Now I could place my sleeps with more effect and understood the locks/loops faster. This made me comfortable with rewriting part of it and fixing all the locking glitches I found. The result is that I fixed about seven deadlocks and two spin loops. I stopped searching after I did not find anything after shifting the sleeps three times in a row. This does not mean, that I can be sure that I found all possible problems. In other words, I look forward to your next report. ;) The changeset for the proposed changes is: bk commit - 4.0 tree (ingo:1.2040) BUG#7823 Date: Fri, 04 Feb 2005 14:32:31 +0100
[22 Mar 2005 7:43]
Ingo Strüwing
The patch was rejected by Monty. The solution is wrong. The delayed insert handler, handle_delayed_insert(), needs to call a mysql_lock_tables(), which does not block on a global read lock.
[22 Apr 2005 17:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/24233
[27 Apr 2005 10:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/24360
[27 Apr 2005 11:04]
Ingo Strüwing
The new solution is so that the handler will not wait for a global read lock. It will insert in spite of an existing global read lock. The idea is that the request for a global read lock is always coupled with a FLUSH TABLES. And the command FLUSH TABLES WITH READ LOCK will not return to the client until all existing write locks are gone and all tables are closed. The flush will awake the handler so that it writes all rows which are in its queue and terminate after closing its table. That way, all inserts which have already been accepted by the server, will be completed before the global read lock is acknowledged. If an insert command starts before a global read lock is requested, it protects against global read lock before creating the handler until after the handler has opened the table. That way the handler can cleanly be terminated by a FLUSH TABLES, but will be able to complete the insert before. If an insert command starts after a global read lock is requested, the protection against global read lock will fail and an direct (non-delayed) insert is tried. The corresponding table lock will however block until the global read lock is gone.
[27 Apr 2005 18:54]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/24396
[27 Apr 2005 20:59]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/24404
[28 Apr 2005 16:07]
Paul DuBois
Noted in 4.0.25, 4.1.12, 5.0.6 changelogs.