Bug #16986 | Deadlock condition with MyISAM tables | ||
---|---|---|---|
Submitted: | 31 Jan 2006 21:43 | Modified: | 27 Jun 2006 7:00 |
Reporter: | Raymond DeRoo | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: MyISAM storage engine | Severity: | S2 (Serious) |
Version: | 5.0.18 | OS: | Linux (Linux) |
Assigned to: | Ingo Strüwing | CPU Architecture: | Any |
[31 Jan 2006 21:43]
Raymond DeRoo
[17 Feb 2006 11:13]
Valeriy Kravchuk
Verified just as described with latest 5.0.19-BK (ChangeSet@1.2049.1.3, 2006-02-16 10:00:14-06:00) on SuSE 9.3. I think, it is a potential showstopper.
[18 May 2006 6:28]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/6548
[22 May 2006 16:07]
jocelyn fournier
Hi, Is this bug also present in the 4.x tree ? Thanks, Jocelyn
[23 May 2006 7:02]
Ingo Strüwing
Yes, it is. I'll check if we will fix it in 4.1 too.
[23 May 2006 16:39]
jocelyn fournier
Hi, Ok, so it seems I have the same problem with the 4.1 tree, which prevent me from running mysqlcheck -o in a cron. Regards, Jocelyn
[23 May 2006 18:08]
Ingo Strüwing
While this might be possible, I still doubt it. The test case contains a LOCK TABLE WRITE. Do you have this in your application? And even if you have, the deadlock implied the write lock was taken by the thread that ran OPTIMIZE. I don't believe that mysqlcheck does it. If you can repeat a lockup with mysqlcheck, can you please post the output of SHOW PROCESSLIST? Regards, Ingo
[23 May 2006 18:32]
jocelyn fournier
Hi, It seems the problem occurs when a backup (through mysqldump) and the optimize occur at the same time through the cron. Since the backup is launching some LOCK TABLE, it could hit a deadlock. Regards, Jocelyn
[24 May 2006 11:09]
Michael Widenius
The proposed patch touches a little too much things in my liking. I have provided Ingo with a smaller, much less intrusive patch that he is now testing. If the patch works, it's fine to also add this in 4.1
[26 May 2006 10:26]
Michael Widenius
Fix will be in 5.0.22
[26 May 2006 10:26]
Michael Widenius
Fix will be in 5.0.22 and 5.1.10
[26 May 2006 16:15]
Paul DuBois
Noted in 5.0.23, 5.1.10 changelogs. MyISAM table deadlock was possible if one thread issued a LOCK TABLES request for write locks and then an administrative statement such as OPTIMIZE TABLE, if between the two statements another client meanwhile issued a multiple-table SELECT for some of the locked tables.
[23 Jun 2006 9:07]
Ingo Strüwing
The stress test suite found a couple of deadlocks that are related to this bug fix.
[25 Jun 2006 16:21]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/8208
[26 Jun 2006 12:49]
Michael Widenius
Approved, as long as the following change is done: Replace: VOID(pthread_cond_broadcast(&COND_refresh)); VOID(pthread_cond_broadcast(&COND_global_read_lock)); This will make it easer to ensure that we will always call the two broadcast at the same time.
[26 Jun 2006 17:16]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/8256
[27 Jun 2006 7:00]
Ingo Strüwing
Addendum fixes after changing the condition variable for the global read lock. The stress test suite revealed some deadlocks. Some were related to the new condition variable (COND_global_read_lock) and some were general problems with the global read lock. It is now necessary to signal COND_global_read_lock whenever COND_refresh is signalled. We need to wait for the release of a global read lock if one is set before every operation that requires a write lock. But we must not wait if we have locked tables by LOCK TABLES. After setting a global read lock a thread waits until all write locks are released. Pushed to 5.0.23 main for the upcoming clone off. Not yet merged to 5.1.
[15 Oct 2008 11:02]
Ingo Strüwing
Here is the reason, why I made the above changes: The stress test suite discovered a possible deadlock around the global read lock handling: con1 starts FLUSH TABLES WITH READ LOCK. con2 starts INSERT INTO t1. con1 locks the global read lock. con2 opens t1. con1 flushes tables, waiting for t1 to become closed. con2 wants to lock t1. It notices the global read lock and waits... No protection against global read lock was involved here. The problem is that there is a time slice between lock_global_read_lock() and close_cached_tables(), where opening and locking of tables can be attempted. Opening will succeed, and locking (for writing) will block on the global read lock, without closing the opened tables, which in turn block close_cached_tables(). My fix was to acquire a protection against the global read lock before open_tables() for operations that take a write lock. Only write operations wait in mysql_lock_tables() if a global read lock exists. Since the change they sit and wait for a global read lock with no open table. Admittedly the use of a protection against the global read lock changes lock_global_read_lock() so that it needs to wait for the last DML to drop its protection against the global read lock. This isn't a problem for FLUSH TABLES WITH READ LOCK, because we always have close_cached_tables() after lock_global_read_lock(), which would have to wait for the end of the DMLs and their closing of the tables anyway. Unfortunately my patch doesn't change all write operations. At least UPDATE has been forgotten. If all write operations would be covered, the waiting for a global read lock would become obsolete in mysql_lock_tables(). If all write operations are changed to start with wait_if_global_read_lock() *and* the wait is removed from mysql_lock_tables(), the protection against global read lock could also go away. This is controlled by the second argument of wait_if_global_read_lock(), "abort_on_refresh". This change would cause writing operations to open_tables(), lock_tables(), do their work, unlock, and close, even if the global read lock is acquired right after they passed the check in wait_if_global_read_lock(). But then it means that the database can be changed while the global read lock exists. Only the completion of close_cached_tables() would signal the start of the modification-free phase.