| Bug #16986 | Deadlock condition with MyISAM tables | ||
|---|---|---|---|
| Submitted: | 31 Jan 2006 21:43 | Modified: | 27 Jun 2006 7:00 | 
| Reporter: | Raymond DeRoo | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: MyISAM storage engine | Severity: | S2 (Serious) | 
| Version: | 5.0.18 | OS: | Linux (Linux) | 
| Assigned to: | Ingo Strüwing | CPU Architecture: | Any | 
   [31 Jan 2006 21:43]
   Raymond DeRoo        
  
 
   [17 Feb 2006 11:13]
   Valeriy Kravchuk        
  Verified just as described with latest 5.0.19-BK (ChangeSet@1.2049.1.3, 2006-02-16 10:00:14-06:00) on SuSE 9.3. I think it is a potential showstopper.
   [18 May 2006 6:28]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/6548
   [22 May 2006 16:07]
   jocelyn fournier        
  Hi, is this bug also present in the 4.x tree? Thanks, Jocelyn
   [23 May 2006 7:02]
   Ingo Strüwing        
  Yes, it is. I'll check if we will fix it in 4.1 too.
   [23 May 2006 16:39]
   jocelyn fournier        
  Hi, OK, so it seems I have the same problem with the 4.1 tree, which prevents me from running mysqlcheck -o in a cron job. Regards, Jocelyn
   [23 May 2006 18:08]
   Ingo Strüwing        
  While this might be possible, I still doubt it. The test case contains a LOCK TABLE WRITE. Do you have this in your application? And even if you have, the deadlock implied the write lock was taken by the thread that ran OPTIMIZE. I don't believe that mysqlcheck does it. If you can repeat a lockup with mysqlcheck, can you please post the output of SHOW PROCESSLIST? Regards, Ingo
   [23 May 2006 18:32]
   jocelyn fournier        
  Hi, it seems the problem occurs when a backup (through mysqldump) and the optimize run at the same time from cron. Since the backup issues LOCK TABLES statements, it could hit the deadlock. Regards, Jocelyn
   [24 May 2006 11:09]
   Michael Widenius        
  The proposed patch touches a few too many things for my liking. I have provided Ingo with a smaller, much less intrusive patch that he is now testing. If the patch works, it's fine to also add it to 4.1.
   [26 May 2006 10:26]
   Michael Widenius        
  Fix will be in 5.0.22
   [26 May 2006 10:26]
   Michael Widenius        
  Fix will be in 5.0.22 and 5.1.10
   [26 May 2006 16:15]
   Paul DuBois        
  Noted in the 5.0.23 and 5.1.10 changelogs. A MyISAM table deadlock was possible if one thread issued a LOCK TABLES request for write locks and then an administrative statement such as OPTIMIZE TABLE, and, between the two statements, another client issued a multiple-table SELECT for some of the locked tables.
   [23 Jun 2006 9:07]
   Ingo Strüwing        
  The stress test suite found a couple of deadlocks that are related to this bug fix.
   [25 Jun 2006 16:21]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/8208
   [26 Jun 2006 12:49]
   Michael Widenius        
  Approved, as long as the following change is made: Replace: VOID(pthread_cond_broadcast(&COND_refresh)); VOID(pthread_cond_broadcast(&COND_global_read_lock)); This will make it easier to ensure that we always issue the two broadcasts at the same time.
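The point of the requested change is that every wake-up goes through one function, so no caller can signal one condition variable and forget the other. A minimal sketch, assuming a single helper replaces each pair of broadcasts; the helper name `broadcast_refresh_pair()`, the mutex `LOCK_state`, the counters, and `close_one_table()` are hypothetical. Only `COND_refresh`, `COND_global_read_lock`, and `pthread_cond_broadcast()` come from the comment above.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins; the server defines its own globals. */
static pthread_mutex_t LOCK_state = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t COND_refresh = PTHREAD_COND_INITIALIZER;
static pthread_cond_t COND_global_read_lock = PTHREAD_COND_INITIALIZER;
static int pair_broadcasts = 0; /* how often both conditions were signalled */
static int tables_closed = 0;   /* example shared state guarded by LOCK_state */

/* Hypothetical single entry point: it always signals both condition
   variables, so the two cannot drift apart. Caller holds LOCK_state.
   Returns 0 on success, otherwise the first nonzero error code. */
static int broadcast_refresh_pair(void)
{
  int rc1 = pthread_cond_broadcast(&COND_refresh);
  int rc2 = pthread_cond_broadcast(&COND_global_read_lock);
  pair_broadcasts++;
  return rc1 != 0 ? rc1 : rc2;
}

/* Example caller: update shared state, then wake both kinds of waiters. */
static void close_one_table(void)
{
  pthread_mutex_lock(&LOCK_state);
  tables_closed++;
  int rc = broadcast_refresh_pair();
  pthread_mutex_unlock(&LOCK_state);
  assert(rc == 0);
  (void)rc; /* keeps rc "used" when NDEBUG disables assert() */
}
```

Calling `close_one_table()` twice leaves both counters at 2; there is no code path that signals only one of the two conditions.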
   [26 Jun 2006 17:16]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/8256
   [27 Jun 2006 7:00]
   Ingo Strüwing        
  Addendum fixes after changing the condition variable for the global read lock. The stress test suite revealed some deadlocks. Some were related to the new condition variable (COND_global_read_lock), and some were general problems with the global read lock. It is now necessary to signal COND_global_read_lock whenever COND_refresh is signalled. Before every operation that requires a write lock, we need to wait for the release of the global read lock if one is set. However, we must not wait if the thread holds tables locked with LOCK TABLES. After setting a global read lock, a thread waits until all write locks are released. Pushed to 5.0.23 main for the upcoming clone-off. Not yet merged to 5.1.
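The three waiting rules in this comment can be modeled with one mutex and one condition variable, where `COND_grl` plays the role of COND_global_read_lock. This is a minimal sketch under simplifying assumptions: a single global-read-lock holder at a time, and writers counted only while they hold write locks. The names `start_write()`, `end_write()`, `set_global_read_lock()`, `release_global_read_lock()`, `LOCK_grl`, `COND_grl`, and `active_writers` are hypothetical and are not the server's functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the rules above; not the server's implementation. */
static pthread_mutex_t LOCK_grl = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t COND_grl = PTHREAD_COND_INITIALIZER;
static bool global_read_lock_set = false;
static int active_writers = 0; /* operations currently holding write locks */

/* Before a write: wait while a global read lock is set, unless the
   session already holds tables locked with LOCK TABLES. */
static void start_write(bool has_locked_tables)
{
  pthread_mutex_lock(&LOCK_grl);
  while (global_read_lock_set && !has_locked_tables)
    pthread_cond_wait(&COND_grl, &LOCK_grl);
  active_writers++;
  pthread_mutex_unlock(&LOCK_grl);
}

/* Releasing a write lock may let the read-lock holder proceed. */
static void end_write(void)
{
  pthread_mutex_lock(&LOCK_grl);
  assert(active_writers > 0);
  active_writers--;
  pthread_cond_broadcast(&COND_grl);
  pthread_mutex_unlock(&LOCK_grl);
}

/* After setting the global read lock, wait until all write locks are
   released. Assumes only one holder at a time. */
static void set_global_read_lock(void)
{
  pthread_mutex_lock(&LOCK_grl);
  global_read_lock_set = true;
  while (active_writers > 0)
    pthread_cond_wait(&COND_grl, &LOCK_grl);
  pthread_mutex_unlock(&LOCK_grl);
}

/* Releasing the global read lock wakes writers blocked in start_write(). */
static void release_global_read_lock(void)
{
  pthread_mutex_lock(&LOCK_grl);
  global_read_lock_set = false;
  pthread_cond_broadcast(&COND_grl);
  pthread_mutex_unlock(&LOCK_grl);
}
```

While the lock is set, `start_write(true)` (a LOCK TABLES session) proceeds immediately, whereas `start_write(false)` blocks until `release_global_read_lock()` is called.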
   [15 Oct 2008 11:02]
   Ingo Strüwing        
  Here is the reason why I made the above changes. The stress test suite discovered a possible deadlock in the global read lock handling:

  1. con1 starts FLUSH TABLES WITH READ LOCK.
  2. con2 starts INSERT INTO t1.
  3. con1 locks the global read lock.
  4. con2 opens t1.
  5. con1 flushes tables, waiting for t1 to be closed.
  6. con2 wants to lock t1. It notices the global read lock and waits...

  No protection against the global read lock was involved here. The problem is that there is a time slice between lock_global_read_lock() and close_cached_tables() in which tables can be opened and locked. Opening succeeds, and locking (for writing) blocks on the global read lock without closing the opened tables, which in turn blocks close_cached_tables().

  My fix was to acquire a protection against the global read lock before open_tables() for operations that take a write lock. Only write operations wait in mysql_lock_tables() if a global read lock exists; since the change, they wait for the global read lock with no tables open. Admittedly, the protection against the global read lock changes lock_global_read_lock() so that it has to wait for the last DML to drop its protection. This isn't a problem for FLUSH TABLES WITH READ LOCK, because close_cached_tables() always follows lock_global_read_lock(), and it would have to wait for the DMLs to end and close their tables anyway.

  Unfortunately, my patch doesn't change all write operations; at least UPDATE has been forgotten. If all write operations were covered, waiting for a global read lock in mysql_lock_tables() would become obsolete. If all write operations are changed to start with wait_if_global_read_lock() *and* the wait is removed from mysql_lock_tables(), the protection against the global read lock could also go away. This is controlled by the second argument of wait_if_global_read_lock(), "abort_on_refresh".

  That change would cause write operations to call open_tables() and lock_tables(), do their work, unlock, and close, even if the global read lock is acquired right after they have passed the check in wait_if_global_read_lock(). That, however, means the database can be changed while the global read lock exists. Only the completion of close_cached_tables() would then signal the start of the modification-free phase.
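The ordering fix for the con1/con2 scenario can be illustrated with a two-thread model. This is a minimal sketch, not MySQL source: opening and closing t1 is reduced to a counter, the protection against the global read lock to another counter, and FLUSH TABLES WITH READ LOCK to its two waits. All names (`insert_thread`, `flush_thread`, `run_scenario`, `grl_set`, `protect_count`, `open_table_count`) are hypothetical. Because con2 takes its protection before opening t1, and con1 waits for that protection to drop before setting the flag, con2 never sits on an open table while waiting for the global read lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the fixed ordering; not MySQL source code. */
static pthread_mutex_t LOCK_model = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t COND_model = PTHREAD_COND_INITIALIZER;
static bool grl_set = false;      /* global read lock held by con1 */
static int protect_count = 0;     /* DMLs protected against the global read lock */
static int open_table_count = 0;  /* tables currently opened by DMLs */
static int rows_inserted = 0;

/* con2: INSERT INTO t1, taking the protection *before* opening t1. */
static void *insert_thread(void *arg)
{
  (void)arg;
  pthread_mutex_lock(&LOCK_model);
  while (grl_set)                 /* wait here, with no table open */
    pthread_cond_wait(&COND_model, &LOCK_model);
  protect_count++;                /* con1 cannot set the lock until we drop this */
  pthread_mutex_unlock(&LOCK_model);

  pthread_mutex_lock(&LOCK_model);
  open_table_count++;             /* stands in for open_tables() */
  pthread_mutex_unlock(&LOCK_model);

  pthread_mutex_lock(&LOCK_model);
  rows_inserted++;                /* lock, write, unlock: no wait, we are protected */
  open_table_count--;             /* close t1 */
  protect_count--;                /* drop the protection */
  pthread_cond_broadcast(&COND_model);
  pthread_mutex_unlock(&LOCK_model);
  return NULL;
}

/* con1: FLUSH TABLES WITH READ LOCK. */
static void *flush_thread(void *arg)
{
  (void)arg;
  pthread_mutex_lock(&LOCK_model);
  while (protect_count > 0)       /* lock_global_read_lock(): wait for protected DMLs */
    pthread_cond_wait(&COND_model, &LOCK_model);
  grl_set = true;
  while (open_table_count > 0)    /* close_cached_tables(): wait until tables are closed */
    pthread_cond_wait(&COND_model, &LOCK_model);
  pthread_mutex_unlock(&LOCK_model);
  return NULL;
}

/* Runs both connections concurrently; returns 0 once both have finished. */
static int run_scenario(void)
{
  pthread_t con1, con2;
  if (pthread_create(&con1, NULL, flush_thread, NULL) != 0)
    return -1;
  if (pthread_create(&con2, NULL, insert_thread, NULL) != 0) {
    pthread_join(con1, NULL);
    return -1;
  }
  pthread_join(con1, NULL);       /* con1 never waits on an open t1 held by con2 */

  pthread_mutex_lock(&LOCK_model); /* UNLOCK TABLES: release the global read lock */
  grl_set = false;
  pthread_cond_broadcast(&COND_model);
  pthread_mutex_unlock(&LOCK_model);

  pthread_join(con2, NULL);
  assert(open_table_count == 0 && protect_count == 0);
  return 0;
}
```

Whichever thread runs first, `run_scenario()` completes: either con2 finishes its insert before con1 sets the flag, or con2 waits with no table open until the simulated UNLOCK TABLES.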
