Bug #25648 | repair table hangs (deadlocks) with myisam_repair_threads > 1 | ||
---|---|---|---|
Submitted: | 16 Jan 2007 12:11 | Modified: | 7 Jun 2007 11:07 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: MyISAM storage engine | Severity: | S3 (Non-critical) |
Version: | 5.0.34 | OS: | Windows (w2k3) |
Assigned to: | Sergey Vojtovich | CPU Architecture: | Any |
Tags: | deadlock, myisam, myisam_repair_threads, repair table |
[16 Jan 2007 12:11]
Shane Bester
[6 Feb 2007 20:34]
Ingo Strüwing
This could be a duplicate of Bug#25042 (OPTIMIZE TABLE cause race condition in IO CACHE SHARE). Kristofer has a patch for it, but it probably won't fix this bug. However, Kristofer and I discussed a fix that could also fix this bug. I asked him whether he will include this fix in his patch. Please stay tuned.
[8 Feb 2007 17:46]
Ingo Strüwing
Kristofer will add the fix for this problem when his current patch is approved. It is the prerequisite for this fix. But I am positive that it will fix this bug too.
[9 Mar 2007 9:54]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/21573
ChangeSet@1.2595, 2007-03-09 10:53:45+01:00, thek@kpdesk.mysql.com +1 -0
Bug#25648 repair table hangs (deadlocks) with myisam_repair_threads > 1
- If the writer left while the readers were asleep, there wouldn't be any process to unlock the io_cache later. This behavior caused the threaded algorithm in mi_repair_table to lose track of the number of running threads, and later a deadlock would appear as a result of new threads waiting for threads which never would arrive.
- In case the writer has gone away, the reader-thread needs to increase the running_threads count on its own.
[9 Mar 2007 11:00]
Ingo Strüwing
Ok from me with minor comment changes. Please see email.
[9 Mar 2007 11:03]
Ingo Strüwing
I don't think that a second review is absolutely necessary, but it's up to Konstantin to decide. The patch is very small, but understanding why it fixes the problem could require some investigation for someone not so familiar with the io_cache_share.
[9 Mar 2007 12:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/21594
ChangeSet@1.2595, 2007-03-09 13:22:34+01:00, thek@kpdesk.mysql.com +1 -0
Bug#25648 repair table hangs (deadlocks) with myisam_repair_threads > 1
- During parallel repair it could happen that the server lost track of the number of running threads and waited forever.
- If the writer-thread left while the readers were asleep, there wouldn't be any thread to unlock the io_cache later. This behavior caused the threaded algorithm in mi_repair_table to lose track of the number of running threads. Later on, a deadlock would appear when new threads would be waiting for threads which never could arrive.
- In case the writer-thread has gone away, the reader-thread needs to increase the running_threads count on its own.
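For illustration only (this is not code from the MySQL sources): a minimal sketch of the failure mode the commit message describes, where a master waits on a condition variable until a running-thread counter reaches zero. The names `sort_info_model`, `worker_arg`, `worker` and `run_model` are hypothetical, and a timed wait is assumed in place of the unbounded wait so that the lost count shows up as a non-zero result instead of a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

#define MAX_WORKERS 32

/* Hypothetical stand-in for the master's bookkeeping; not the real MySQL struct. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int threads_running;
} sort_info_model;

typedef struct {
    sort_info_model *info;
    int skip_decrement;   /* 1 simulates a thread that exits without being counted */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *arg = p;
    if (!arg->skip_decrement) {
        pthread_mutex_lock(&arg->info->mutex);
        arg->info->threads_running--;
        pthread_cond_signal(&arg->info->cond);
        pthread_mutex_unlock(&arg->info->mutex);
    }
    return NULL;
}

/* Returns the number of threads the master still believes are running
   once the wait ends: 0 means every worker was accounted for. */
int run_model(int nthreads, int buggy_index, int timeout_sec)
{
    sort_info_model info;
    pthread_t tid[MAX_WORKERS];
    worker_arg args[MAX_WORKERS];
    struct timespec deadline;
    int i, rc, remaining;

    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    pthread_mutex_init(&info.mutex, NULL);
    pthread_cond_init(&info.cond, NULL);
    info.threads_running = nthreads;

    for (i = 0; i < nthreads; i++) {
        args[i].info = &info;
        args[i].skip_decrement = (i == buggy_index);
        rc = pthread_create(&tid[i], NULL, worker, &args[i]);
        assert(rc == 0);
    }

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    /* Master loop: same shape as waiting on threads_running, but bounded. */
    pthread_mutex_lock(&info.mutex);
    rc = 0;
    while (info.threads_running > 0 && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&info.cond, &info.mutex, &deadline);
    remaining = info.threads_running;
    pthread_mutex_unlock(&info.mutex);

    for (i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&info.cond);
    pthread_mutex_destroy(&info.mutex);
    return remaining;
}
```

Calling `run_model(4, -1, 1)` models the healthy case; passing a valid index such as `run_model(4, 2, 1)` makes one worker leave uncounted, which is where an unbounded wait would never return.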
[9 Mar 2007 14:58]
Tomash Brechko
Sent review by e-mail; the patch requires some more thought.
[13 Mar 2007 9:49]
Kristofer Pettersson
These are my findings so far:
When the example table b_hotel is loaded and “REPAIR TABLE b_hotel” is run, the server responds OK the first time. The second time it hangs in a deadlock. 23 threads are created (one for each key in the table). One of these threads is appointed writer-thread. In the deadlock, the writer-thread and some reader-threads had terminated, leaving ~15 reader-threads behind. One thread is the repair master thread, and it is waiting for all work to be completed:
mi_check.c: mi_repair_parallel
    while (sort_info.threads_running)
      pthread_cond_wait(&sort_info.cond, &sort_info.mutex);
The reader threads are in sort.c: thr_find_all_keys
    DBUG_PRINT("info", ("reading keys"));
    while (!(error= sort_param->sort_info->got_error) &&
           !(error= (*sort_param->key_read)(sort_param, sort_keys[idx])))
    {
key_read is calling the shared io-cache through:
    mysqld.exe!lock_io_cache(st_io_cache * cache=0x02042208, unsigned __int64 pos=100) Line 846 + 0x10 bytes C
    mysqld.exe!_my_b_read_r(st_io_cache * cache=0x02042208, unsigned char * Buffer=0x0d64fe0c, unsigned int Count=20) Line 985 + 0x11 bytes C
    mysqld.exe!_mi_read_cache(st_io_cache * info=0x02042208, unsigned char * buff=0x0d64fe0c, unsigned __int64 pos=100, unsigned int length=20, int flag=3) Line 84 + 0x16 bytes C
    mysqld.exe!sort_get_next_record(st_mi_sort_param * sort_param=0x02042200) Line 3175 + 0x2a bytes C
    mysqld.exe!sort_key_read(st_mi_sort_param * sort_param=0x02042200, void * key=0x020b3f76) Line 2987 + 0x9 bytes C
    > mysqld.exe!thr_find_all_keys(void * arg=0x02042200) Line 414 + 0x34 bytes C
Inside lock_io_cache, the threads are stuck in two places: sometimes all in one place, sometimes in both:
    while ((!cshare->read_end || (cshare->pos_in_file < pos)) &&
           cshare->running_threads)
    {
      DBUG_PRINT("io_cache_share", ("reader waits in lock"));
      pthread_cond_wait(&cshare->cond, &cshare->mutex);
    }
Or
    while ((!cshare->read_end || (cshare->pos_in_file < pos)) &&
           cshare->source_cache)
    {
      DBUG_PRINT("io_cache_share", ("reader waits in lock"));
      pthread_cond_wait(&cshare->cond, &cshare->mutex);
    }
A possible solution might be related to a broken execution path when the writer leaves unexpectedly and the readers are set to EOF. In this case the mutex lock isn’t closed but left open, and the reader-threads exit immediately. The variable that keeps count of the threads in this lock is cshare->running_threads. This variable is set to cshare->total_threads when all threads have left the lock and should be zero when all threads are waiting on the lock. The broken execution path existed because running_threads wasn’t increased when a reader-thread left the function lock_io_cache without claiming the lock, and thus the running_threads count could end up negative. This happens on the following condition (inside lock_io_cache):
    if (!cshare->read_end || (cshare->pos_in_file < pos))
    {
      DBUG_PRINT("io_cache_share", ("reader found writer removed. EOF"));
      cshare->read_end= cshare->buffer; /* Empty buffer. */
      cshare->error= 0; /* EOF is not an error. */
    }
The suggested patch for this is:
    if (!cshare->read_end || (cshare->pos_in_file < pos))
    {
      DBUG_PRINT("io_cache_share", ("reader found writer removed. EOF"));
      /*
        If (cshare->pos_in_file < pos) is true above, this block will be
        executed once for every thread, so we increase thread counter by one.
        However, if (cshare->pos_in_file < pos) is false, !cshare->read_end
        will be true only once, as we set cshare->read_end below. In that
        case we restore cshare->running_threads right away.
      */
      if (cshare->pos_in_file < pos)
      {
        cshare->running_threads++;
      }
      else
      {
        cshare->running_threads= cshare->total_threads;
      }
      cshare->read_end= cshare->buffer; /* Empty buffer. */
      cshare->error= 0; /* EOF is not an error. */
    }
Unfortunately this does not resolve the deadlock.
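To make the counter accounting in the suggested patch easier to follow, here is a minimal single-threaded model of that branch. It is only a sketch: `cshare_model` and `reader_found_writer_removed` are hypothetical names, the struct carries just the fields mentioned above rather than the real io_cache share layout, and all locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced model of the shared cache state (assumption:
   only the fields used by the patched branch are represented). */
typedef struct {
    unsigned long long pos_in_file;
    const unsigned char *read_end;
    const unsigned char *buffer;
    int running_threads;
    int total_threads;
    int error;
} cshare_model;

/* Applies the patched "writer removed" branch for one reader.
   Returns 1 if the branch was taken, 0 otherwise. */
int reader_found_writer_removed(cshare_model *cs, unsigned long long pos)
{
    if (!cs->read_end || cs->pos_in_file < pos) {
        if (cs->pos_in_file < pos) {
            /* Taken once per reader: each reader counts itself back in. */
            cs->running_threads++;
        } else {
            /* Taken only once, because read_end is set below. */
            cs->running_threads = cs->total_threads;
        }
        cs->read_end = cs->buffer;  /* Empty buffer. */
        cs->error = 0;              /* EOF is not an error. */
        return 1;
    }
    return 0;
}
```

With `pos_in_file` behind `pos`, three readers calling the function in turn bring `running_threads` from 0 to 3. With `pos_in_file` equal to `pos` and `read_end` unset, the first caller restores the count to `total_threads` and later callers skip the branch. The model says nothing about why the real deadlock persists.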
[7 Jun 2007 11:07]
Sergey Vojtovich
Cannot repeat this with recent sources anymore. Tested various combinations on different Windows versions. Shane failed to repeat the problem too.
[23 Sep 2013 7:13]
MySQL Verification Team
note to self. bkecman sees this on recent 5.6 and 5.5: http://pastebin.com/raw.php?i=p3YbrkqZ