| Bug #25648 | repair table hangs (deadlocks) with myisam_repair_threads > 1 | ||
|---|---|---|---|
| Submitted: | 16 Jan 2007 12:11 | Modified: | 7 Jun 2007 11:07 |
| Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server: MyISAM storage engine | Severity: | S3 (Non-critical) |
| Version: | 5.0.34 | OS: | Windows (w2k3) |
| Assigned to: | Sergey Vojtovich | CPU Architecture: | Any |
| Tags: | deadlock, myisam, myisam_repair_threads, repair table | ||
[6 Feb 2007 20:34]
Ingo Strüwing
This could be a duplicate of Bug#25042 (OPTIMIZE TABLE cause race condition in IO CACHE SHARE). Kristofer has a patch for it, but it probably won't fix this bug. However, Kristofer and I discussed a fix that could also fix this bug. I asked him whether he will include this fix in his patch. Please stay tuned.
[8 Feb 2007 17:46]
Ingo Strüwing
Kristofer will add the fix for this problem when his current patch is approved. It is the prerequisite for this fix. But I am positive that it will fix this bug too.
[9 Mar 2007 9:54]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/21573
ChangeSet@1.2595, 2007-03-09 10:53:45+01:00, thek@kpdesk.mysql.com +1 -0
Bug#25648 repair table hangs (deadlocks) with myisam_repair_threads > 1
- If the writer leaves while the readers are asleep, there won't be any process to unlock the io_cache later. This behavior caused the threaded algorithm in mi_repair_table to lose track of the number of running threads, and later a deadlock would appear as a result of new threads waiting for threads which would never arrive.
- In case the writer has gone away, the reader-thread needs to increase the running_threads count on its own.
[9 Mar 2007 11:00]
Ingo Strüwing
Ok from me with minor comment changes. Please see email.
[9 Mar 2007 11:03]
Ingo Strüwing
I don't think that a second review is absolutely necessary, but it's up to Konstantin to decide. The patch is very small, but understanding why it fixes the problem could require some investigation for someone not so familiar with the io_cache_share.
[9 Mar 2007 12:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/21594
ChangeSet@1.2595, 2007-03-09 13:22:34+01:00, thek@kpdesk.mysql.com +1 -0
Bug#25648 repair table hangs (deadlocks) with myisam_repair_threads > 1
- During parallel repair it could happen that the server lost track of the number of running threads and waited forever.
- If the writer-thread left while the readers were asleep, there wouldn't be any thread to unlock the io_cache later. This behavior caused the threaded algorithm in mi_repair_table to lose track of the number of running threads. Later on, a deadlock would appear when new threads waited for threads which could never arrive.
- In case the writer-thread has gone away, the reader-thread needs to increase the running_threads count on its own.
[9 Mar 2007 14:58]
Tomash Brechko
Sent review by e-mail; the patch requires some more thought.
[13 Mar 2007 9:49]
Kristofer Pettersson
These are my findings so far:
When the example table b_hotel is loaded and “REPAIR TABLE b_hotel” is run, the server responds OK the first time. The second time it hangs in a deadlock.
23 threads are created (one for each key in the table). One of these threads is appointed writer-thread.
In the deadlock, the writer-thread and some reader-threads had terminated, leaving ~15 reader-threads behind.
One thread is the repair master thread and it is waiting for all work to be completed:
mi_check.c: mi_repair_parallel
    while (sort_info.threads_running)
      pthread_cond_wait(&sort_info.cond, &sort_info.mutex);
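The master's wait is the usual condition-variable pattern: recheck a counter under the mutex and sleep until a worker signals. A minimal, self-contained sketch of that pattern follows. The names sort_info_sketch, worker, run_workers_and_wait and MAX_WORKERS are hypothetical and chosen for illustration; this is not the MySQL code. If any worker exited without decrementing the counter, the loop would never terminate, which is the shape of the hang described here.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 32

/* Hypothetical stand-in for the fields the master thread waits on. */
struct sort_info_sketch {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  unsigned threads_running;
};

/* Each worker reports completion by decrementing the counter and signalling. */
static void *worker(void *arg)
{
  struct sort_info_sketch *info = arg;

  pthread_mutex_lock(&info->mutex);
  assert(info->threads_running > 0);
  info->threads_running--;
  pthread_cond_signal(&info->cond);   /* wake the master so it rechecks */
  pthread_mutex_unlock(&info->mutex);
  return NULL;
}

/* Start n workers, wait for all of them, and return how many were waited
   for (-1 if n is out of range). */
int run_workers_and_wait(unsigned n)
{
  struct sort_info_sketch info;
  pthread_t tids[MAX_WORKERS];
  unsigned started = 0, i;

  if (n > MAX_WORKERS)
    return -1;
  pthread_mutex_init(&info.mutex, NULL);
  pthread_cond_init(&info.cond, NULL);
  info.threads_running = 0;

  /* Hold the mutex while starting workers so the counter is complete
     before any worker can decrement it. */
  pthread_mutex_lock(&info.mutex);
  for (i = 0; i < n; i++) {
    if (pthread_create(&tids[started], NULL, worker, &info) != 0)
      break;
    started++;
    info.threads_running++;
  }
  /* Same shape as the master's loop: sleep until the counter reaches zero. */
  while (info.threads_running)
    pthread_cond_wait(&info.cond, &info.mutex);
  pthread_mutex_unlock(&info.mutex);

  for (i = 0; i < started; i++)
    pthread_join(tids[i], NULL);
  pthread_cond_destroy(&info.cond);
  pthread_mutex_destroy(&info.mutex);
  return (int) started;
}
```

The while loop, rather than a single wait, is what makes the pattern robust against spurious wakeups; it is also why a counter that never reaches zero blocks the master forever.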
The reader threads are in
sort.c: thr_find_all_keys
    DBUG_PRINT("info", ("reading keys"));
    while (!(error= sort_param->sort_info->got_error) &&
           !(error= (*sort_param->key_read)(sort_param, sort_keys[idx])))
    {
key_read calls into the shared io-cache through:
mysqld.exe!lock_io_cache(st_io_cache * cache=0x02042208, unsigned __int64 pos=100) Line 846 + 0x10 bytes C
mysqld.exe!_my_b_read_r(st_io_cache * cache=0x02042208, unsigned char * Buffer=0x0d64fe0c, unsigned int Count=20) Line 985 + 0x11 bytes C
mysqld.exe!_mi_read_cache(st_io_cache * info=0x02042208, unsigned char * buff=0x0d64fe0c, unsigned __int64 pos=100, unsigned int length=20, int flag=3) Line 84 + 0x16 bytes C
mysqld.exe!sort_get_next_record(st_mi_sort_param * sort_param=0x02042200) Line 3175 + 0x2a bytes C
mysqld.exe!sort_key_read(st_mi_sort_param * sort_param=0x02042200, void * key=0x020b3f76) Line 2987 + 0x9 bytes C
> mysqld.exe!thr_find_all_keys(void * arg=0x02042200) Line 414 + 0x34 bytes C
Inside lock_io_cache, the threads are stuck in one of two places; sometimes all of them are in the same place, sometimes they are spread across both:
    while ((!cshare->read_end || (cshare->pos_in_file < pos)) &&
           cshare->running_threads)
    {
      DBUG_PRINT("io_cache_share", ("reader waits in lock"));
      pthread_cond_wait(&cshare->cond, &cshare->mutex);
    }
Or
    while ((!cshare->read_end || (cshare->pos_in_file < pos)) &&
           cshare->source_cache)
    {
      DBUG_PRINT("io_cache_share", ("reader waits in lock"));
      pthread_cond_wait(&cshare->cond, &cshare->mutex);
    }
A possible solution might be related to a broken execution path when the writer leaves unexpectedly and the readers are set to EOF.
In this case the mutex lock isn’t closed but left open, and the reader-threads exit immediately. The variable that keeps count of the threads in this lock is cshare->running_threads. This variable is set to cshare->total_threads when all threads have left the lock and should be zero when all threads are waiting on the lock.
The broken execution path existed because running_threads wasn’t increased when a reader-thread left the function lock_io_cache without claiming the lock, and thus the running_threads count could end up negative. This happens under the following condition (inside lock_io_cache):
    if (!cshare->read_end || (cshare->pos_in_file < pos))
    {
      DBUG_PRINT("io_cache_share", ("reader found writer removed. EOF"));
      cshare->read_end= cshare->buffer; /* Empty buffer. */
      cshare->error= 0; /* EOF is not an error. */
    }
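The counter drift described above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: share_model and the model_* functions are hypothetical names, the decrement on arrival is an assumption about lock_io_cache that is not shown in this thread, and a signed int is used so that the drift below zero is visible.

```c
#include <assert.h>

/* Hypothetical, single-threaded model of the shared counters;
   not the MySQL IO_CACHE_SHARE struct. */
struct share_model {
  int total_threads;
  int running_threads;
};

/* Assumption: a thread arriving at the lock counts as no longer running. */
static void model_enter(struct share_model *s)
{
  s->running_threads--;
}

/* Normal release: all threads have left the lock and are running again. */
static void model_unlock(struct share_model *s)
{
  s->running_threads = s->total_threads;
}

/* All readers arrive and the lock is released normally.
   Returns the counter afterwards. */
int model_normal_round(struct share_model *s, int readers)
{
  int i;

  assert(s->total_threads > 0);
  for (i = 0; i < readers; i++)
    model_enter(s);
  model_unlock(s);
  return s->running_threads;
}

/* All readers arrive, find the writer removed, and return via the EOF
   exit without restoring the counter. Returns the counter afterwards. */
int model_eof_round(struct share_model *s, int readers)
{
  int i;

  assert(s->total_threads > 0);
  for (i = 0; i < readers; i++)
    model_enter(s);   /* no matching model_unlock() on this path */
  return s->running_threads;
}
```

With three readers, a normal round leaves the counter at 3, one EOF round leaves it at 0, and a second EOF round drives it to -3, so any thread that later waits for the counter to reach a particular value can wait forever.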
The suggested patch for this is
    if (!cshare->read_end || (cshare->pos_in_file < pos))
    {
      DBUG_PRINT("io_cache_share", ("reader found writer removed. EOF"));
      /*
        If (cshare->pos_in_file < pos) is true above, this block will
        be executed once for every thread, so we increase thread
        counter by one. However, if (cshare->pos_in_file < pos) is
        false, !cshare->read_end will be true only once, as we set
        cshare->read_end below. In that case we restore
        cshare->running_threads right away.
      */
      if (cshare->pos_in_file < pos)
      {
        cshare->running_threads++;
      }
      else
      {
        cshare->running_threads= cshare->total_threads;
      }
      cshare->read_end= cshare->buffer; /* Empty buffer. */
      cshare->error= 0; /* EOF is not an error. */
    }
Unfortunately this does not resolve the deadlock.
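For reference, the counter arithmetic the patch intends can be checked in isolation. The sketch below is a hypothetical single-threaded model (share_counts, patched_eof_exit and patched_round_behind are illustrative names, and the decrement on arrival is an assumption about lock_io_cache). It only shows that the two branches restore the counter; as noted above, that was not enough to remove the hang.

```c
#include <assert.h>

/* Hypothetical model of the shared counters; not the MySQL struct. */
struct share_counts {
  int total_threads;
  int running_threads;
};

/* Counter adjustment from the suggested patch for a reader that finds the
   writer removed. behind_pos models (cshare->pos_in_file < pos). */
void patched_eof_exit(struct share_counts *s, int behind_pos)
{
  assert(s->total_threads > 0);
  if (behind_pos)
    s->running_threads++;                    /* taken once per reader */
  else
    s->running_threads = s->total_threads;   /* taken once overall */
}

/* Assumption: each reader decrements the counter on arrival. Here every
   reader arrives behind the requested position and takes the patched
   exit. Returns the counter afterwards. */
int patched_round_behind(struct share_counts *s, int readers)
{
  int i;

  for (i = 0; i < readers; i++) {
    s->running_threads--;      /* arrival */
    patched_eof_exit(s, 1);    /* patched exit undoes the decrement */
  }
  return s->running_threads;
}
```

In this model each reader's decrement is undone by its own exit, so the counter no longer drifts; the remaining hang presumably lies in a path the model does not cover.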
[7 Jun 2007 11:07]
Sergey Vojtovich
Cannot repeat this with recent sources anymore. Tested various combinations on different Windows versions. Shane failed to repeat the problem too.
[23 Sep 2013 7:13]
MySQL Verification Team
note to self. bkecman sees this on recent 5.6 and 5.5: http://pastebin.com/raw.php?i=p3YbrkqZ

Description:
Issuing REPAIR TABLE <table> on a table when you have myisam_repair_threads=2 causes the repair to hang indefinitely. I attached a debugger to the hung process to get a stack trace:

The one thread of the REPAIR TABLE:
mysqld-max-nt.exe!_pthread_cond_wait() + 0x24 bytes
mysqld-max-nt.exe!__my_b_read() + 0x2a7 bytes
mysqld-max-nt.exe!__my_b_read_r() + 0xcb bytes
mysqld-max-nt.exe!__mi_read_cache() + 0x137 bytes
mysqld-max-nt.exe!_filecopy() + 0x99d bytes
mysqld-max-nt.exe!_mi_sort_index() + 0x375 bytes
mysqld-max-nt.exe!_thr_find_all_keys() + 0x29d bytes
mysqld-max-nt.exe!_pthread_start() + 0x3b bytes
mysqld-max-nt.exe!_callthreadstart() Line 293 + 0x6 bytes
mysqld-max-nt.exe!_threadstart(void * ptd=0x09585dd8) Line 275

One thread waiting to INSERT into the table being repaired:
mysqld-max-nt.exe!_pthread_cond_wait() + 0x24 bytes
mysqld-max-nt.exe!wait_for_refresh() + 0x4a bytes
mysqld-max-nt.exe!open_table() + 0x76e bytes
mysqld-max-nt.exe!open_tables() + 0x12b bytes
mysqld-max-nt.exe!open_and_lock_tables() + 0x1a bytes
mysqld-max-nt.exe!mysql_insert() + 0x1cf bytes
mysqld-max-nt.exe!mysql_execute_command() + 0x17f4 bytes

PROCESSLIST:
Id: 275
User: root
Host:
db: test
Command: Query
Time: 2047
State: Repair with 9 threads
Info: REPAIR TABLE `t1`

Id: 273
User: root
Host:
db: test
Command: Query
Time: 2043
State: Waiting for table
Info: REPLACE INTO `t1` SET `id` = 2827

mysql> select version()
+------------+
| version()  |
+------------+
| 5.0.34-log |
+------------+
1 row in set (0.02 sec)

The only way to return to normal operations is to shut down mysqld completely. KILL doesn't work because the threads hang in 'Killed' state...

How to repeat:
will upload testcase soon.

Suggested fix:
fix myisam_repair_threads to not deadlock, or disable multi-thread repairs.