| Bug #68961 | mysql replication hangs | ||
|---|---|---|---|
| Submitted: | 15 Apr 2013 9:04 | Modified: | 24 Dec 2013 15:55 |
| Reporter: | Michal Grzedzicki | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) |
| Version: | 5.5.28 | OS: | Linux (64 bit) |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | btr_search_latch, deadlock, innodb, latch, mysqld | ||
[15 Apr 2013 9:05]
Michal Grzedzicki
gdb trace output
Attachment: mysql_gdb.txt (text/plain), 92.01 KiB.
[15 Apr 2013 17:57]
Sveta Smirnova
Thank you for the report. GDB backtrace shows some optimization of queries running. To prove this is MySQL bug we need repeatable test case. Do you have some job running once per week on slaves in time when you observer this behavior?
[15 Apr 2013 21:11]
Michal Grzedzicki
trace from another machine with same problem
Attachment: mysql_gdb2.txt (text/plain), 69.21 KiB.
[15 Apr 2013 21:25]
Sveta Smirnova
Thank you for the feedback. Do you have some job running once per week on one of affected machines in time when you observer this behavior?
[15 Apr 2013 21:30]
Michal Grzedzicki
no there are no cronjobs running when the database hangs, deadlocks aren't periodical, it seems to happen more if there is more traffick on the site
[25 Apr 2013 19:59]
Sveta Smirnova
Thank you for the feedback. This still looks more like support request for me. For example, such issue can happen if you have problems with disk or not best tuned InnoDB options, some of queries can affect this. For start: please check if you did not set innodb_lock_wait_timeout to value larger than its's default 50 seconds and check output of SHOW ENGINE INNODB STATUS and SHOW FULL PROCESSLIST during problem time. Pay attention for queries and transactions in blocked state, check if slave SQL thread involved (since you see this on slaves only).
[6 May 2013 10:25]
Michal Grzedzicki
innodb_lock_wait_timeout is a litle higher (80) then the default 50 The deadlock persists for much more the this. The database is completly unresponsive. I can't send any command to the database, eg. show processlist All the threads are waiting for locks. I think btr_search_latch which deadlock here is not the lock refered in innodb_lock_wait_timeout.
[19 Sep 2013 17:29]
MySQL Verification Team
http://bugs.mysql.com/bug.php?id=62018 !
[24 Dec 2013 15:55]
Sveta Smirnova
Shane, thank you for the bug #. I will set this report as duplicate of bug #62018 for now, but we need to check it one more time when bug #62018 is fixed.

Description: usually once in a week our mysql 5.5.28 hangs with thiese information in errorlog, it only happens on the replication slaves I think there might be a deadlock in Thread 47 (Thread 0x7f76ff3db700 (LWP 17504)) If I understand this correctly, thread 47 (140149065103104) is trying to lock btr_search_latch exclusivelly while still holding a shared lock on it. ===================================== 130412 12:17:50 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 20 seconds ----------------- BACKGROUND THREAD ----------------- srv_master_thread loops: 1748117 1_second, 1748111 sleeps, 174455 10_second, 18939 background, 17671 flush srv_master_thread log flush and writes: 1762727 ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 20509153, signal count 399753864 --Thread 140149064099584 has waited at trx0trx.c line 405 for 708.00 seconds the semaphore: Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1 waiters flag 1 --Thread 140149065103104 has waited at btr0sea.c line 1134 for 708.00 seconds the semaphore: ********************************************* btr_search_drop_page_hash_index() 1130 if (UNIV_LIKELY_NULL(heap)) { 1131 mem_heap_free(heap); 1132 } 1133 1134 rw_lock_x_lock(&btr_search_latch); 1135 1136 if (UNIV_UNLIKELY(!block->index)) { 1137 /* Someone else has meanwhile dropped the hash index */ 1138 1139 goto cleanup; 1140 } ********************************************** X-lock (wait_ex) on RW-latch at 0x3eeb0a48 created in file btr0sea.c line 178 a writer (thread id 140149065103104) has reserved it in mode wait exclusive number of readers 1, waiters flag 0, lock_word: ffffffffffffffff Last time read locked in file btr0sea.c line 1057 ***************************************************** btr_search_drop_page_hash_index() 1056 retry: 1057 rw_lock_s_lock(&btr_search_latch); 1058 index = block->index; ******************************************************* Last time write locked in file /usr/src/librus/mysql-5.5.28/storage/innobase/btr/btr0sea.c line 633 ************************************************************************* btr_search_info_update_slow() 626 if (cursor->flag == BTR_CUR_HASH_FAIL) { 627 /* Update the hash node reference, if appropriate */ 628 629 #ifdef UNIV_SEARCH_PERF_STAT 630 btr_search_n_hash_fail++; 631 #endif /* UNIV_SEARCH_PERF_STAT */ 632 633 rw_lock_x_lock(&btr_search_latch); 634 635 btr_search_update_hash_ref(info, block, cursor); 636 637 rw_lock_x_unlock(&btr_search_latch); 638 } ************************************************************************ --Thread 140194171221760 has waited at trx0trx.c line 210 for 708.00 seconds the semaphore: Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1 waiters flag 1 --Thread 140194171422464 has waited at trx0trx.c line 210 for 708.00 seconds the semaphore: Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1 waiters flag 1 --Thread 140148808996608 has waited at trx0trx.c line 754 for 708.00 seconds the semaphore: Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1 waiters flag 1 --Thread 140149090682624 has waited at srv0srv.c line 2291 for 708.00 seconds the semaphore: Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1 waiters flag 1 --Thread 140149099075328 has waited at row0mysql.c line 2262 for 708.00 seconds the semaphore: Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1 waiters flag 1 Mutex spin waits 1750057502, rounds 869143490, OS waits 3752336 RW-shared spins 117254672, rounds 619941247, OS waits 14143536 RW-excl spins 5703352, rounds 631489507, OS waits 2301392 Spin rounds per wait: 0.50 mutex, 5.29 RW-shared, 110.72 RW-excl FAIL TO OBTAIN KERNEL MUTEX, SKIP LOCK INFO PRINTING How to repeat: I don't know.