Bug #68961 mysql replication hangs
Submitted: 15 Apr 2013 9:04 Modified: 24 Dec 2013 15:55
Reporter: Michal Grzedzicki Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.5.28 OS:Linux (64 bit)
Assigned to: CPU Architecture:Any
Tags: btr_search_latch, deadlock, innodb, latch, mysqld

[15 Apr 2013 9:04] Michal Grzedzicki
Description:
usually once in a week our mysql 5.5.28 hangs with thiese information in errorlog, it only happens on the replication slaves

I think there might be a deadlock in Thread 47 (Thread 0x7f76ff3db700 (LWP 17504))

If I understand this correctly, thread 47 (140149065103104) is trying to lock btr_search_latch exclusivelly while still holding a shared lock on it.

=====================================
130412 12:17:50 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1748117 1_second, 1748111 sleeps, 174455 10_second, 18939 background, 17671 flush
srv_master_thread log flush and writes: 1762727
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 20509153, signal count 399753864
--Thread 140149064099584 has waited at trx0trx.c line 405 for 708.00 seconds the semaphore:
Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1
waiters flag 1
--Thread 140149065103104 has waited at btr0sea.c line 1134 for 708.00 seconds the semaphore:
*********************************************
btr_search_drop_page_hash_index()
   1130         if (UNIV_LIKELY_NULL(heap)) {
   1131                 mem_heap_free(heap);
   1132         }
   1133 
   1134         rw_lock_x_lock(&btr_search_latch);
   1135 
   1136         if (UNIV_UNLIKELY(!block->index)) {
   1137                 /* Someone else has meanwhile dropped the hash index */
   1138 
   1139                 goto cleanup;
   1140         }
**********************************************
X-lock (wait_ex) on RW-latch at 0x3eeb0a48 created in file btr0sea.c line 178
a writer (thread id 140149065103104) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file btr0sea.c line 1057
*****************************************************
btr_search_drop_page_hash_index()
   1056 retry:
   1057         rw_lock_s_lock(&btr_search_latch);
   1058         index = block->index;
*******************************************************
Last time write locked in file /usr/src/librus/mysql-5.5.28/storage/innobase/btr/btr0sea.c line 633

*************************************************************************
btr_search_info_update_slow()
    626         if (cursor->flag == BTR_CUR_HASH_FAIL) {
    627                 /* Update the hash node reference, if appropriate */
    628 
    629 #ifdef UNIV_SEARCH_PERF_STAT
    630                 btr_search_n_hash_fail++;
    631 #endif /* UNIV_SEARCH_PERF_STAT */
    632 
    633                 rw_lock_x_lock(&btr_search_latch);
    634 
    635                 btr_search_update_hash_ref(info, block, cursor);
    636 
    637                 rw_lock_x_unlock(&btr_search_latch);
    638         }
************************************************************************

--Thread 140194171221760 has waited at trx0trx.c line 210 for 708.00 seconds the semaphore:
Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1
waiters flag 1
--Thread 140194171422464 has waited at trx0trx.c line 210 for 708.00 seconds the semaphore:
Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1
waiters flag 1
--Thread 140148808996608 has waited at trx0trx.c line 754 for 708.00 seconds the semaphore:
Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1
waiters flag 1
--Thread 140149090682624 has waited at srv0srv.c line 2291 for 708.00 seconds the semaphore:
Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1
waiters flag 1
--Thread 140149099075328 has waited at row0mysql.c line 2262 for 708.00 seconds the semaphore:
Mutex at 0x1bb25d8 created file srv0srv.c line 1018, lock var 1
waiters flag 1
Mutex spin waits 1750057502, rounds 869143490, OS waits 3752336
RW-shared spins 117254672, rounds 619941247, OS waits 14143536
RW-excl spins 5703352, rounds 631489507, OS waits 2301392
Spin rounds per wait: 0.50 mutex, 5.29 RW-shared, 110.72 RW-excl
FAIL TO OBTAIN KERNEL MUTEX, SKIP LOCK INFO PRINTING

How to repeat:
I don't know.
[15 Apr 2013 9:05] Michal Grzedzicki
gdb trace output

Attachment: mysql_gdb.txt (text/plain), 92.01 KiB.

[15 Apr 2013 17:57] Sveta Smirnova
Thank you for the report.

GDB backtrace shows some optimization of queries running. To prove this is MySQL bug we need repeatable test case. Do you have some job running once per week on slaves in time when you observer this behavior?
[15 Apr 2013 21:11] Michal Grzedzicki
trace from another machine with same problem

Attachment: mysql_gdb2.txt (text/plain), 69.21 KiB.

[15 Apr 2013 21:25] Sveta Smirnova
Thank you for the feedback.

Do you have some job running once per week on one of affected machines in time when you observer this behavior?
[15 Apr 2013 21:30] Michal Grzedzicki
no there are no cronjobs running when the database hangs,

deadlocks aren't periodical, it seems to happen more if there is more traffick on the site
[25 Apr 2013 19:59] Sveta Smirnova
Thank you for the feedback.

This still looks more like support request for me. For example, such issue can happen if you have problems with disk or not best tuned InnoDB options, some of queries can affect this.

For start: please check if you did not set innodb_lock_wait_timeout to value larger than its's default 50 seconds and check output of SHOW ENGINE INNODB STATUS and SHOW FULL PROCESSLIST during problem time. Pay attention for queries and transactions in blocked state, check if slave SQL thread involved (since you see this on slaves only).
[6 May 2013 10:25] Michal Grzedzicki
innodb_lock_wait_timeout is a litle higher (80) then the default 50

The deadlock persists for much more the this.

The database is completly unresponsive.
I can't send any command to the database, eg. show processlist

All the threads are waiting for locks.

I think btr_search_latch which deadlock here is not the lock refered in innodb_lock_wait_timeout.
[19 Sep 2013 17:29] MySQL Verification Team
http://bugs.mysql.com/bug.php?id=62018 !
[24 Dec 2013 15:55] Sveta Smirnova
Shane,

thank you for the bug #. I will set this report as duplicate of bug #62018 for now, but we need to check it one more time when bug #62018 is fixed.