MySQL Bugs: #57579: blobs cause a thread to wait for itself endlessly, before long semaphore death

Bug #57579	blobs cause a thread to wait for itself endlessly, before long semaphore death
Submitted:	19 Oct 2010 19:10	Modified:	9 Dec 2010 18:44
Reporter:	Shane Bester (Platinum Quality Contributor)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	5.5.6, 5.5.8-debug	OS:	Any
Assigned to:	Marko Mäkelä	CPU Architecture:	Any

Description:
this [cut] 5.5.6 output summarizes the problem:

--Thread 4372 has waited at btr0cur.c line 4591 for 933.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 16218B78 created in file buf0buf.c line 898
a writer (thread id 4372) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffff
Last time read locked in file btr0cur.c line 3266
Last time write locked in file btr0btr.c line 397

"Main thread id 4372, state: flushing log"
Stack trace:

ntdll.dll!_ZwWaitForSingleObject
kernel32.dll!_WaitForSingleObjectEx
kernel32.dll!_WaitForSingleObject
mysqld.exe!os_event_wait_low  Line 535	
mysqld.exe!sync_array_wait_event  Line 459	
mysqld.exe!rw_lock_x_lock_wait  Line 537	
mysqld.exe!rw_lock_x_lock_low  Line 576
mysqld.exe!rw_lock_x_lock_func  Line 628
mysqld.exe!buf_page_get_gen  Line 3069	
mysqld.exe!btr_free_externally_stored_field  Line 4591
mysqld.exe!btr_rec_free_externally_stored_fields  Line 4694
mysqld.exe!btr_cur_pessimistic_delete  Line 3058
mysqld.exe!row_purge_remove_clust_if_poss_low  Line 177	
mysqld.exe!row_purge_remove_clust_if_poss  Line 211
mysqld.exe!row_purge_del_mark  Line 484	
mysqld.exe!row_purge  Line 756	
mysqld.exe!row_purge_step  Line 801	
mysqld.exe!que_thr_step  Line 1247 + 0x6 bytes	
mysqld.exe!que_run_threads_low  Line 1310 + 0x6 bytes	
mysqld.exe!que_run_threads  Line 1349	
mysqld.exe!trx_purge  Line 1197	
mysqld.exe!srv_master_do_purge  Line 2539 
mysqld.exe!srv_master_thread  Line 2849	

How to repeat:
not easy.  run a generic blob dml workload (see attached zlibdml.php )in 2 threads against a server started like:

mysqld --console --innodb-buffer-pool-size=500M --innodb-fast-shutdown=0 
--innodb-file-per-table=1  --innodb-flush-log-at-trx-commit=0 
--innodb-log-file-size=100M --skip-gr --skip-na

it might take some hours to repeat, or a few minutes.

some details....

Attachment: bug57579_5.5.6_error_stack.txt (text/plain), 20.12 KiB.

run this is two or more threads for however long it takes...

Attachment: zlibdml.php (, text), 6.35 KiB.

still analyzing this, and need to test with a fix for bug #55543 ...

debug infos from 5.5.8-debug where i repeated with 3 threads of zlibdml.php and 2 hours runtime

Attachment: bug57579_5.5.8-debug-repeated-gdb-info.txt (text/plain), 75.92 KiB.

more 5.5.8 debug of another hang infos from latest build

Attachment: bug57579_mysql-5.5-innodb-revno3192-gdb-debug.txt (text/plain), 135.98 KiB.

Marko, 4 threads of *this* zlibdml on 5.5 hangs in a few minutes. let me know if you need more details??

Attachment: zlibdml.php (, text), 7.07 KiB.

Devs: when looking into this, please check if this bug also exists in the 5.1 code. If so, please fix for 5.1 as well. 

Triage: please set target to 5.1 ftm. Though Shane and myself were unable to reproduce in 5.1 for the moment (ref. customer case).

I was able to trigger a failure of the UNIV_SYNC_DEBUG assertion ut_ad(sync_thread_levels_empty_gen(TRUE)); in log_free_check(). The fix should be this simple:

=== modified file 'storage/innobase/btr/btr0cur.c'
--- storage/innobase/btr/btr0cur.c	revid:jimmy.yang@oracle.com-20101029045543-zhp7e5ttuyg1eqkv
+++ storage/innobase/btr/btr0cur.c	2010-11-01 08:15:41 +0000
@@ -3277,6 +3277,7 @@ btr_estimate_n_rows_in_range_on_level(
 		    || btr_page_get_level_low(page) != level) {
 
 			/* The page got reused for something else */
+			mtr_commit(&mtr);
 			goto inexact;
 		}

This particular bug does not exist in 5.1 or earlier.

i'll need to rebuild 5.1 with UNIV_SYNC_DEBUG defined and leave this testcase running in case it finds anything.  if it does, i have to open a new bug.

i was never able to reproduce this bug in 5.1 :-/

Added to changelog:

Heavy concurrent updates of a BLOB column in an InnoDB table could
cause a hang.

Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)

Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)