Bug #57579 blobs cause a thread to wait for itself endlessly, before long semaphore death
Submitted: 19 Oct 2010 19:10 Modified: 9 Dec 2010 18:44
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.5.6, 5.5.8-debug OS:Any
Assigned to: Marko Mäkelä CPU Architecture:Any
Triage: Triaged: D1 (Critical) / R1 (None/Negligible) / E1 (None/Negligible)

[19 Oct 2010 19:10] Shane Bester
Description:
this [cut] 5.5.6 output summarizes the problem:

--Thread 4372 has waited at btr0cur.c line 4591 for 933.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 16218B78 created in file buf0buf.c line 898
a writer (thread id 4372) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffff
Last time read locked in file btr0cur.c line 3266
Last time write locked in file btr0btr.c line 397

"Main thread id 4372, state: flushing log"
Stack trace:

ntdll.dll!_ZwWaitForSingleObject
kernel32.dll!_WaitForSingleObjectEx
kernel32.dll!_WaitForSingleObject
mysqld.exe!os_event_wait_low  Line 535	
mysqld.exe!sync_array_wait_event  Line 459	
mysqld.exe!rw_lock_x_lock_wait  Line 537	
mysqld.exe!rw_lock_x_lock_low  Line 576
mysqld.exe!rw_lock_x_lock_func  Line 628
mysqld.exe!buf_page_get_gen  Line 3069	
mysqld.exe!btr_free_externally_stored_field  Line 4591
mysqld.exe!btr_rec_free_externally_stored_fields  Line 4694
mysqld.exe!btr_cur_pessimistic_delete  Line 3058
mysqld.exe!row_purge_remove_clust_if_poss_low  Line 177	
mysqld.exe!row_purge_remove_clust_if_poss  Line 211
mysqld.exe!row_purge_del_mark  Line 484	
mysqld.exe!row_purge  Line 756	
mysqld.exe!row_purge_step  Line 801	
mysqld.exe!que_thr_step  Line 1247 + 0x6 bytes	
mysqld.exe!que_run_threads_low  Line 1310 + 0x6 bytes	
mysqld.exe!que_run_threads  Line 1349	
mysqld.exe!trx_purge  Line 1197	
mysqld.exe!srv_master_do_purge  Line 2539 
mysqld.exe!srv_master_thread  Line 2849	

How to repeat:
not easy.  run a generic blob dml workload (see attached zlibdml.php )in 2 threads against a server started like:

mysqld --console --innodb-buffer-pool-size=500M --innodb-fast-shutdown=0 
--innodb-file-per-table=1  --innodb-flush-log-at-trx-commit=0 
--innodb-log-file-size=100M --skip-gr --skip-na

it might take some hours to repeat, or a few minutes.
[19 Oct 2010 19:12] Shane Bester
some details....

Attachment: bug57579_5.5.6_error_stack.txt (text/plain), 20.12 KiB.

[19 Oct 2010 19:13] Shane Bester
run this is two or more threads for however long it takes...

Attachment: zlibdml.php (, text), 6.35 KiB.

[19 Oct 2010 19:16] Shane Bester
still analyzing this, and need to test with a fix for bug #55543 ...
[20 Oct 2010 5:06] Shane Bester
debug infos from 5.5.8-debug where i repeated with 3 threads of zlibdml.php and 2 hours runtime

Attachment: bug57579_5.5.8-debug-repeated-gdb-info.txt (text/plain), 75.92 KiB.

[21 Oct 2010 2:51] Shane Bester
more 5.5.8 debug of another hang infos from latest build

Attachment: bug57579_mysql-5.5-innodb-revno3192-gdb-debug.txt (text/plain), 135.98 KiB.

[24 Oct 2010 5:35] Shane Bester
Marko, 4 threads of *this* zlibdml on 5.5 hangs in a few minutes. let me know if you need more details??

Attachment: zlibdml.php (, text), 7.07 KiB.

[25 Oct 2010 8:16] Roel Van de Paar
Devs: when looking into this, please check if this bug also exists in the 5.1 code. If so, please fix for 5.1 as well. 

Triage: please set target to 5.1 ftm. Though Shane and myself were unable to reproduce in 5.1 for the moment (ref. customer case).
[1 Nov 2010 8:23] Marko Mäkelä
I was able to trigger a failure of the UNIV_SYNC_DEBUG assertion ut_ad(sync_thread_levels_empty_gen(TRUE)); in log_free_check(). The fix should be this simple:

=== modified file 'storage/innobase/btr/btr0cur.c'
--- storage/innobase/btr/btr0cur.c	revid:jimmy.yang@oracle.com-20101029045543-zhp7e5ttuyg1eqkv
+++ storage/innobase/btr/btr0cur.c	2010-11-01 08:15:41 +0000
@@ -3277,6 +3277,7 @@ btr_estimate_n_rows_in_range_on_level(
 		    || btr_page_get_level_low(page) != level) {
 
 			/* The page got reused for something else */
+			mtr_commit(&mtr);
 			goto inexact;
 		}

This particular bug does not exist in 5.1 or earlier.
[2 Nov 2010 5:09] Shane Bester
i'll need to rebuild 5.1 with UNIV_SYNC_DEBUG defined and leave this testcase running in case it finds anything.  if it does, i have to open a new bug.
[18 Nov 2010 9:48] Shane Bester
i was never able to reproduce this bug in 5.1 :-/
[30 Nov 2010 23:42] John Russell
Added to changelog:

Heavy concurrent updates of a BLOB column in an InnoDB table could
cause a hang.
[5 Dec 2010 12:39] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[16 Dec 2010 22:28] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)