Bug #40090 Errornous query cache thread synchronization causes server to freeze forever
Submitted: 16 Oct 2008 20:26 Modified: 28 Nov 2008 6:04
Reporter: Patric Müller Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Query Cache Severity:S1 (Critical)
Version:5.1.26 OS:Linux
Assigned to: CPU Architecture:Any

[16 Oct 2008 20:26] Patric Müller
Description:
I have several problems with mysql 5.1.26 when running an instance under a high load and using the query cache (query cache size: 75 MB).
After around 10 - 30 minutes, mysql server freezes and processes no more sql statements. 

When having less load, this problem does not occour.

After debugging the server instance, I noticed that all active threads are waiting for a pthread condition signal inside query cache code, no thread was active any more, and therefore all threads waited forever to wake up.

Deactivating query cache solved the problem - for now. Of course this leads to even higher load of our myqsl instance.

Please find the relevant gdb backtrace excerpt below:

Thread 448 (process 14694):
#0  0xa000000000010640 in __kernel_syscall_via_break ()
#1  0x20000000000548b0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0x40000000006abe20 in query_cache_insert (net=0x20000000ec5009b8, packet=0x20000000ec503130 "\001", length=4545) at sql_cache.cc:1627
#3  0x4000000000329770 in net_real_write (net=0x20000000ec5009b8, packet=0x20000000ec503130 "\001", len=2305843013178362296) at net_serv.cc:565
#4  0x400000000032a1a0 in net_flush (net=0x20000000ec5009b8) at net_serv.cc:342
#5  0x4000000000330330 in net_send_eof (thd=0x20000000ec501e89, server_status=2, total_warn_count=0) at protocol.cc:205
#6  0x4000000000330150 in net_end_statement (thd=0x20000000ec5009b8) at protocol.cc:400
#7  0x400000000036adf0 in $$1$338_0$TAG$DEFAULT$0$395 () at sql_parse.cc:1503
#8  0x4000000000368c30 in do_command (thd=0x20000000ec5008d0) at sql_parse.cc:794
#9  0x400000000035cd80 in handle_one_connection (arg=0x20000000ec5009b8) at sql_connect.cc:1115
#10 0x200000000004e7e0 in start_thread () from /lib/tls/libpthread.so.0
#11 0x2000000000265ed0 in __clone2 () from /lib/tls/libc.so.6.1

Thread 449 (process 14693):
#0  0xa000000000010640 in __kernel_syscall_via_break ()
#1  0x20000000000548b0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0x40000000006b0f30 in Query_cache::invalidate_table (this=0x6000000000212fe0, thd=0x20000000ec50a4d0, key=0x6000000005e5f870 "partyfans_1", key_length=28)
    at sql_cache.cc:1627
#3  0x40000000006b0d50 in Query_cache::invalidate_table (this=0x6000000000212fe0, thd=0x20000000ec50a4d0, table=0x6000000006007c30) at sql_cache.cc:2526
#4  0x40000000006b0c60 in Query_cache::invalidate_table (this=0x6000000000212fe0, thd=0x20000000ec50a4d0, table_list=0x200000013c570648) at sql_cache.cc:2510
#5  0x40000000006aed90 in Query_cache::invalidate (this=0x6000000000212fe0, thd=0x20000000ec50a4d0, tables_used=0x200000013c570648, using_transactions=-40 'Ø')
    at sql_cache.cc:1506
#6  0x40000000004fde00 in mysql_update (thd=0x20000000ec50a4d0, table_list=0x20000000ec50a4d0, fields=@0x20000000ec50c3a8, values=@0x0, conds=0x20000000ec50a4d0, 
    order_num=100695096, order=0x6000000006007ce8, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false) at sql_update.cc:775
#7  0x4000000000374c20 in $$1$344_0$TAG$04$0$371 () at sql_parse.cc:2841
#8  0x4000000000381f70 in mysql_parse (thd=0x20000000ec50a4d0, inBuf=0x20000000ec50bdb0 "", length=97, found_semicolon=0x20000000b4cfe920) at sql_parse.cc:5645
#9  0x400000000036a7e0 in $$1$338_0$TAG$03$0$395 () at sql_parse.cc:1137
#10 0x4000000000368c30 in do_command (thd=0x20000000ec50a4d0) at sql_parse.cc:794
#11 0x400000000035cd80 in handle_one_connection (arg=0x20000000ec50a5b8) at sql_connect.cc:1115
#12 0x200000000004e7e0 in start_thread () from /lib/tls/libpthread.so.0
#13 0x2000000000265ed0 in __clone2 () from /lib/tls/libc.so.6.1

Thread 429 (process 14759):
#0  0xa000000000010640 in __kernel_syscall_via_break ()
#1  0x20000000000548b0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0x40000000006b0f30 in Query_cache::invalidate_table (this=0x6000000000212fe0, thd=0x20000000ec55cff0, key=0x6000000005e5f870 "partyfans_1", key_length=28)
    at sql_cache.cc:1627
#3  0x40000000006b0d50 in Query_cache::invalidate_table (this=0x6000000000212fe0, thd=0x20000000ec55cff0, table=0x6000000005fbd830) at sql_cache.cc:2526
#4  0x40000000006b0c60 in Query_cache::invalidate_table (this=0x6000000000212fe0, thd=0x20000000ec55cff0, table_list=0x6000000006060398) at sql_cache.cc:2510
#5  0x40000000006aed90 in Query_cache::invalidate (this=0x6000000000212fe0, thd=0x20000000ec55cff0, tables_used=0x6000000006060398, using_transactions=40 '(')
    at sql_cache.cc:1506
#6  0x40000000004fde00 in mysql_update (thd=0x20000000ec55cff0, table_list=0x20000000ec55cff0, fields=@0x20000000ec55eec8, values=@0x0, conds=0x20000000ec55cff0, 
    order_num=100390968, order=0x6000000005fbd8e8, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false) at sql_update.cc:775
#7  0x4000000000374c20 in $$1$344_0$TAG$04$0$371 () at sql_parse.cc:2841
#8  0x4000000000381f70 in mysql_parse (thd=0x20000000ec55cff0, inBuf=0x20000000ec55e8d0 "", length=97, found_semicolon=0x200000015468a920) at sql_parse.cc:5645
#9  0x400000000036a7e0 in $$1$338_0$TAG$03$0$395 () at sql_parse.cc:1137
#10 0x4000000000368c30 in do_command (thd=0x20000000ec55cff0) at sql_parse.cc:794
#11 0x400000000035cd80 in handle_one_connection (arg=0x20000000ec55d0d8) at sql_connect.cc:1115
#12 0x200000000004e7e0 in start_thread () from /lib/tls/libpthread.so.0
#13 0x2000000000265ed0 in __clone2 () from /lib/tls/libc.so.6.1

How to repeat:
High load, massive inserting/updating/selecting, actived query cache.
(sorry, I do currently not have more information how to repeat this bug outside our production environment)
[28 Oct 2008 6:04] Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newer version, 5.1.29, and inform about the results.
[29 Nov 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[7 Jun 2010 20:25] James Day
There's a good chance that this was fixed by the work on bug #43758. 

For Windows installations only, a different pthread_cond_wait issue was fixed in bug #47768.

In general it's a good idea to upgrade to at least version 5.1.42 whenever the query cache shows up as a potential issue - check by turning it off - because we've fixed a number of hangs or delays where it was involved in that version and those leading up to it.