MySQL Bugs: #24199: MySQL Crashes during FLUSH QUERY CACHE

Bug #24199	MySQL Crashes during FLUSH QUERY CACHE
Submitted:	10 Nov 2006 18:38	Modified:	30 Jan 2007 9:24
Reporter:	Morgan Collins	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server	Severity:	S1 (Critical)
Version:	5.0.26	OS:	Linux (Linux)
Assigned to:		CPU Architecture:	Any

Description:
I have 2 MySQL servers in a master/slave configuration. I have a perl script that defrags the query cache every 30 minutes. Starting with 5.0.26, mysql on either server now crashes at least once a week. I have had to disable the query cache entirely on the master, as not defragging the query cache results in the server crashing for other queries. The server has not crashed sine the query cache has been disabled, however the slave continues to with query cache on.

# cat /etc/mysql/my.cnf | grep '^query_cache'
query_cache_type=1
query_cache_size=32M
query_cache_limit=2M

# cat /etc/gentoo-release 
Gentoo Base System version 1.12.6

# mysql --version
mysql  Ver 14.12 Distrib 5.0.26, for pc-linux-gnu (i686) using readline 5.1

# uname -a
Linux sql1 2.6.15-gentoo-r1 #1 SMP Fri Mar 31 20:21:32 PST 2006 i686 Intel(R) Xeon(TM) CPU 2.80GHz GenuineIntel GNU/Linux

# free
             total       used       free     shared    buffers     cached
Mem:       4154588    4024072     130516          0      40260    2447044
-/+ buffers/cache:    1536768    2617820
Swap:       979924         84     979840

# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 0  2     84 129276  38732 2449320    0    0   536  1460 1620  1678  4  1 80 15
 1  3     84 128656  38688 2450044    0    0  3720     0 1772  1024  5  3 68 24
 0  1     84 128416  38704 2450232    0    0  7136   300 1599   927  5  2 57 36
 0  1     84 128532  38604 2450128    0    0  6884   616 2434  2152  9  5 63 23
 1  1     84 128656  38568 2450028    0    0  6272     0 1536   931  7  5 74 15
 0  1     84 128780  38592 2450208    0    0  2160   788 1307  1645  6  4 73 16
 0  1     84 128532  38592 2449392    0    0   280   108 1065   855  1  1 85 13
 1  0     84 129896  38596 2448300    0    0  1248   132 1143  1462  6  2 78 13

# cat /var/log/mysqld.err
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1073741824
read_buffer_size=2093056
max_used_connections=251
max_connections=250
threads_connected=21
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 1461270 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x4721d938
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x554941c8, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x8167fcf
0xb7dc4edd
0x823f1b0
0x823f42d
0x823f4b8
0x81772b9
0x817bc77
0x8182dd0
0x818334b
0x8184684
0x8185150
0xb7dc2167
0xb7c13b7e
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x99efb18 = flush query cache
thd->thread_id=497890
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

# gdb
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1711479712 (LWP 4954)]
0x08157655 in Field_string::type ()
(gdb) bt
#0  0x08157655 in Field_string::type ()
#1  0x081fa60c in get_sweep_read_cost ()
#2  0x08200e9c in SEL_ARG::tree_delete ()
#3  0x08203048 in SQL_SELECT::test_quick_select ()
#4  0x081bd230 in st_table_list::print ()
#5  0x081bdf61 in JOIN::optimize ()
#6  0x081c4e7a in mysql_select ()
#7  0x081c5730 in handle_select ()
#8  0x08181007 in mysql_execute_command ()
#9  0x08182dd0 in mysql_parse ()
#10 0x081e4cb2 in Query_log_event::exec_event ()
#11 0x081e5308 in Query_log_event::exec_event ()
#12 0x08245885 in handle_slave_sql ()
#13 0xb7e47167 in start_thread () from /lib/libpthread.so.0
#14 0xb7c98b7e in clone () from /lib/libc.so.6

NOTE: The stack dump and the mysql log entry were taken from different compilations of mysql and during different crashes. However ps shows that my flush query cache script is running when the server crashed.

I will be downgrading the slave to 5.0.25 to see if the problem persists. I will update the bug with the outcome.

How to repeat:
run flush query cache; every 30 minutes for a week

Suggested fix:
defragging the query cache shouldn't crash the server, and not defragging the query cache should not result in the server crashing for other queries. The query cache should be able to be used.

Thank you for a problem report. Please, try to resolve this stack trace:

0x8167fcf
0xb7dc4edd
0x823f1b0
0x823f42d
0x823f4b8
0x81772b9
0x817bc77
0x8182dd0
0x818334b
0x8184684
0x8185150
0xb7dc2167
0xb7c13b7e

on a server where FLUSH QUERY CACHE really crashed. Send the results of:

getconf GNU_LIBPTHREAD_VERSION

also.

Stack Dump with symbols

Attachment: mysql.txt (text/plain), 53 bytes.

I have attached a backtrace with full debugging enabled, and with symbols. Hopefully this will be more helpful. It doesn't look like it crashed during a FLUSH QUERY CACHE, though when I turn query cache off I don't seam to have this problem.

Additionally, here is the output of info threads:

(gdb) info threads
* 16 Thread 1717521312 (LWP 24630)  0x081efaf6 in mysql_select (thd=0xa1fe328, rref_pointer_array=0xa1fe71c, tables=0xa20e2c0, wild_num=0, 
    fields=@0xa1fe680, conds=0xa20e590, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=1375748096, result=0xa20e688, 
    unit=0xa1fe3cc, select_lex=0xa1fe5f4) at sql_select.cc:1974
  15 Thread 1717320608 (LWP 24629)  0xffffe410 in ?? ()
  14 Thread 1715469216 (LWP 24013)  0xffffe410 in ?? ()
  13 Thread -1754207328 (LWP 23205)  0x08429593 in ha_key_cmp (keyseg=0xa6fb288, a=0xa7126a0 "", b=0x645cbfd8 "", key_length=2224, nextflag=65, 
    diff_pos=0x97707d14) at my_handler.c:119
  12 Thread -1695458400 (LWP 23128)  0xffffe410 in ?? ()
  11 Thread -1703851104 (LWP 23129)  0xffffe410 in ?? ()
  10 Thread -1712243808 (LWP 23130)  0xffffe410 in ?? ()
  9 Thread -1720636512 (LWP 23131)  0xffffe410 in ?? ()
  8 Thread -1737421920 (LWP 23135)  0xffffe410 in ?? ()
  7 Thread -1745814624 (LWP 23136)  0xffffe410 in ?? ()
  6 Thread -1763542112 (LWP 23138)  0xffffe410 in ?? ()
  5 Thread 1717722016 (LWP 23157)  0xffffe410 in ?? ()
  4 Thread 1717521312 (LWP 24630)  0x0816dca5 in Field_string::type (this=0x902c560) at field.h:1015
  3 Thread 1717320608 (LWP 24629)  0xffffe410 in ?? ()
  2 Thread 1717119904 (LWP 23165)  0xffffe410 in ?? ()
  1 Thread -1212630160 (LWP 23120)  0xffffe410 in ?? ()

# getconf GNU_LIBPTHREAD_VERSION
NPTL 2.4

Thanks

Stack Dump with symbols

Attachment: mysql-full-debug.txt (text/plain), 12.25 KiB.

Just to verify, I copies note_messageType.* away before I killed gdb. I then went and found the query in the relay log. I stopped mysql, copied the old note_messageType back and started mysql with replication stopped and reran the INSERT, and mysql did not crash.

Here is the full query from the relay log:

INSERT INTO chartconnect.note_messageType                                                                                        (                                                                                                communityID,                                                                                                noteID,                                                                                                messageTypeID                                                                                        )                                                                                        SELECT                                                                                                24,                                                                                                4636563,                                                                                                messageType.messageTypeID                                                                                        FROM                                                                                                chartconnect.messageType                                                                                        WHERE                                                                                          messageType.name = 'Rad Comment';

I was able to reproduce this crash with the query cache disabled. 

There was a check table running at the time however.

Interestingly, reducing table_cache to 64 prevents the INSERT INTO ... SELECT+CHECK TABLE crash, however only if the query cache is disabled.

I also forgot to mention in my previous post that these crashes always happen in the same function, irregardless of the initiator.

Please, send your my.cnf content and describe exact steps to reproduce this crash each and every time.

I am not able to provide you with an exact method to reproduce this crash. I have tried constructing tables similar to ours without sensitive information, but am unable to reproduce the error.

We have upgraded to 5.0.30 and no longer are having issues however.

Please, send resolved stack trace in case of any similar crash in 5.0.30 (or 5.0.27).

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".