Bug #540 FLUSH QUERY CACHE causes Signal 11
Submitted: 29 May 2003 16:16 Modified: 15 Sep 2003 6:24
Reporter: Tony Holmes Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.0.14 OS:Linux (Linux 2.4.19 -SMP)
Assigned to: Assigned Account CPU Architecture:Any

[29 May 2003 16:16] Tony Holmes
Description:
Everytime I run FLUSH QUERY CACHE as root it causes a signal 11 and safe restart.  All the information I could gather is pasted below. It includes my.cnf mysqlbug and mysql.err contents

######  Program Versions  #########
All MySQL compiled original rpms from MySQL.com

MySQL-server-4.0.13-0
MySQL-devel-4.0.13-0
MySQL-client-4.0.13-0

###### my.cnf file follows ########

### Global Options
[mysqld]
set-variable = key_buffer_size=12M
set-variable = table_cache=96
query_cache_type = 1
query_cache_size = 8M

###### mysqlbug info follows ######

>Submitter-Id:  <submitter ID>
>Originator:    Tony
>Release:       mysql-4.0.13 (Official MySQL RPM)

>C compiler:    2.95.3
>C++ compiler:  2.95.3
>Environment:
        <machine, os, target, libraries (multiple lines)>
System: Linux omega.unihost.net 2.4.18-6mdkenterprise #1 SMP Fri Mar 15 02:28:20 CET 2002 i686 unknown
Architecture: i686

Some paths:  /usr/bin/perl /usr/bin/make /usr/bin/gmake /usr/bin/gcc /usr/bin/cc
GCC: Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux-gnu/2.96/specs
gcc version 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)
Compilation info: CC='gcc'  CFLAGS='-O6 -fno-omit-frame-pointer -mpentium'  CXX='g++'  CXXFLAGS='-O6 -fno-omit-frame-pointer      \
        -felide-constructors -fno-exceptions -fno-rtti -mpentium'  LDFLAGS=''  ASFLAGS=''
LIBC:
lrwxrwxrwx    1 root     root           13 Apr 10 14:59 /lib/libc.so.6 -> libc-2.2.4.so
-rwxr-xr-x    1 root     root      1275332 Mar 21 04:36 /lib/libc-2.2.4.so
-rw-r--r--    1 root     root     27275946 Mar 21 03:27 /usr/lib/libc.a
-rw-r--r--    1 root     root          178 Mar 21 03:27 /usr/lib/libc.so
Configure command: ./configure '--disable-shared' '--with-mysqld-ldflags=-all-static' '--with-client-ldflags=-all-static' '--witho\
ut-berkeley-db' '--with-innodb' '--without-vio' '--without-openssl' '--enable-assembler' '--enable-local-infile' '--with-mysqld-us\
er=mysql' '--with-unix-socket-path=/var/lib/mysql/mysql.sock' '--prefix=/' '--with-extra-charsets=complex' '--exec-prefix=/usr' '-\
-libexecdir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--localstatedir=/var/lib/mysql' '--infodir=/usr/share/info' '--\
includedir=/usr/include' '--mandir=/usr/share/man' '--with-embedded-server' '--enable-thread-safe-client' '--with-comment=Official\
 MySQL RPM' 'CFLAGS=-O6 -fno-omit-frame-pointer -mpentium' 'CXXFLAGS=-O6 -fno-omit-frame-pointer                -felide-constructo\
rs -fno-exceptions -fno-rtti -mpentium'

########  Contents of Error Log Follows  ##########

mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=12582912
read_buffer_size=131072
sort_buffer_size=2097144
max_used_connections=49
max_connections=100
threads_connected=44
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 229887 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x8971ea8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbfbdf288, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x80741ea
0x829c1e8
0x80eaf33
0x80eaca0
0x80ebb29
0x8083738
0x8081cc3
0x808273b
0x807de2d
0x8083c5e
0x807cfff
0x829999c
0x82cd0aa
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://www.mysql.com/doc/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x89827e8 = FLUSH QUERY CACHE
thd->thread_id=24

Successfully dumped variables, if you ran with --log, take a look at the
details of what thread 24 did to cause the crash.  In some cases of really
bad corruption, the values shown above may be invalid.

The manual page at http://www.mysql.com/doc/C/r/Crashing.html contains
information that should help you find out what is causing the crash.

Number of processes running now: 0
030529 22:48:00  mysqld restarted
030529 22:48:00  InnoDB: Started
/usr/sbin/mysqld: ready for connections.
Version: '4.0.13'  socket: '/var/lib/mysql/mysql.sock'  port: 3306

How to repeat:
I've not found anyone that could repeat the bug yet.  But it _feels_ like a bug. 

Suggested fix:
Sorry.  It's a bit beyond me.
[30 May 2003 3:45] Tony Holmes
Just thought it important to add that the kernel is an SMP kernel.

T.
[3 Jul 2003 1:30] Michael Widenius
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.

The resolved stack trace is:
x82cd0aaStack range sanity check OK, backtrace follows:
0x80741ea handle_segfault + 450
0x829c1e8 pthread_sighandler + 184
0x80eaf33 move_by_type__11Query_cachePPcPP17Query_cache_blockPUlP17Query_cache_block + 503
0x80eaca0 pack_cache__11Query_cache + 84
0x80ebb29 pack__11Query_cacheUlUi + 29
0x8083738 reload_acl_and_cache__FP3THDUlP13st_table_list + 228
0x8081cc3 mysql_execute_command__Fv + 13183
0x808273b mysql_parse__FP3THDPcUi + 559
0x807de2d dispatch_command__F19enum_server_commandP3THDPcUi + 1513
0x8083c5e do_command__FP3THD + 154
0x807cfff handle_one_connection + 635
0x829999c pthread_start_thread + 220

Unfortunately this doesn't help us solve the problem as we test FLUSH QUERY CACHE in the mysql-test/t/query_cache.test test which is run on all MySQL binaries as part of the release testing.

To solve this, we would need a full test case so that we can repeat the problem on our side.

Another option is to run the debug version of mysqld with --core and upload the core file you get to ftp://support.mysql.com/pub/mysql/secret
[3 Jul 2003 3:22] Tony Holmes
I'm not quite sure what feedback was not provided, since the *only* comments on the bug were mine.  Anyway,  I'm unable to provide a core dump as when the --core argument is provided to mysqld, and the "FLUSH QUERY CACHE" command is executed, the mysqld controlling process goes in to a runaway state using 99.9% processor.  At this stage only a kill -9 will take the processes down.  As a result of this, no core file is provided (and many tables need repairing).  I will gladly provide as much information as I possibly can, but I'm at the mercy of the (mis)behaviour of mysqld.  If anyone has any further suggestions of ways I can provide debug information, I will gladly try them.  Also if someone could tell me what feedback was meant to be returned, once again I'll do my best to provide it.
[3 Jul 2003 3:38] Tony Holmes
Incidentally, with regards to my last entry, should I create a bug for the lack of core dump and race process during crash, when using --core.

Cheers

T.
[3 Jul 2003 14:25] MySQL Verification Team
First of all, no new bug record is required.

Check your ulimit and kernel . If later is RH, or RH based kernel, that it is possible that it is patched not to dump it.

Regarding feedback, we need a set of SQL commands, including the FLUSH ... command that you have used that will always lead to the crash.
[30 Jul 2003 2:36] Tony Holmes
OK, Update time,

I just upgraded to 4.0.14, now the problem takes much longer to show, I'd used up over 1.1MB of the qcache before the problem showed itself.  When it did instead of hanging everything up entirely, it did the following:

#####

mysql> show status LIKE 'qcache%';
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Qcache_queries_in_cache | 1172    |
| Qcache_inserts          | 2371    |
| Qcache_hits             | 14636   |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 184     |
| Qcache_free_memory      | 6956992 |
| Qcache_free_blocks      | 33      |
| Qcache_total_blocks     | 2447    |
+-------------------------+---------+
8 rows in set (0.00 sec)

mysql> flush query cache;
/usr/bin/mysqld_safe: line 339:  3207 Segmentation fault      $NOHUP_NICENESS $ledir/$MYSQLD $defaults --basedir=$MY_BASEDIR_VERSION --datadir=$DATADIR $USER_OPTION --pid-file=$pid_file --skip-locking >>$err_log 2>&1

Number of processes running now: 0
030730 10:17:57  mysqld restarted
030730 10:17:57  mysqld ended

ERROR 2013: Lost connection to MySQL server during query

##############
 
Not much help i know, but I did notice that 'queries in cache' dropped from 1200 to 1172, thereby creating 33 free blocks, before the segfault.  I was monitoring the qcache every 5 seconds previously and the queries_in_cache only increased, it only segfaulted after flushing the query cache after the first decrease in cached queries.

Hope this is of some help

T.
[31 Jul 2003 1:18] Tony Holmes
Further to my last update.  Although the mysqld no longer goes into a race condition after it segfaults, the process executing the FLUSH command stays in a hung state.  In order to restart mysqld, a manual 'kill' of the process is required before the pid file can be used by the new instance of mysqld.
[31 Jul 2003 1:20] Tony Holmes
Update to the Kernel version and mysql server version to reflect changes made to server.
[12 Sep 2003 0:38] Michael K
I guess I won't add anything new to this bug, but I'm consistently hitting this on one of my servers.
Redhat9 4-way SMP system with 2Gigs of RAM, 2.4.18-14smp kernel.
I first enabled query cache in 4.0.13 official RPM binary install. Running FLUSH QUERY CACHE usually ended in mysql crash every 5th time or so, corrupting some tables.
Yesterday I upgraded to official 4.0.15-Max RPM binary, in a vain hope it will fix crashes. After 1 day of mysql uptime I attempted FLUSH QUERY CACHE and got familiar crash.
Here's resolved stack trace of latest crash:

0x80d99b4 handle_segfault + 420
0x4003c74e _end + 934965650
0x81531d4 move_by_type__11Query_cachePPcPP17Query_cache_blockPUlP17Query_cache_block + 500
0x8152f6f pack_cache__11Query_cache + 79
0x815190a pack__11Query_cacheUlUi + 26
0x80e9459 reload_acl_and_cache__FP3THDUlP13st_table_list + 329
0x80e7413 mysql_execute_command__Fv + 12803
0x80e81f5 mysql_parse__FP3THDPcUi + 149
0x80e3753 dispatch_command__F19enum_server_commandP3THDPcUi + 1443
0x80e319e do_command__FP3THD + 158
0x80e298e handle_one_connection + 638
0x40037881 _end + 934945477
0x420e3887 _end + 969204427

thd->query at 0x5b8d2948  is invalid pointer
thd->thread_id=700271

/etc/my.cnf:
[mysqld]
set-variable = ft_min_word_len=3
set-variable = long_query_time=15
set-variable = max_connections=512
set-variable = key_buffer=256M
set-variable = myisam_sort_buffer_size=64M
set-variable = join_buffer=12M
set-variable = record_buffer=4M
set-variable = sort_buffer=8M
set-variable = table_cache=2048
set-variable = thread_cache_size=384
set-variable = wait_timeout=7200
set-variable = connect_timeout=10
set-variable = max_allowed_packet=16M
set-variable = max_connect_errors=512
set-variable = tmp_table_size=128M
query_cache_limit = 1M
query_cache_size = 128M
query_cache_type = 1
read_buffer_size = 2M
skip-innodb
log-slow-queries = /usr/mysql/mysqld.slow.log
server-id=1
skip-name-resolve
log-bin
binlog-ignore-db = mysql
binlog-ignore-db = test
binlog-ignore-db = phpads
set-variable = max_binlog_size=120M

Unfortunately, I see no way to reproduce this outside of our setup. Our DB is quite large and busy (more than 150 queries/sec at busy time, serving popular vbulletin forum), and I don't want to experiment on a live system - I have to repair posts table after every crash, resulting in forum downtime. No core files, too.

Also, I have 3 or so stack traces from 4.0.13 version. Don't remember exactly, but they were definitely different from this last one.

I had to disable query cache for now because of this.
Also, you may have a look at bug 988:
http://bugs.mysql.com/bug.php?id=988
It looks like a duplicate of this one.
[15 Sep 2003 6:24] Oleksandr Byelkin
Thank you for bugreport! 
problem is solved, see BUG#988.