Bug #36812 MySQL "crashing" on subselect queries (I think)
Submitted: 20 May 2008 2:28 Modified: 23 Jun 2008 21:40
Reporter: Aaron Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Errors Severity:S2 (Serious)
Version:5.0.51a OS:Linux
Assigned to: CPU Architecture:Any

[20 May 2008 2:28] Aaron
Description:
For the past few months one of our database servers which hosts one vBulletin MyISAM database has been randomly "crashing." What I mean by crashing is that MySQL has to be manually killed (kill -9 pid and then killall -9 mysqld) in order to be able to restart it. Anyway, it does it completely randomly (usually every few days) and then it leaves this in our error log:

080519 18:21:24 - mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=268435456
read_buffer_size=2093056
max_used_connections=501
max_connections=500
threads_connected=30
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 3332140 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x97b08490
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x96964a9c, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x819c9a9
0x818f67d
0x81f8ac0
0x81f748b
0x81f7307
0x81f748b
0x81f7307
0x81f75c1
0x81f7275
0x81f748b
0x81f7307
0x81f75c1
0x81f7275
0x81f75c1
0x81f7275
0x81f748b
0x81f7307
0x81f748b
0x81f7307
0x81f748b
0x81f7307
0x81f748b
0x81f729d
0x81f6ec9
0x81e6f7e
0x81e79bd
0x81e2d5b
0x81b36e9
0x81baac0
0x81b1d6b
0x81b17be
0x81b0bb6
0xc253cc
0xb25c3e
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/using-stack-trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do 
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x9458b2e0  is invalid pointer
thd->thread_id=2313409
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
080519 18:21:25 [Warning] Found an entry in the 'db' table with empty database name; Skipped
080519 18:21:25 [Warning] Found an entry in the 'db' table with empty database name; Skipped
080519 18:21:25 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.0.51a-community'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Edition (GPL)

This is a stack trace (I think):

[root@db admin]# resolve_stack_dump -s /tmp/mysqld.sym -n mysqld.stack
0x819c9a9 handle_segfault + 521
0x818f67d _ZN11select_send9send_dataER4ListI4ItemE + 205
0x81f8ac0 _Z27join_read_next_same_or_nullP14st_read_record + 576
0x81f748b _Z10sub_selectP4JOINP13st_join_tableb + 683
0x81f7307 _Z10sub_selectP4JOINP13st_join_tableb + 295
0x81f748b _Z10sub_selectP4JOINP13st_join_tableb + 683
0x81f7307 _Z10sub_selectP4JOINP13st_join_tableb + 295
0x81f75c1 _Z10sub_selectP4JOINP13st_join_tableb + 993
0x81f7275 _Z10sub_selectP4JOINP13st_join_tableb + 149
0x81f748b _Z10sub_selectP4JOINP13st_join_tableb + 683
0x81f7307 _Z10sub_selectP4JOINP13st_join_tableb + 295
0x81f75c1 _Z10sub_selectP4JOINP13st_join_tableb + 993
0x81f7275 _Z10sub_selectP4JOINP13st_join_tableb + 149
0x81f75c1 _Z10sub_selectP4JOINP13st_join_tableb + 993
0x81f7275 _Z10sub_selectP4JOINP13st_join_tableb + 149
0x81f748b _Z10sub_selectP4JOINP13st_join_tableb + 683
0x81f7307 _Z10sub_selectP4JOINP13st_join_tableb + 295
0x81f748b _Z10sub_selectP4JOINP13st_join_tableb + 683
0x81f7307 _Z10sub_selectP4JOINP13st_join_tableb + 295
0x81f748b _Z10sub_selectP4JOINP13st_join_tableb + 683
0x81f7307 _Z10sub_selectP4JOINP13st_join_tableb + 295
0x81f748b _Z10sub_selectP4JOINP13st_join_tableb + 683
0x81f729d _Z10sub_selectP4JOINP13st_join_tableb + 189
0x81f6ec9 _Z21setup_end_select_funcP4JOIN + 601
0x81e6f7e _ZN4JOIN4execEv + 4878
0x81e79bd _Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select + 493
0x81e2d5b _Z13handle_selectP3THDP6st_lexP13select_resultm + 331
0x81b36e9 _Z21mysql_execute_commandP3THD + 713
0x81baac0 _Z11mysql_parseP3THDPKcjPS2_ + 272
0x81b1d6b _Z16dispatch_command19enum_server_commandP3THDPcj + 1371
0x81b17be _Z10do_commandP3THD + 158
0x81b0bb6 handle_one_connection + 726
0xc253cc (?)
0xb25c3e (?)

We have had MySQL consultants review this and they think it is most likely a MySQL bug and not something we have caused by ourselves. I hope you guys can figure it out. :) Thank you!

How to repeat:
No reliable way, that I know of, to repeat.

Suggested fix:
No idea :(
[20 May 2008 3:16] Valeriy Kravchuk
Thank you for a problem report. Please, send your my.cnf file content and the results of:

uname -a
free

Linux commands, for the beginning.
[20 May 2008 3:18] Aaron
[root@db admin]# uname -a
Linux db.teenspot.com 2.6.9-55.0.2.ELsmp #1 SMP Tue Jun 26 14:30:58 EDT 2007 i686 i686 i386 GNU/Linux

[root@db admin]# free
             total       used       free     shared    buffers     cached
Mem:       4090552    3978460     112092          0      45528    3674292
-/+ buffers/cache:     258640    3831912
Swap:      8289532        192    8289340

[root@db admin]# cat /etc/my.cnf
[client]
port=3306
socket=/var/lib/mysql/mysql.sock

[mysqld]
bind-address=192.168.0.2
port=3306
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
skip-name-resolve
skip-locking
skip-innodb
max_connections=500
key_buffer=256M
myisam_sort_buffer_size=64M
join_buffer_size=2M
read_buffer_size=2M
read_rnd_buffer_size=1M
sort_buffer_size=4M
table_cache=1800
thread_cache_size=512
thread_concurrency=4
wait_timeout=60
connect_timeout=10
interactive_timeout=60
tmp_table_size=256M
bulk_insert_buffer_size=16M
max_allowed_packet=64M
max_connect_errors=10
max_heap_table_size=128M
query_cache_limit=2M
query_cache_size=128M
query_cache_type=1
query_prealloc_size=16384
query_alloc_block_size=16384
#log-warnings
log-error
#log-slow-queries=mysql-slow-may-27.log
#long-query-time=1
#log-queries-not-using-indexes

[mysqld_safe]
open_files_limit=8192

[mysqldump]
quick
max_allowed_packet=16M

[myisamchk]
key_buffer=64M
sort_buffer=64M
read_buffer=16M
write_buffer=16M
[20 May 2008 15:56] Sveta Smirnova
Thank you for the feedback.

Why do you need to manually kill MySQL server? Does it freeze or? Also please try to catch query which causes such behavior. You can do it if you connect as root in time when event, because which you kill MySQL, happens and run query SHOW PROCESSLIST.
[20 May 2008 18:16] Aaron
Hi Sveta,

We have to kill all of the MySQL processes because they are still running! I know that is a simple answer, but it is the only way to get MySQL working correctly again. If we just do a "/etc/rc.d/init.d/mysql start" when it "crashes" it just says that MySQL is already running, I believe. I will double check this next time it happens. I will also try and get a processlist from it, if it lets me.

Just to clarify on killing MySQL processes... we literally have to do this:

/etc/rc.d/init.d/mysql stop

It does it's ............... thing for a few minutes, but the MySQL processes are still running. At this point we do a:

killall -9 mysqld

It appears the mysqld processes are *still* running, then we have to find the mysqld process running under the "root" user and manually kill that pid.

kill -9 <root process pid>

Then we do this again:

killall -9 mysqld

and FINALLY, after those steps are all done, then MySQL is able to be started again.

Thanks!
[23 May 2008 2:45] Aaron
I wanted to add that I tried to do a "mysqladmin -u root processlist" when the "crash" happened, but it would not work. It was like it was lagged beyond belief.

Thanks
[23 May 2008 21:40] Sveta Smirnova
Thank you for the feedback.

For me many sub_select calls can show you issue some large SELECT query with JOIN of many tables. Additionally free shows you use almost all RAM of the system.

To be sure big SELECT is the cause turn on general query log and catch this select from the log file (it would be one of latest queries).
[23 Jun 2008 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".