Bug #56831 5.1.47-ndb-7.1.5 mysqld randomly crashing with segfault
Submitted: 16 Sep 2010 23:16 Modified: 8 Sep 2016 5:58
Reporter: Christian Ehmig
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S1 (Critical)
Version: 5.1.47-ndb-7.1.5 OS: Linux (debian 2.6.26-2-amd64)
Assigned to: CPU Architecture: Any
Tags: crash on query, ndb tables, Signal 11
Triage: Triaged: D1 (Critical) / R6 (Needs Assessment) / E6 (Needs Assessment)

[16 Sep 2010 23:16] Christian Ehmig
Description:
We noticed random crashes of our mysqld API front ends in our MySQL Cluster setup. The cluster itself (data nodes and management nodes) is healthy and has never crashed so far, although it has been in production for only 5 days. Furthermore, we were able to trace the crashes to certain SELECT queries that run on NDB tables only. We moved those queries to a dedicated MySQL instance (in production!), which now crashes constantly (several times a day).

During certain periods, when mysqld was restarted by mysqld_safe, it crashed immediately after restarting. I can provide further information if needed (NDB table schemas, cluster config, ...). We tried both the binaries and a mysqld built from source (7.1.5 and 7.1.7), with identical behavior. We know the query mentioned in the crash dump is not optimal for NDB tables, but at least it should not crash our production mysqlds...

NDB specific mysqld settings:
ndbcluster
ndb_log_bin=0
ndb_log_binlog_index=0
ndb-cluster-connection-pool=8
ndb-force-send=1
ndb-use-exact-count=0
ndb-extra-logging=1
ndb-autoincrement-prefetch-sz=256
engine-condition-pushdown=1

Crash dump:

100917  0:49:37 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=21
max_threads=500
threads_connected=6
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1101340 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x7f60601da6d0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x41b45b48 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x33)[0x98dc63]
/usr/local/mysql/bin/mysqld(handle_segfault+0x324)[0x635f24]
/lib/libpthread.so.0[0x7f6086f8aa80]
/usr/local/mysql/bin/mysqld(_ZN11Query_cache14get_free_blockEmcm+0x1b5)[0x768005]
/usr/local/mysql/bin/mysqld(_ZN11Query_cache14allocate_blockEmcm+0x47)[0x769bd7]
/usr/local/mysql/bin/mysqld(_ZN11Query_cache11store_queryEP3THDP10TABLE_LIST+0x478)[0x76c6c8]
/usr/local/mysql/bin/mysqld[0x6417a1]
/usr/local/mysql/bin/mysqld(_Z21mysql_execute_commandP3THD+0x402d)[0x649fad]
/usr/local/mysql/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x17c)[0x64e4dc]
/usr/local/mysql/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xe04)[0x64f3b4]
/usr/local/mysql/bin/mysqld(_Z10do_commandP3THD+0xde)[0x64fcbe]
/usr/local/mysql/bin/mysqld(handle_one_connection+0x1f0)[0x63fa10]
/lib/libpthread.so.0[0x7f6086f82fc7]
/lib/libc.so.6(clone+0x6d)[0x7f608621864d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x1b82f10 = (SELECT UNIX_TIMESTAMP(pk_datetime) as timestamp FROM `27_quote_intraday_bid` WHERE pfk_instrument_id = '134002' ORDER BY pk_datetime ASC LIMIT 1) UNION(SELECT UNIX_TIMESTAMP(pk_datetime) as timestamp FROM `27_quote_intraday_bid` WHERE pfk_instrument_id = '134002' ORDER BY pk_datetime DESC LIMIT 1) LIMIT 100
thd->thread_id=1252
thd->killed=NOT_KILLED
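
The backtrace above lists mangled C++ symbols. As a reading aid, here is a minimal Python sketch that splits each frame into binary, symbol and address and recovers the qualified function name. The helper names (`parse_frame`, `qualified_name`, `summarize`) are hypothetical, and the name recovery only handles the simple `_Z<len><name>` and `_ZN<len><name>...E` forms seen in this trace; argument types are ignored. A real demangler such as `c++filt` should be preferred for anything else.

```python
import re

# Frames copied verbatim from the backtrace above.
FRAMES = """\
/usr/local/mysql/bin/mysqld(handle_segfault+0x324)[0x635f24]
/lib/libpthread.so.0[0x7f6086f8aa80]
/usr/local/mysql/bin/mysqld(_ZN11Query_cache14get_free_blockEmcm+0x1b5)[0x768005]
/usr/local/mysql/bin/mysqld(_ZN11Query_cache14allocate_blockEmcm+0x47)[0x769bd7]
/usr/local/mysql/bin/mysqld(_ZN11Query_cache11store_queryEP3THDP10TABLE_LIST+0x478)[0x76c6c8]
/usr/local/mysql/bin/mysqld[0x6417a1]
/usr/local/mysql/bin/mysqld(_Z21mysql_execute_commandP3THD+0x402d)[0x649fad]
""".splitlines()

# binary, optional "(symbol+offset)", then "[address]".
FRAME_RE = re.compile(
    r"^(?P<binary>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return (binary, symbol or None, address), or None if the line is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return m.group("binary"), m.group("symbol"), m.group("address")


def qualified_name(symbol):
    """Recover 'Class::method' or 'function' from a simple mangled symbol.

    Assumption: only plain and nested names without qualifiers or templates
    are handled; anything else is returned unchanged.
    """
    if not symbol.startswith("_Z"):
        return symbol  # plain C symbol, nothing to decode
    rest = symbol[2:]
    nested = rest.startswith("N")
    if nested:
        rest = rest[1:]
    parts = []
    while rest and rest[0].isdigit():
        m = re.match(r"\d+", rest)
        length = int(m.group())
        start = m.end()
        parts.append(rest[start:start + length])
        rest = rest[start + length:]
        if not nested:
            break  # what follows is the parameter list, not part of the name
    return "::".join(parts) if parts else symbol


def summarize(lines):
    """Return one 'address name' string per recognizable frame."""
    out = []
    for line in lines:
        frame = parse_frame(line)
        if frame is None:
            continue
        _binary, symbol, address = frame
        out.append(f"{address} {qualified_name(symbol) if symbol else '??'}")
    return out


if __name__ == "__main__":
    print("\n".join(summarize(FRAMES)))
```

For this trace, the innermost mysqld frames below the signal handler resolve to Query_cache::get_free_block, Query_cache::allocate_block and Query_cache::store_query.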

How to repeat:
Not possible so far; the crash occurs randomly. Sending a query like the one in the crash dump to the server manually works.
[17 Sep 2010 3:12] Valeriy Kravchuk
Please, send the my.cnf file content.
[17 Sep 2010 8:44] Christian Ehmig
my.cnf config file

Attachment: my.cnf (application/octet-stream, text), 4.62 KiB.

[17 Sep 2010 9:09] Christian Ehmig
Could this issue be related to the query cache being enabled? The query cache implementation differs for the NDB storage engine, since query cache invalidation is a "distributed" task. I have disabled the query cache for now and will report back if I see any improvement.
[17 Sep 2010 12:02] Christian Ehmig
The server in question has now been running for 7 hours with no crash (query cache disabled). However, there is another strange thing that I forgot to mention: the processlist contains the following query:

*************************** 2. row ***************************
     Id: 350
   User: core
   Host: 10.20.56.11:39685
     db: instruments
Command: Query
   Time: 26355
  State: Sending data
   Info: /* Core: www.godmode-trader.ch :: /js/core/chart/history.php :: - :: - :: Instrument_Filter_HistoryIntradayExt :: www02/15973 */ (SELECT value as open, MIN(value) as low, MAX(value) as high, SUBSTRING(MAX(CONCAT(UNIX_TIMESTAMP(`pk_datetime`), value)), 11) as close, MAX(UNIX_TIMESTAMP(`pk_datetime`)) as seconds   FROM `27_quote_intraday_bid` WHERE `pfk_instrument_id`='134005' AND UNIX_TIMESTAMP(`pk_datetime`) < 1277510400  GROUP BY UNIX_TIMESTAMP(`pk_datetime`) DIV 60 ORDER BY seconds DESC LIMIT 0,1) UNION (SELECT value as open, MIN(value) as low, MAX(value) as high, SUBSTRING(MAX(CONCAT(UNIX_TIMESTAMP(`pk_datetime`), value)), 11) as close, MAX(UNIX_TIMESTAMP(`pk_datetime`)) as seconds   FROM `27_quote_intraday_bid` WHERE `pfk_instrument_id`='134005'  AND UNIX_TIMESTAMP(`pk_datetime`) >= '1277510400' AND UNIX_TIMESTAMP(`pk_datetime`) <= '1284698530'   GROUP BY UNIX_TIMESTAMP(`pk_datetime`) DIV 60 ORDER BY seconds ASC) ORDER BY seconds ASC

So in fact, this query was sent right after server startup and has been hanging in state "Sending data" ever since. Of course, it is not sending any data anywhere...
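
To spot entries like this one automatically, here is a rough Python sketch that parses vertical (`\G`-style) processlist output and lists queries that have been running longer than a threshold. The function names and the one-hour default are assumptions for illustration, and the parser assumes every field fits on a single line, as in the row above.

```python
import re

# Row copied from the processlist above; the Info field is shortened here.
SAMPLE = """\
*************************** 2. row ***************************
     Id: 350
   User: core
   Host: 10.20.56.11:39685
     db: instruments
Command: Query
   Time: 26355
  State: Sending data
   Info: (SELECT value as open, MIN(value) as low ...
"""

ROW_HEADER_RE = re.compile(r"^\*+ \d+\. row \*+$")


def parse_vertical_rows(text):
    """Split vertical processlist output into one dict per row."""
    rows = []
    current = None
    for line in text.splitlines():
        if ROW_HEADER_RE.match(line.strip()):
            current = {}
            rows.append(current)
            continue
        if current is None or ":" not in line:
            continue
        # Split on the first colon only, so values such as host:port stay intact.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return rows


def long_running_queries(rows, min_seconds=3600):
    """Return rows whose Command is 'Query' and whose Time is at least min_seconds."""
    result = []
    for row in rows:
        seconds = row.get("Time", "")
        if row.get("Command") == "Query" and seconds.isdigit() and int(seconds) >= min_seconds:
            result.append(row)
    return result


if __name__ == "__main__":
    for row in long_running_queries(parse_vertical_rows(SAMPLE)):
        print(row["Id"], row["Time"], row["State"])
```

With the sample row, the thread with Id 350 is reported because its Time of 26355 seconds exceeds the one-hour threshold.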
[8 Sep 2016 5:58] Bogdan Kecman
This signal 11 crash was fixed somewhere around 7.1.12; it is no longer repeatable on any of the modern versions.