Bug #64278 mysqld, ndbd process fails (Internal program error, Error 2341)
Submitted: 9 Feb 2012 13:32
Modified: 8 Sep 2016 6:52
Reporter: Gabor Zele
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: mysql-5.5.16 ndb-7.2.2
OS: Linux (Debian 6.0 64-bit)
Assigned to:
CPU Architecture: Any
Tags: cluster, ndb, ndbd

[9 Feb 2012 13:32] Gabor Zele
Description:
Our NDB cluster consists of two machines (both have mysql, ndbd, ndb_mgmd). The ndbd process quits for unknown reasons with the errors below (usually the second node exits 10-20 minutes after the first):

Time: Thursday 9 February 2012 - 13:53:29
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming
error or missing error message, please report a bug)
Error: 2341
Error data: /pb2/build/sb_0-4474442-1322943573.08/mysql-cluster-gpl-7.2.2/storag
e/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp
Error object: DBLQH (Line: 9735) 0x00000002
Program: ndbd
Pid: 6664
Version: mysql-5.5.16 ndb-7.2.2
Trace: /data/nbd-node/ndb_3_trace.log.4
***EOM**

How to repeat:
It repeats by itself. It occurred 3-4 times today.
[10 Feb 2012 14:41] Gabor Zele
It seems the frontend mysqld also segfaults many times:

120210 14:54:36 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=3
max_threads=151
thread_count=3
connection_count=3
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 134075 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x260ae40
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f6b55cece78 thread_stack 0x40000
/opt/mysql/server-5.5/bin/mysqld(my_print_stacktrace+0x39)[0x8397c9]
/opt/mysql/server-5.5/bin/mysqld(handle_segfault+0x43a)[0x59847a]
/lib/libpthread.so.0(+0xeff0)[0x7f6b6688aff0]
/lib/libc.so.6(memcpy+0x46)[0x7f6b657bf936]
/opt/mysql/server-5.5/bin/mysqld(_ZN13ha_ndbcluster13unpack_recordEPhPKh+0x248)[0x9cd728]
/opt/mysql/server-5.5/bin/mysqld(_ZN13ha_ndbcluster8rnd_nextEPh+0x240)[0x9de720]
/opt/mysql/server-5.5/bin/mysqld(_Z13rr_sequentialP11READ_RECORD+0x1f)[0x7da6bf]
/opt/mysql/server-5.5/bin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x79)[0x6333a9]
/opt/mysql/server-5.5/bin/mysqld[0x637cc2]
/opt/mysql/server-5.5/bin/mysqld(_ZN4JOIN4execEv+0xcaa)[0x64cafa]
/opt/mysql/server-5.5/bin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x12c)[0x64e2cc]
/opt/mysql/server-5.5/bin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x165)[0x64ed15]
/opt/mysql/server-5.5/bin/mysqld[0x60d562]
/opt/mysql/server-5.5/bin/mysqld(_Z21mysql_execute_commandP3THD+0x1216)[0x610fb6]
/opt/mysql/server-5.5/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x158)[0x614638]
/opt/mysql/server-5.5/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x135f)[0x6159bf]
/opt/mysql/server-5.5/bin/mysqld(_Z24do_handle_one_connectionP3THD+0xcf)[0x6aca9f]
/opt/mysql/server-5.5/bin/mysqld(handle_one_connection+0x51)[0x6acba1]
/lib/libpthread.so.0(+0x68ca)[0x7f6b668828ca]
/lib/libc.so.6(clone+0x6d)[0x7f6b6580f86d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.

The query in the log is always of the same type, but running that query from the command line completes without any errors.

The application that produces the query is a Tomcat webapp, using the Hibernate library for DB access.
[13 Feb 2012 7:32] Jonas Oreland
Hi,

Thx for bug-report.
Is it possible to get the query that causes this ?
  (and some data) so that we can try it ?

One way to try to reproduce it would be to run the same query from
  several threads in parallel, e.g. using mysqlslap

/Jonas
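
As an illustration of the parallel-replay suggestion above, here is a minimal Python sketch that builds a mysqlslap command line. The helper name, host, user, and schema name are hypothetical placeholders, and the query must be a complete statement with literal values. The flags used (`--query`, `--create-schema`, `--concurrency`, `--iterations`) are standard mysqlslap options. It is safest to point this at a disposable copy of the schema rather than production data.

```python
import shlex


def build_mysqlslap_command(query, schema, concurrency=16, iterations=50,
                            host="127.0.0.1", user="root"):
    """Return an argv list that replays `query` from many client connections.

    All defaults are illustrative assumptions, not values from this report.
    """
    return [
        "mysqlslap",
        f"--host={host}",
        f"--user={user}",
        f"--create-schema={schema}",      # schema in which the query is run
        f"--query={query}",               # the statement to replay
        f"--concurrency={concurrency}",   # number of simultaneous clients
        f"--iterations={iterations}",     # number of times to repeat the run
    ]


if __name__ == "__main__":
    # Hypothetical query and schema; substitute the statement from the error log.
    cmd = build_mysqlslap_command("SELECT COUNT(*) FROM some_table",
                                  schema="scratch_db")
    # Print a shell-safe command line instead of executing it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The sketch only prints the command, so it can be reviewed before anything is run against the cluster.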
[16 Feb 2012 17:38] Jonas Oreland
Hi again,

1) We released 7.2.4 yesterday; can you try with this version?
2) For us to move forward, the very least we need is query + schema + output from EXPLAIN.
3) having some data can also be beneficial...but without 2) nothing will happen on this one.

/Jonas
[19 Feb 2012 0:04] Gabor Zele
Hi,

I have installed 7.2.4 and the problem remains, but errors are less frequent now. I had one mysqld failure and 8 ndbd shutdowns on Friday (on previous days there were 40-70 ndbd errors daily).

Unfortunately I am not able to identify the problematic query; all queries from the error log work fine standalone. I even tried mysqlslap, but I was unable to reproduce the bug intentionally. Do you have any ideas how to go on?

thanks,
Gabor
[19 Feb 2012 9:19] Jonas Oreland
Hi

Ideas to go on:
1) Please verify that uploading schema/queries (including explain)
   is impossible

2) Please verify that uploading sample data is impossible

---

3) Can you try adding "SharedGlobalMemory=256M" to "[ndbd default]"
   and see if that changes behaviour ?

4) Can you upload traces generated with 7.2.4 (same as the ones you did...but
   for 7.2.4 instead)

/Jonas
[25 Feb 2012 16:33] Tom John
I am also receiving this crash frequently using MySQL cluster 7.2.4

SharedGlobalMemory=256M doesn't seem to help. Thanks

Thread pointer: 0x7feae31a1800
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x11955aed8 thread_stack 0x40000
0   mysqld                              0x000000010a04a71e my_print_stacktrace + 46
1   mysqld                              0x0000000109e932db handle_segfault + 523
2   libsystem_c.dylib                   0x00007fff89224cfa _sigtramp + 26
3   ???                                 0x000000011781300c 0x0 + 4689309708
4   mysqld                              0x000000010a162680 _ZN13ha_ndbcluster13unpack_recordEPhPKh + 750
5   mysqld                              0x000000010a16bba2 _ZN13ha_ndbcluster11next_resultEPh + 68
6   mysqld                              0x000000010a168a7f _ZN13ha_ndbcluster18ordered_index_scanEPK12st_key_rangeS2_bbPhP13part_id_range + 1455
7   mysqld                              0x000000010a16ad40 _ZN13ha_ndbcluster23read_range_first_to_bufEPK12st_key_rangeS2_bbPh + 722
8   mysqld                              0x000000010a156f4d _ZN13ha_ndbcluster10index_readEPhPKhj16ha_rkey_function + 77
9   mysqld                              0x0000000109f574f0 _ZL20join_read_always_keyP13st_join_table + 384
10  mysqld                              0x0000000109f602f2 _Z10sub_selectP4JOINP13st_join_tableb + 98
11  mysqld                              0x0000000109f65bbf _ZL9do_selectP4JOINP4ListI4ItemEP5TABLEP9Procedure + 479
12  mysqld                              0x0000000109f768e5 _ZN4JOIN4execEv + 10181
13  mysqld                              0x0000000109f74084 _Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex + 1124
14  mysqld                              0x0000000109f79cee _Z13handle_selectP3THDP3LEXP13select_resultm + 302
15  mysqld                              0x0000000109f317f9 _ZL21execute_sqlcom_selectP3THDP10TABLE_LIST + 793
16  mysqld                              0x0000000109f32040 _Z21mysql_execute_commandP3THD + 2032
17  mysqld                              0x0000000109f37f76 _Z11mysql_parseP3THDPcjP12Parser_state + 294
18  mysqld                              0x0000000109f3904d _Z16dispatch_command19enum_server_commandP3THDPcj + 1709
19  mysqld                              0x0000000109f39f37 _Z10do_commandP3THD + 231
20  mysqld                              0x0000000109fdba51 _Z24do_handle_one_connectionP3THD + 353
21  mysqld                              0x0000000109fdbb09 handle_one_connection + 73
22  libsystem_c.dylib                   0x00007fff891d08bf _pthread_start + 335
23  libsystem_c.dylib                   0x00007fff891d3b75 thread_start + 13

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7feae3147810): is an invalid pointer
Connection ID (thread ID): 217
Status: NOT_KILLED
[25 Feb 2012 18:23] Gabor Zele
I haven't tried setting SharedGlobalMemory yet, but disabling these three optimizations in my.cnf worked as a successful workaround avoiding these problems:

ndb-force-send = 0
engine-condition-pushdown = 0 
ndb-join-pushdown = 0                               

I have no time to investigate which one of these is responsible for the crash, but over the last week we have not had any crashes at all.

Gabor
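
One possible way to narrow down which of the three options matters is to disable them in smaller combinations and watch for crashes under each. The Python sketch below enumerates the candidate my.cnf fragments, smallest subsets first. The function names are hypothetical, and it assumes the options live in the `[mysqld]` section and accept 0/1 as in the workaround above.

```python
from itertools import combinations

# The three options that were all set to 0 in the workaround above.
OPTIONS = ["ndb-force-send", "engine-condition-pushdown", "ndb-join-pushdown"]


def candidate_configs(options):
    """Yield every non-empty subset of options to disable, smallest first."""
    for size in range(1, len(options) + 1):
        for subset in combinations(options, size):
            yield subset


def render_mysqld_section(disabled, options=OPTIONS):
    """Render a [mysqld] fragment with `disabled` set to 0 and the rest to 1."""
    lines = ["[mysqld]"]
    for name in options:
        value = 0 if name in disabled else 1
        lines.append(f"{name} = {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    for disabled in candidate_configs(OPTIONS):
        print(render_mysqld_section(disabled))
        print()
```

If disabling a single option is enough to stop the crashes, that option is the likely trigger; the larger subsets are only needed if no single option does.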
[25 Feb 2012 18:23] Tom John
For me this is the query that is causing mysqld to crash:

select tx.tx_index from tx_output, tx where tx_output.tx_index = tx.tx_index and tx.hash = ? and tx_output.tx_output_n = ?

It runs fine on its own, but it seems that when it is mixed with some combination of other queries, or when the cluster is under heavy load, it causes the crash. Sorry I can't pinpoint it any further; all I know is that when I remove that query the crash no longer occurs.

CREATE TABLE `tx` (
  `tx_index` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `hash` binary(32) NOT NULL,
  `version` tinyint(11) unsigned NOT NULL,
  `time` int(10) unsigned DEFAULT NULL,
  `ipv4` int(10) unsigned NOT NULL,
  PRIMARY KEY (`tx_index`),
  UNIQUE KEY `ihash` (`hash`) USING HASH
) ENGINE=ndbcluster AUTO_INCREMENT=858491 DEFAULT CHARSET=utf8 COLLATE=utf8_bin ROW_FORMAT=FIXED;

CREATE TABLE `tx_output` (
  `tx_index` int(11) unsigned NOT NULL,
  `tx_output_n` smallint(11) unsigned NOT NULL,
  `value` bigint(20) NOT NULL,
  `type` tinyint(4) NOT NULL,
  `hash` binary(20) DEFAULT NULL,
  PRIMARY KEY (`tx_index`,`tx_output_n`),
  KEY `ihash` (`hash`)
) ENGINE=ndbcluster DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Explain:

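+----+-------------+-----------+--------+---------------+---------+---------+-------------------+--------+----------------------------------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref               | rows   | Extra                                  |
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+--------+----------------------------------------+
| 1  | SIMPLE      | tx        | ALL    | PRIMARY,ihash |         |         |                   | 792531 | Parent of 2 pushed join@1; Using where |
| 1  | SIMPLE      | tx_output | eq_ref | PRIMARY       | PRIMARY | 6       | tx.tx_index,const | 1      | Child of 'tx' in pushed join@1         |
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+--------+----------------------------------------+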
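
Since the EXPLAIN output shows a pushed join, it may help to compare the plan with join pushdown on and off for the same literal values. The Python sketch below fills the two `?` placeholders and emits a small SQL script. The function names are hypothetical, and it assumes `ndb_join_pushdown` can be changed per session on the server in question.

```python
# Query from the comment above, with the two `?` placeholders named.
QUERY_TEMPLATE = (
    "select tx.tx_index from tx_output, tx "
    "where tx_output.tx_index = tx.tx_index "
    "and tx.hash = {hash_literal} and tx_output.tx_output_n = {output_n}"
)


def render_query(tx_hash: bytes, output_n: int) -> str:
    """Substitute literal values for the two placeholders."""
    if len(tx_hash) != 32:
        raise ValueError("tx.hash is binary(32); expected exactly 32 bytes")
    hash_literal = "X'" + tx_hash.hex() + "'"   # hexadecimal literal
    return QUERY_TEMPLATE.format(hash_literal=hash_literal,
                                 output_n=int(output_n))


def explain_script(tx_hash: bytes, output_n: int) -> str:
    """Return SQL that runs EXPLAIN with join pushdown ON, then OFF."""
    query = render_query(tx_hash, output_n)
    lines = []
    for setting in ("ON", "OFF"):
        # Assumption: the variable is settable at session scope.
        lines.append(f"SET ndb_join_pushdown = {setting};")
        lines.append(f"EXPLAIN {query};")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up hash value for illustration only.
    print(explain_script(bytes(range(32)), 0))
```

The resulting script can be pasted into the mysql client; if the "pushed join" notes disappear from the second plan, the query is one that the `ndb-join-pushdown = 0` workaround affects.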
[18 May 2012 18:29] Ole John Aske
Based on the crash in DBLQH around line ~9750, I suspect this to be a duplicate of bug#65084, and bug#65141
[8 Sep 2016 6:52] MySQL Verification Team
Having inspected the logs and done a reproduction analysis, I agree with Ole that this is solved in 7.2.7.