Bug #41651 Cluster with over 20 ndbd nodes hangs with "Unhandled sections after execute"
Submitted: 19 Dec 2008 23:22
Modified: 7 Feb 2009 12:58
Reporter: Charad Sheerajin
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 6.3.19
OS: Linux (RHEL 5)
Assigned to: Assigned Account
CPU Architecture: Any

[19 Dec 2008 23:22] Charad Sheerajin
Description:
We started a MySQL Cluster with 22 ndbd nodes and inserted over 400k rows.  We then ran an indexed query against the cluster.  We expected the row count to be returned, but instead the mysqld process hung.  The following line was written to the management node's log for Node 16 (ndbd): "Unhandled sections after execute".

How to repeat:
**************************************************************
Here are the commands to execute to get the failure to happen.
The only exception is that you will have to insert data into
the table before you can run the queries at the bottom.
**************************************************************

mysql> CREATE TABLE `test2` (
    ->   `id` bigint(20) unsigned NOT NULL,
    ->   `col1` varchar(32) NOT NULL DEFAULT '',
    ->   `col2` enum('I','O') NOT NULL DEFAULT 'I',
    ->   `col3` varchar(32) NOT NULL DEFAULT '',
    ->   `col4` varchar(32) NOT NULL DEFAULT '',
    ->   `col5` bigint(20) NOT NULL DEFAULT '0',
    ->   `col6` int(11) NOT NULL DEFAULT '0',
    ->   `col7` tinyint(4) NOT NULL DEFAULT '0',
    ->   `col8` bigint(20) NOT NULL DEFAULT '0',
    ->   `col9` varchar(96) NOT NULL DEFAULT '',
    ->   `col10` bigint(20) NOT NULL DEFAULT '0',
    ->   `col11` varchar(500) NOT NULL DEFAULT '0',
    ->   `col12` bigint(20) unsigned NOT NULL,
    ->   `col13` varchar(10) not null default '',
    ->   `col14` enum('T','F') NOT NULL DEFAULT 'F',
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=ndbcluster DEFAULT CHARSET=latin1;

********************************************************************
*****  Run a program that inserts a few hundred thousand rows  *****
********************************************************************
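The insert program itself was not attached to this report. As a rough stand-in, the sketch below generates multi-row INSERT statements for test2. It assumes ids run contiguously from 0 (consistent with `id < 50000` returning 50000 rows on an unsigned column) and supplies only `id` and `col12`, the two columns without a DEFAULT. The function name, the batch size, and the choice of col12 = id are hypothetical, and the real data likely filled the other columns too.

```python
from typing import Iterator


def insert_statements(total_rows: int, batch_size: int = 1000,
                      table: str = "test2") -> Iterator[str]:
    """Yield multi-row INSERT statements covering ids 0 .. total_rows - 1."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total_rows, batch_size):
        end = min(start + batch_size, total_rows)
        # Assumption: only id and col12 are supplied, because every other
        # column in the table definition has a DEFAULT. col12 = id is arbitrary.
        values = ", ".join(f"({i}, {i})" for i in range(start, end))
        yield f"INSERT INTO {table} (id, col12) VALUES {values};"
```

Printing each statement from `insert_statements(416000)` and piping the output into the mysql client would load the same number of rows as reported by the first query below.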

mysql> select count(*) from test2;
+----------+
| count(*) |
+----------+
|   416000 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from test2 where id < 50000;
+----------+
| count(*) |
+----------+
|    50000 |
+----------+
1 row in set (0.06 sec)

mysql> select count(*) from test2 where id < 100000;
Query aborted by Ctrl+C

********************************************************************
NOTE that the last query hung. I had to control-c the mysql client.
At the same time that the query hung, the management log file printed:

2008-12-19 16:31:38 [MgmSrvr] INFO     -- Node 16: Unhandled sections after execute

Also note that the bound at which the query fails varies from run to run.
Sometimes it fails when the query asks for only a few hundred or a few thousand records.
This time it did not fail until I asked for id < 100000.
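Because the failing bound is not fixed, one way to look for it is to run the same count query with a growing bound until the client hangs. The sketch below only builds the query strings; the doubling schedule, the starting bound, and the function name are assumptions, not something used in the original report.

```python
from typing import List


def probe_queries(limit: int, start: int = 100,
                  table: str = "test2") -> List[str]:
    """Return COUNT(*) queries whose id bound doubles from start up to limit."""
    if start <= 0:
        raise ValueError("start must be positive")
    queries = []
    bound = start
    while bound < limit:
        queries.append(f"SELECT COUNT(*) FROM {table} WHERE id < {bound};")
        bound *= 2  # Assumption: doubling is just a convenient schedule.
    # Always finish with the full limit so the last probe covers every row.
    queries.append(f"SELECT COUNT(*) FROM {table} WHERE id < {limit};")
    return queries
```

Running the queries one at a time while watching the management log for Node 16 would show which bound triggered the message on a given run.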
[19 Dec 2008 23:24] Charad Sheerajin
Fixed synopsis.
[19 Dec 2008 23:27] Charad Sheerajin
Management node logs, ndbd logs, and config files

Attachment: more_than_20_nodes_bug.tar.gz (application/x-gzip, text), 15.50 KiB.

[19 Dec 2008 23:29] Charad Sheerajin
Node 16: ndbd with "Unhandled sections after execute"
Node 128: ndbd and ndb_mgmd
Node 129: ndbd and ndb_mgmd

Log files for the other 19 ndbd nodes were not included.
[22 Dec 2008 15:36] Jonas Oreland
Hi,

Would it be possible for you to compile it yourself and
add some switches that will make the node crash when the
problem occurs? That way I'm quite confident that we can
fix it quickly.

/Jonas
[22 Dec 2008 16:53] Charad Sheerajin
Yes... we already have the compile setup so adding extra switches is not a problem.  Thanks.
[22 Dec 2008 16:58] Jonas Oreland
add "--with-ndb-ccflags='-DERROR_INSERT'" to your configure line
and then the node that gets this should abort when it happens.

if (when) this happens,
supply
- ndb_X_error.log
- ndb_X_trace*.log
- ndb_X_out.log
- ndb_Y_cluster*.log

(or use ndb_error_reporter which will do it for you)

/Jonas
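If ndb_error_reporter is not convenient, the files requested above can be bundled by hand. The sketch below is one possible way to do that, assuming all the logs sit in a single directory. X is taken to be the failing data node's id (16 in this report) and Y a management node's id (128 or 129). The function name and arguments are hypothetical, and the trace and cluster patterns are broadened by dropping the trailing ".log", on the assumption that numbered or rotated files may carry a suffix.

```python
import glob
import os
import tarfile
from typing import List


def collect_logs(log_dir: str, data_node_id: int, mgm_node_id: int,
                 archive_path: str) -> List[str]:
    """Pack the requested NDB log files into a gzipped tar; return their names."""
    patterns = [
        f"ndb_{data_node_id}_error.log",
        f"ndb_{data_node_id}_trace*",   # broadened from ndb_X_trace*.log
        f"ndb_{data_node_id}_out.log",
        f"ndb_{mgm_node_id}_cluster*",  # broadened from ndb_Y_cluster*.log
    ]
    found = []
    for pattern in patterns:
        found.extend(sorted(glob.glob(os.path.join(log_dir, pattern))))
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in found:
            # Store bare file names so the archive unpacks flat.
            archive.add(path, arcname=os.path.basename(path))
    return [os.path.basename(path) for path in found]
```

For this report the call would look like `collect_logs(log_dir, 16, 128, "bug41651_logs.tar.gz")`, where `log_dir` and the archive name are placeholders.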
[8 Feb 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".