MySQL Bugs: #45971: Optimize on NDB table causes full cluster crash (All API/NDB Storage nodes).

Bug #45971	Optimize on NDB table causes full cluster crash (All API/NDB Storage nodes).
Submitted:	6 Jul 2009 13:49	Modified:	5 Aug 2009 7:50
Reporter:	John Sabo
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	mysql-5.1-telco-6.3	OS:	Linux (Gentoo 2008.0 AMD64)
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	crash, ndbd, ndbmtd, Optimize

Description:
This happens with both ndbmtd and ndbd.

mysql -e "optimize table billing_processor_data_dump" avarice
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query

mysql> show table status where name = 'billing_processor_data_dump';
+-----------------------------+------------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
| Name                        | Engine     | Version | Row_format | Rows  | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
+-----------------------------+------------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
| billing_processor_data_dump | ndbcluster |      10 | Dynamic    | 10470 |             60 |     4653056 |               0 |            0 |         0 |          10866 | NULL        | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
+-----------------------------+------------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
1 row in set (0.01 sec)

CREATE TABLE `billing_processor_data_dump` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `date` datetime NOT NULL,
  `requested_date_from` datetime NOT NULL,
  `requested_date_to` datetime NOT NULL,
  `requested_fields` varchar(255) NOT NULL,
  `returned_data` longtext,
  `billing_company_id` int(11) unsigned NOT NULL DEFAULT '3',
  PRIMARY KEY (`id`),
  KEY `date` (`date`),
  KEY `date_from` (`requested_date_from`),
  KEY `billing_company_id` (`billing_company_id`)
) /*!50100 STORAGE MEMORY */ ENGINE=ndbcluster DEFAULT CHARSET=utf8;

Unfortunately, a data dump is not possible because this table stores customer information.

 

How to repeat:
Run 'OPTIMIZE TABLE [table]'

Sample of error from one of the data nodes. 

Time: Monday 6 July 2009 - 04:51:01
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtup/DbtupVarAlloc.cpp
Error object: DBTUP (Line: 482) 0x0000000a
Program: /usr/sbin/ndbmtd
Pid: 15280
Trace: /db/datadir/ndb_4_trace.log.1 /db/datadir/ndb_4_trace.log.1_t1 /db/datadir/ndb_4_trace.log.1_t2 /db/datadir/ndb_4_trace.log.1_t3 /db/datadir/ndb_4_trace.lo
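
For anyone comparing error log entries across several data nodes, a small script can pull out the failing source file, block, and line so the entries are easy to line up. The following is only a sketch: it assumes the "Key: value" layout shown in the sample above, treats any line that does not start a new key as a continuation of the previous one, and the function names (parse_error_entry, failure_site) are made up for this illustration. It is not an NDB tool.

```python
import re

# Sample entry copied from the report; the Message line is wrapped the way
# it appeared in the paste, to exercise the continuation handling.
SAMPLE = """\
Time: Monday 6 July 2009 - 04:51:01
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming
 error or missing error message, please report a bug)
Error: 2341
Error data: dbtup/DbtupVarAlloc.cpp
Error object: DBTUP (Line: 482) 0x0000000a
Program: /usr/sbin/ndbmtd
Pid: 15280
"""

# A key starts with a capital letter and consists of letters and spaces.
KEY_VALUE = re.compile(r"^([A-Z][A-Za-z ]*?): (.*)$")


def parse_error_entry(text):
    """Split 'Key: value' lines into a dict.

    Lines that do not start a new key are appended to the previous value,
    which undoes the hard wrapping seen in pasted entries.
    """
    fields = {}
    last_key = None
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            last_key = match.group(1)
            fields[last_key] = match.group(2)
        elif last_key is not None and line.strip():
            fields[last_key] += line
    return fields


def failure_site(fields):
    """Return (source file, block name, line number), or None if absent."""
    match = re.match(r"(\w+) \(Line: (\d+)\)", fields.get("Error object", ""))
    if match is None:
        return None
    return (fields.get("Error data"), match.group(1), int(match.group(2)))


if __name__ == "__main__":
    entry = parse_error_entry(SAMPLE)
    print(entry["Error"], failure_site(entry))
```

Both reports in this thread point at the same site (dbtup/DbtupVarAlloc.cpp, DBTUP, line 482), which is the kind of match this comparison is meant to surface.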

ndb_mgmd config file

Attachment: my-mgmd.cnf (application/octet-stream, text), 3.79 KiB.

Data node 2(of 4) error and trace logs.

Attachment: node_2_trace.tar.bz2 (application/octet-stream, text), 189.66 KiB.

Data node 1(of 4) error and trace logs.

Attachment: node_3_trace.tar.bz2 (application/octet-stream, text), 183.15 KiB.

Data node 3(of 4) error and trace logs.

Attachment: node_4_trace.tar.bz2 (application/octet-stream, text), 218.34 KiB.

Data node 4(of 4) error and trace logs.

Attachment: node_5_trace.tar.bz2 (application/octet-stream, text), 186.87 KiB.

Hi,

I think I've got the same problem. NDBCluster 7.0.5. Only one table:

CREATE TABLE `foo` (
  `idAutoOT` int(10) unsigned NOT NULL auto_increment,
  `numInt` varchar(20) collate latin1_general_ci NOT NULL,
  `codeBase` varchar(5) collate latin1_general_ci NOT NULL,
  `dateEnvoi` datetime default NULL,
  `blocXML` text collate latin1_general_ci,
  `etat` int(2) unsigned default '0',
  `pjAttendue` int(1) unsigned default '0',
  `dateT` datetime default NULL,
  `agence` varchar(15) collate latin1_general_ci default NULL,
  `codeETL` varchar(20) collate latin1_general_ci default '',
  `codeUI` varchar(3) collate latin1_general_ci default '',
  `otGroupe` int(1) default NULL,
  `pix` int(1) default '0',
  `dateReception` datetime default NULL,
  PRIMARY KEY  (`idAutoOT`),
  KEY `cuOT` (`numInt`,`codeBase`),
  KEY `otcle1` (`dateT`),
  KEY `otcle2` (`dateT`,`etat`),
  KEY `o3` (`codeETL`,`dateEnvoi`),
  KEY `o4` (`codeUI`,`codeETL`)
) ENGINE=MyISAM AUTO_INCREMENT=1179679 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
;

There are 100,000 records in this table. When I truncate it and run "OPTIMIZE TABLE foo", I have no problem; with the records in place, I get:

2009-07-20 11:16:02 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

And all the nodes of the cluster crash.

Here is my config.ini:

###############################
# MYSQL CLUSTER CONFIGURATION #
###############################

[TCP DEFAULT]
SendBufferMemory=2M
ReceiveBufferMemory=2M 

[NDBD DEFAULT]

NoOfReplicas=2
Datadir=/servers/mysql/cluster
DataMemory=2560MB
IndexMemory=512MB
MaxNoOfExecutionThreads=4 # MySQL Cluster 7 -> multithread

NoOfFragmentLogFiles=32  
FragmentLogFileSize=128M
Diskcheckpointspeed=10M
Diskcheckpointspeedinrestart=100M
TimeBetweenLocalCheckpoints=20 

LockPagesInMainMemory=0 
RedoBuffer=32M 

LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15

[NDB_MGMD]

HostName=xxxxxxxxxxx
Id=1
DataDir=/var/lib/mysql-cluster
ArbitrationRank=1
LogDestination=FILE:filename=/var/log/mgmd.log,maxsize=25000000,maxfiles=4

##############
# DATA NODES #
##############

[NDBD]

HostName=xxxxxxxxxxx
MaxNoOfAttributes=100000
MaxNoOfConcurrentOperations=1000000
MaxNoOfLocalOperations=1000000
MaxNoOfConcurrentTransactions=20480
MaxNoOfConcurrentScans=400
MaxNoOfTables=5000
MaxNoOfTriggers=5000
MaxNoOfOrderedIndexes=5000
MaxNoOfUniqueHashIndexes=1000
Id=2

[NDBD]

HostName=xxxxxxxxxxx
MaxNoOfAttributes=100000
MaxNoOfConcurrentOperations=250000
MaxNoOfConcurrentTransactions=20480
MaxNoOfConcurrentScans=400
MaxNoOfTables=5000
MaxNoOfTriggers=5000
MaxNoOfUniqueHashIndexes=1000
MaxNoOfOrderedIndexes=5000
Id=3

#############
# API NODES #
#############

[MYSQLD]
HostName=xxxxxxxxxxx
Id=5

[MYSQLD]
HostName=xxxxxxxxxxx
Id=6

[MYSQLD]
HostName=xxxxxxxxxxx
Id=4

[MYSQLD]
# Test node
Id=10

With this trace:
2009-07-20 09:58:36 [ndbd] INFO     -- dbtup/DbtupVarAlloc.cpp
2009-07-20 09:58:36 [ndbd] INFO     -- DBTUP (Line: 482) 0x0000000e
2009-07-20 09:58:36 [ndbd] INFO     -- Error handler shutting down system
2009-07-20 09:58:36 [ndbd] INFO     -- Error handler shutdown completed - exiting
2009-07-20 09:58:36 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

I'm on CentOS 5.2.

Thank you

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/79981

3003 Jonas Oreland	2009-08-04
      ndb - bug#45971 - crash during optimize table, make sure free list are searched correctly
                        reuse get_alloc_page (with size + 1) since index might grow
            bug#43683 - optimize table does not return memory,
                        try to move to page with *least* amount of free space
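
To make the two ideas in the commit note concrete, here is a toy model in Python. It is not the NDB code and does not reproduce get_alloc_page or anything else in DbtupVarAlloc.cpp; the function name, the dictionary of pages, and the "words" unit are all invented for illustration. It only shows the selection rule the note describes: ask for size + 1 rather than size (the note's stated reason is that the index might grow), and among the pages that fit, prefer the one with the least free space so that emptier pages are more likely to drain and be returned.

```python
def pick_target_page(free_words_by_page, needed_words, exclude=None):
    """Pick a page to move a row into (toy model, hypothetical names).

    free_words_by_page maps a page id to its free space.
    Returns the id of the page with the least free space that still fits
    needed_words + 1, or None if no page qualifies.
    """
    # One spare unit, mirroring the "size + 1" in the commit note.
    required = needed_words + 1
    candidates = [
        (free, page_id)
        for page_id, free in free_words_by_page.items()
        if page_id != exclude and free >= required
    ]
    if not candidates:
        return None
    # Tightest fit first; ties are broken by the lower page id.
    return min(candidates)[1]


if __name__ == "__main__":
    pages = {1: 100, 2: 12, 3: 11, 4: 40}
    # An 11-word row needs 12 free words, so page 3 is skipped and
    # page 2 (the tightest page that fits) is chosen.
    print(pick_target_page(pages, 11))
```

Passing exclude lets the caller skip the page the row currently lives on, which is what a move during optimize would presumably want; that detail is an assumption of this sketch, not something the commit note states.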

Versions? 6.2+?

Not 6.2; OPTIMIZE is not implemented in 6.2.
6.3.26, 7.0.7 (i.e. the next version)

Weird that triggers didn't do this...

Documented bugfix in the NDB-6.3.26 and 7.0.7 changelogs as follows:

        OPTIMIZE TABLE on an NDB table could in some cases cause SQL and
        data nodes to crash. This issue was observed with both ndbd and
        ndbmtd.