Bug #30153 NDB: on-line add column during updates causes complete and catastrophic failure
Submitted: 31 Jul 2007 16:18 Modified: 5 Oct 2007 11:40
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S1 (Critical)
Version: mysql-5.1-telco-6.2 OS: Linux (32 Bit)
Assigned to: Jonathan Miller CPU Architecture:Any

[31 Jul 2007 16:18] Jonathan Miller
Description:
I pulled Tomas's patch for ForceVarPart into the MySQLD and retested using the scripts I
built. During this testing the complete cluster system, including both MySQLDs,
failed/crashed.

Looking through the logs and core files, the following happens:

Cluster:

Node ID 3 crashes on assert(bm_len <= max_bmlen);
Node ID 2 crashes on ndbassert(retNo == 0);

Local = server the scripts are running on
Remote = MySQLD running on a different server that some scripts use

The local MySQLD Crashes during:
_Z24ndb_handle_schema_changeP3THDP3NdbP17NdbEventOperationP19st_ndbcluster_share + 985 0x836fc63

And the remote MySQLD crashes at:

if (!old &&
1760              old->getObjectVersion() != altered_table->getObjectVersion())
1761            dict->putTable(altered_table);
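
The condition quoted above can only call putTable() when old is null, and in that case the second operand dereferences the null pointer, which is consistent with the crash at line 1760. A minimal sketch of a null-safe version follows, assuming the intent was "no cached table, or a cached table with a different version". The Table struct and the needs_put() helper are hypothetical stand-ins for illustration, not the NDB dictionary types or the actual fix:

```cpp
#include <cassert>

// Hypothetical stand-in for the NDB dictionary table object; only the
// version accessor matters for this sketch.
struct Table {
    int version;
    int getObjectVersion() const { return version; }
};

// Condition as quoted in the report, left as a comment because evaluating
// it with old == nullptr is undefined behaviour:
//   !old && old->getObjectVersion() != altered->getObjectVersion()
// With old == nullptr, !old is true, so the right-hand side runs and
// dereferences null. With old != nullptr, !old is false and the whole
// condition is false, so putTable() would never be reached.

// Assumed intent: store the altered table when nothing is cached yet, or
// when the cached version differs. The || short-circuits on a null old,
// so old is only dereferenced when it is non-null.
bool needs_put(const Table* old, const Table* altered) {
    return !old || old->getObjectVersion() != altered->getObjectVersion();
}
```

With this form, needs_put(nullptr, &altered) returns true without touching old, and two tables with the same version return false.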

A complete crash log will be attached to this report after it is opened.

How to repeat:
Edit ndb_alter_dd.pl and set our $alterCount=27;
Create a cluster with 2 data nodes and 2 MySQLDs. The MySQLDs should be on different hosts.
Edit the shell scripts to contain the remote MySQLD information
Run main script on local system using: perl ./ndb_alter_dd.pl --sock
[31 Jul 2007 16:18] Jonathan Miller
AlterTableUpdateCrash.log

Attachment: AlterTableUpdateCrash.log (text/x-log), 21.14 KiB.

[31 Jul 2007 19:53] Jonathan Miller
With the bin-log turned off, the cluster crash still happens, but both MySQLD's stay up
and running.

Local MySQLD error log:
ERR: receiveResponse - theImpl->theWaiter.m_state = 1
Node failed when TCRELEASE sent
Node failed when TCRELEASE sent
Node failed when TCRELEASE sent
NDB: Found 3 NdbTransaction's that have not been released
NDB: Found 2 NdbReceiver's that have not been released

Remote MySQLD error log:
070731 21:48:15 [ERROR] NDB binlog: Skipping drop database 'TABLE_ALTER' since it contained local tables binlog schema event 'DROP DATABASE IF EXISTS TABLE_ALTER' from node 5.
con=0xa0428c8 node=2 Connected InPreparedList sendABORT Aborted CompletedFailure
The node was stone dead, inform about abort
070731 21:48:30 [Note] NDB Binlog: Node: 2, down, Subscriber bitmask 00
070731 21:48:31 [Note] NDB Binlog: Node: 3, down, Subscriber bitmask 00
070731 21:48:31 [Note] NDB Binlog: cluster failure for ./mysql/ndb_schema at epoch 19.
con=0xa04dec8 node=3 Connected InPreparedList sendABORT Aborted CompletedFailure
The node was stone dead, inform about abort
070731 21:48:34 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:48:37 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:48:40 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:48:43 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:48:46 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:48:49 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:48:52 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:48:55 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:48:58 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:49:01 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:49:04 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
070731 21:49:07 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Incorrect information in file: './TABLE_ALTER/t1.frm'
Node failed when TCRELEASE sent
Node failed when TCRELEASE sent
NDB: Found 2 NdbTransaction's that have not been released
NDB: Found 2 NdbReceiver's that have not been released
[1 Aug 2007 21:14] Jonathan Miller
NOTE: This was using Fixed tables and not Dynamic
[5 Oct 2007 11:40] Jonathan Miller
Retest shows that this has been corrected in the latest clone.
/Jeb