Bug #49713 ndbd crash after alter tablespace
Submitted: 15 Dec 2009 14:16
Modified: 15 Dec 2009 16:16
Reporter: Mikael Carlson
Status: Duplicate
Category: MySQL Cluster: Disk Data
Severity: S2 (Serious)
Version: 5.1.40
OS: Linux (Linux servername_here 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009 i686 GNU/Linux)
Assigned to:
CPU Architecture: Any
Tags: ndb tablespace alter

[15 Dec 2009 14:16] Mikael Carlson
Description:
ndbd crashes and fails to start after adding disk data over 4.5GB.
I wanted to extend the tablespace to accommodate a load of data that will be migrated to the cluster from a MySQL server running on Windows. The database to be migrated is approx. 50GB. Now ndbd won't start instead :(

The setup is as follows:
* Two cluster nodes, a few tables in RAM, a tablespace of 4.5GB (not containing data at the moment).
* The tablespace needs to be extended before it can be filled with data.
* The tablespace is extended with one file (this seems to work fine?):

mysql> ALTER TABLESPACE ts_1 ADD DATAFILE 'data_4.dat' INITIAL_SIZE 2047M ENGINE NDBCLUSTER;
Query OK, 0 rows affected (51,95 sec)

* I wait 20 min for the cluster to settle.
* Another extension is applied:

mysql> ALTER TABLESPACE ts_1 ADD DATAFILE 'data_5.dat' INITIAL_SIZE 2047M ENGINE NDBCLUSTER;
ERROR 1533 (HY000): Failed to alter:  CREATE DATAFILE

* Here I should probably have run SHOW WARNINGS, but instead it went like this:

mysql> SELECT LOGFILE_GROUP_NAME, FILE_NAME, EXTRA, TABLESPACE_NAME, DATA_FREE, DATA_LENGTH FROM INFORMATION_SCHEMA.FILES;
Empty set, 1 warning (0,00 sec)

mysql> SELECT LOGFILE_GROUP_NAME, FILE_NAME, EXTRA, TABLESPACE_NAME, DATA_FREE, DATA_LENGTH FROM INFORMATION_SCHEMA.FILES;
Empty set, 1 warning (0,01 sec)

mysql> show warnings;
+-------+------+-------------------------------------------+
| Level | Code | Message                                   |
+-------+------+-------------------------------------------+
| Error | 1296 | Got error 4009 'Cluster Failure' from NDB |
+-------+------+-------------------------------------------+
1 row in set (0,00 sec)

* I log out of the server and go into the management console; SHOW produces the following (IPs changed):

Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=2 (not connected, accepting connect from 1.2.3.4)
id=3 (not connected, accepting connect from 1.2.3.5)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@1.2.3.9  (Version: 5.1.40)

* I discover that the ndb processes have died on both cluster servers.
* The new data files are there, however:

-rw-r--r-- 1 root mysql  268500992 30 nov 01.54 data_1.dat
-rw-r--r-- 1 root mysql  268500992 30 nov 01.54 data_2.dat
-rw-r--r-- 1 root mysql 2146533376 30 nov 02.30 data_3.dat
-rw-r--r-- 1 root mysql 2146533376 15 dec 12.16 data_4.dat
-rw-r--r-- 1 root mysql 2146533376 15 dec 12.48 data_5.dat
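
To put the listing above into perspective, here is a minimal sketch that adds up the data file sizes per data node. The byte counts are copied from the listing; the stage labels and the helper name total_bytes are only illustrative and not part of any MySQL tool.

```python
# Sizes in bytes, copied from the directory listing above.
DATAFILE_SIZES = {
    "data_1.dat": 268500992,
    "data_2.dat": 268500992,
    "data_3.dat": 2146533376,
    "data_4.dat": 2146533376,
    "data_5.dat": 2146533376,
}

GIB = 1024 ** 3


def total_bytes(file_names):
    """Sum the on-disk sizes of the named data files."""
    return sum(DATAFILE_SIZES[name] for name in file_names)


# Hypothetical labels for the three states described in this report.
stages = {
    "before extending": ["data_1.dat", "data_2.dat", "data_3.dat"],
    "after data_4.dat": ["data_1.dat", "data_2.dat", "data_3.dat", "data_4.dat"],
    "after data_5.dat": list(DATAFILE_SIZES),
}

for label, names in stages.items():
    size = total_bytes(names)
    print(f"{label}: {size} bytes ({size / GIB:.2f} GiB)")
```

By this listing the files add up to roughly 2.5 GiB before extending, 4.5 GiB once data_4.dat is included, and 6.5 GiB with data_5.dat.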

* I try to restart ndbd; ndbd goes into the background and the management server reports the following:

Node 3: Forced node shutdown completed. Occured during startphase 4. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

* This is from ndb2_error.log:

Time: Tuesday 15 December 2009 - 13:21:21
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdict/Dbdict.cpp
Error object: DBDICT (Line: 3567) 0x0000000a
Program: ndbd
Pid: 30532
Trace: /usr/local/mysql/var/mysql-cluster/ndb_2_trace.log.4
Version: Version 5.1.40
***EOM***
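
For anyone comparing entries like this across nodes, here is a small sketch that turns one error log entry into a dictionary. It assumes only the "Key: value" layout and the ***EOM*** terminator visible in the entry above; the function name parse_error_entry is hypothetical and not part of any NDB tooling.

```python
# The entry is copied verbatim from the error log excerpt above.
ERROR_LOG_ENTRY = """\
Time: Tuesday 15 December 2009 - 13:21:21
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdict/Dbdict.cpp
Error object: DBDICT (Line: 3567) 0x0000000a
Program: ndbd
Pid: 30532
Trace: /usr/local/mysql/var/mysql-cluster/ndb_2_trace.log.4
Version: Version 5.1.40
***EOM***
"""


def parse_error_entry(text):
    """Parse one 'Key: value' entry, stopping at the ***EOM*** marker."""
    fields = {}
    for line in text.splitlines():
        if line.strip() == "***EOM***":
            break
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a 'Key:' prefix
            fields[key.strip()] = value.strip()
    return fields


entry = parse_error_entry(ERROR_LOG_ENTRY)
print(entry["Error"], entry["Error data"], entry["Error object"])
```

Splitting on the first colon keeps values such as the timestamp and "DBDICT (Line: 3567) 0x0000000a" intact.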

* The ndb_2_trace.log.4 contains the following loop:

--------------- Signal ----------------
r.bn: 246 "DBDIH", r.proc: 2, r.sigId: 242674 gsn: 238 "DISEIZEREQ" prio: 1
s.bn: 245 "DBTC", s.proc: 2, s.sigId: 242673 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'0001098d H'00f50002
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, r.sigId: 242673 gsn: 236 "DISEIZECONF" prio: 1
s.bn: 246 "DBDIH", s.proc: 2, s.sigId: 242672 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'0001098c H'00008184

* The mysql1.err shows the following:

091215 13:06:11 [ERROR] /usr/local/mysql/libexec/mysqld: Incorrect information in file: './database1/table1.frm'
091215 13:06:11 [ERROR] /usr/local/mysql/libexec/mysqld: Incorrect information in file: './database1/table2.frm' 

And so on, for all tables and databases, even the ones that aren't supposed to use disk storage.

===============================

SHOW ENGINE NDB STATUS before extending the tablespace (IPs stripped and replaced with placeholders):

mysql> SHOW ENGINE NDB STATUS;
| Type       | Name                  | Status                                                                                                                                         |
| ndbcluster | connection            | cluster_node_id=4, connected_host=1.2.3.9, connected_port=1186, number_of_data_nodes=2, number_of_ready_data_nodes=2, connect_count=1    |
| ndbcluster | NdbTransaction        | created=3, free=0, sizeof=212                                                                                                                  |
| ndbcluster | NdbOperation          | created=4, free=4, sizeof=660                                                                                                                  |
| ndbcluster | NdbIndexScanOperation | created=1, free=1, sizeof=744                                                                                                                  |
| ndbcluster | NdbIndexOperation     | created=0, free=0, sizeof=664                                                                                                                  |
| ndbcluster | NdbRecAttr            | created=829, free=829, sizeof=60                                                                                                               |
| ndbcluster | NdbApiSignal          | created=16, free=16, sizeof=136                                                                                                                |
| ndbcluster | NdbLabel              | created=0, free=0, sizeof=196                                                                                                                  |
| ndbcluster | NdbBranch             | created=0, free=0, sizeof=24                                                                                                                   |
| ndbcluster | NdbSubroutine         | created=0, free=0, sizeof=68                                                                                                                   |
| ndbcluster | NdbCall               | created=0, free=0, sizeof=16                                                                                                                   |
| ndbcluster | NdbBlob               | created=1, free=1, sizeof=264                                                                                                                  |
| ndbcluster | NdbReceiver           | created=2, free=0, sizeof=68                                                                                                                   |
| ndbcluster | binlog                | latest_epoch=1111500, latest_trans_epoch=1111392, latest_received_binlog_epoch=0, latest_handled_binlog_epoch=0, latest_applied_binlog_epoch=0 |

Tablespace layout before extending: 

mysql> SELECT LOGFILE_GROUP_NAME, FILE_NAME, EXTRA, TABLESPACE_NAME FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'DATAFILE';
+--------------------+------------+----------------+-----------------+
| LOGFILE_GROUP_NAME | FILE_NAME  | EXTRA          | TABLESPACE_NAME |
+--------------------+------------+----------------+-----------------+
| lg_1               | data_2.dat | CLUSTER_NODE=2 | ts_1            |
| lg_1               | data_2.dat | CLUSTER_NODE=3 | ts_1            |
| lg_1               | data_3.dat | CLUSTER_NODE=2 | ts_1            |
| lg_1               | data_3.dat | CLUSTER_NODE=3 | ts_1            |
| lg_1               | data_1.dat | CLUSTER_NODE=2 | ts_1            |
| lg_1               | data_1.dat | CLUSTER_NODE=3 | ts_1            |
+--------------------+------------+----------------+-----------------+
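
As a quick cross-check of the layout above, the following sketch groups the listed data files by cluster node, which makes it easy to see whether every file is registered on both data nodes. The rows are copied from the query output; the function name files_by_node is hypothetical.

```python
from collections import defaultdict

# Data rows copied from the INFORMATION_SCHEMA.FILES output above.
TABLE = """\
| lg_1               | data_2.dat | CLUSTER_NODE=2 | ts_1            |
| lg_1               | data_2.dat | CLUSTER_NODE=3 | ts_1            |
| lg_1               | data_3.dat | CLUSTER_NODE=2 | ts_1            |
| lg_1               | data_3.dat | CLUSTER_NODE=3 | ts_1            |
| lg_1               | data_1.dat | CLUSTER_NODE=2 | ts_1            |
| lg_1               | data_1.dat | CLUSTER_NODE=3 | ts_1            |
"""


def files_by_node(table_text):
    """Map each cluster node id to the set of data file names it reports."""
    nodes = defaultdict(set)
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Ignore borders, headers, and anything that is not a data row.
        if len(cells) != 4 or not cells[2].startswith("CLUSTER_NODE="):
            continue
        node_id = int(cells[2].split("=", 1)[1])
        nodes[node_id].add(cells[1])
    return dict(nodes)


layout = files_by_node(TABLE)
for node_id in sorted(layout):
    print(node_id, sorted(layout[node_id]))
```

Comparing the sets for the two nodes shows the same three files on each of them before the extension.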

How to repeat:
Try to extend an NDB tablespace with two data files of about 2G each in quick succession (as in the ALTER TABLESPACE ... ADD DATAFILE statements above).

Suggested fix:
If the first data file has not propagated properly, maybe give an error message and suggest waiting.
Did not expect the whole cluster to break.
[15 Dec 2009 15:53] MySQL Verification Team
Please run ndb_error_reporter to collect the config.ini file, cluster logs, node error logs and trace files and attach them to this bug report.
[15 Dec 2009 15:59] Mikael Carlson
Error report

Attachment: ndb_error_report_20091214233443.tar.gz (application/x-gzip, text), 290.67 KiB.

[15 Dec 2009 16:01] Mikael Carlson
Files attached. 

Will try to search the support forum for how I can get my cluster back up and running (it's currently down).
[15 Dec 2009 16:16] Andrew Hutchings
Duplicate of bug #42934
[15 Dec 2009 16:20] Andrew Hutchings
Sorry, I mean dupe of bug #36702