Bug #50118 complete Cluster crash, lost data
Submitted: 6 Jan 2010 21:05 Modified: 18 Jan 2010 15:27
Reporter: Stefan Auweiler
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine   Severity: S1 (Critical)
Version: 5.1.35-ndb-7.0.7-cluster-gpl   OS: Solaris (5.10 x86)
Assigned to: Jonas Oreland   CPU Architecture: Any
Tags: cluster, crash, mysqld

[6 Jan 2010 21:05] Stefan Auweiler
Description:
While running 'truncate acctsessions', my client reported an error.

The table had 11 rows at that time, and I was the only user. It is a test environment, going live at the end of January (if we can fix this...).

After the crash, the table was gone from the mysqld on node 1, even in the file system. All other nodes still had the table in their file systems, but it was not listed by 'show tables'. Re-creating it with 'create table ...' was easy.

Scanning the log files, I found:
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)

I found this error in your bug report system, but it seemed to have been solved as of 5.0.45, with a request to report it at the next occurrence... so here I am.

My systems:
mysql-cluster-gpl-7.0.7-solaris10-x86_64
Version: 5.1.35-ndb-7.0.7-cluster-gpl-log (MySQL Cluster Server (GPL)). started with:
Tcp port: 3306  Unix socket: /tmp/mysql.sock

Hardware:

6 Sun X4600 boxes, 64GB RAM
System = SunOS
Node = rn2bbbsv056
Release = 5.10
KernelID = Generic_139556-08
Machine = i86pc
BusType = <unknown>
Serial = <unknown>
Users = <unknown>
OEM# = 0
Origin# = 1
NumCPU = 16

I'm running 3 node groups with 2 replicas each.
During the initial installation, we found that the box of node 5 only saw 16GB of RAM, so we opened a case with Sun for this. Nevertheless, we started the cluster with all 3 node groups... maybe this helps to find a solution.

I've created an ndb_error_report, but I am unable to download it to my current location. If needed, I can attach it to the case tomorrow.

Here are the error logs of the 5 running nodes:

Time: Wednesday 6 January 2010 - 19:22:17
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 8440) 0x0000000a
Program: /usr/local/mysqlCluster/mysql/bin/ndbmtd
Pid: 734
Trace: /DB/mysql/data/ndb_3_trace.log.2 /DB/mysql/data/ndb_3_trace.log.2_t1 /DB/mysql/data/ndb_3_trace.log.2_t2 /DB/mysql/data/ndb_3_trace.log.2_t3 /DB/mysql/data

Time: Wednesday 6 January 2010 - 18:26:11
Status: Temporary error, restart node
Message: Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 5423) 0x0000000a
Program: /usr/local/mysqlCluster/mysql/bin/ndbmtd
Pid: 2971
Trace: /DB/mysql/data/ndb_4_trace.log.1 /DB/mysql/data/ndb_4_trace.log.1_t1 /DB/mysql

Time: Wednesday 6 January 2010 - 18:23:56
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 8440) 0x0000000a
Program: /usr/local/mysqlCluster/mysql/bin/ndbmtd
Pid: 2636
Trace: /DB/mysql/data/ndb_6_trace.log.1 /DB/mysql/data/ndb_6_trace.log.1_t1 /DB/mysql/data/ndb_6_trace.log.1_t2 /DB/mysql/data/ndb_6_trace.log.1_t3 /DB/mysql/dat

Time: Wednesday 6 January 2010 - 18:23:01
Status: Temporary error, restart node
Message: Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 5423) 0x0000000a
Program: /usr/local/mysqlCluster/mysql/bin/ndbmtd
Pid: 2563
Trace: /DB/mysql/data/ndb_7_trace.log.1 /DB/mysql/data/ndb_7_trace.log.1_t1 /DB/mysql

Time: Wednesday 6 January 2010 - 18:22:38
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 8440) 0x0000000a
Program: /usr/local/mysqlCluster/mysql/bin/ndbmtd
Pid: 2763
Trace: /DB/mysql/data/ndb_8_trace.log.1 /DB/mysql/data/ndb_8_trace.log.1_t1 /DB/mysql/data/ndb_8_trace.log.1_t2 /DB/mysql/data/ndb_8_trace.log.1_t3 /DB/mysql/dat

Thank you for your support.
Regards Stefan

How to repeat:
No idea. It has happened only once.
[6 Jan 2010 22:00] Jonas Oreland
Please upload the ndb_X_trace files too
(or use ndb_error_reporter, which will gather all logs).
[7 Jan 2010 10:13] Stefan Auweiler
I've added files to the FTP upload folder, with the bug number in their names:

bug-data-50118.README

has some more information on the files.

Regards Stefan
[11 Jan 2010 11:02] Martin Skold
The uploaded ndb_error_report file is empty.
[11 Jan 2010 18:22] Stefan Auweiler
Yes, which is why I uploaded the files via FTP, as described in your HowTos.

Did you find these files?

Thanks
[12 Jan 2010 13:54] Gustaf Thorslund
hillbilly% cat bug-data-50118.split.a* > ndb_error_report_20100106200738.tar.bz2

and unpacking it worked fine.

/Gustaf
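The part names (bug-data-50118.split.aa, .ab, ...) suggest the archive was cut with the standard `split` utility, so the `cat ... a*` command above restores it because the shell expands the glob in the same lexical order that `split` assigns suffixes. The following is a minimal sketch under that assumption; the file names and sizes are hypothetical stand-ins, not the actual upload, and the random bytes only imitate an archive.

```shell
#!/bin/sh
# Sketch: split an archive into fixed-size parts (as for an FTP upload),
# reassemble the parts with cat, and confirm the result is byte-identical.
# All file names here are hypothetical stand-ins for the real upload.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in for the original archive: 300 KiB of random bytes.
dd if=/dev/urandom of=original.tar.bz2 bs=1024 count=300 2>/dev/null

# Split into 100 KiB parts named upload.split.aa, upload.split.ab, ...
split -b 102400 original.tar.bz2 upload.split.

# The glob expands in lexical order (aa, ab, ac, ...), which matches
# the order split wrote the parts, so cat restores the original bytes.
cat upload.split.a* > reassembled.tar.bz2

if cmp -s original.tar.bz2 reassembled.tar.bz2; then
    echo "reassembly OK"
else
    echo "reassembly FAILED" >&2
    exit 1
fi
```

With a real bzip2-compressed tarball, `tar -xjf` on the reassembled file would then unpack it, as described in the comment above.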
[18 Jan 2010 13:09] Jonas Oreland
Reproduced.
The problem is related to an index being dropped while transactions are ongoing.
This is likely a regression from 6.3 (but I haven't tested that yet).
[18 Jan 2010 14:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97258

3081 Jonas Oreland	2010-01-18
      ndb - bug#50118 commit testcase to 6.3
[18 Jan 2010 14:48] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97263

3342 Jonas Oreland	2010-01-18
      ndb - bug#50118 - make sure that index op is rejected when index is dropped
[18 Jan 2010 14:49] Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20100118144630-9wodo8gd5m0ss3bl) (version source revid:jonas@mysql.com-20100118144630-9wodo8gd5m0ss3bl) (merge vers: 5.1.41-ndb-7.0.11) (pib:16)
[18 Jan 2010 14:49] Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20100118144759-0o14v1e8cwx70cj4) (version source revid:jonas@mysql.com-20100118144759-0o14v1e8cwx70cj4) (merge vers: 5.1.41-ndb-7.1.0) (pib:16)
[18 Jan 2010 14:53] Jonas Oreland
Pushed to 7.0.11 (and the test program also to 6.3).
Docs: dropping unique indexes in parallel with using them could
  cause a node (cluster) crash.
[18 Jan 2010 15:27] Jon Stephens
Documented bugfix in the NDB-6.3.31 and 7.0.11 changelogs as follows:

        Dropping unique indexes in parallel while they were in use could
        cause node and cluster failures.

Closed.