MySQL Bugs: #50118: complete Cluster crash, lost data

Bug #50118	complete Cluster crash, lost data
Submitted:	6 Jan 2010 21:05	Modified:	18 Jan 2010 15:27
Reporter:	Stefan Auweiler
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	5.1.35-ndb-7.0.7-cluster-gpl	OS:	Solaris (5.10 x86)
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	cluster, crash, mysqld

Description:
While running 'truncate acctsessions', my client reported an error.

The table had 11 rows at that time, and I was the only user. It is a test environment, going live at the end of January (if we can fix this...).

After the crash, the table was gone from the mysqld on node 1, even in the file system. All other nodes still had the table in their file system, but it was not listed by 'show tables'. It was easy to recreate it with 'create table ...'.

Scanning the log files, I found:
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)

I found this message in your bug tracking system, but it seemed to have been solved as of 5.0.45, with a request to report it at the next occurrence... so here I am.

My systems:
mysql-cluster-gpl-7.0.7-solaris10-x86_64
Version: 5.1.35-ndb-7.0.7-cluster-gpl-log (MySQL Cluster Server (GPL)). started with:
Tcp port: 3306  Unix socket: /tmp/mysql.sock

Hardware:

6 x Sun x4600, 64GB RAM
System = SunOS
Node = rn2bbbsv056
Release = 5.10
KernelID = Generic_139556-08
Machine = i86pc
BusType = <unknown>
Serial = <unknown>
Users = <unknown>
OEM# = 0
Origin# = 1
NumCPU = 16

I'm running 3 node groups with 2 replicas each.
During the initial installation, we found that the box hosting node 5 only saw 16GB of RAM, so we opened a case with Sun for this. Nevertheless, we started the cluster with all 3 node groups... maybe this leads to a solution.

I've created an ndb_error_report, but am unable to download it to my current location. If needed, I can attach it to the case tomorrow.

Here are the error logs of the 5 running nodes:

Time: Wednesday 6 January 2010 - 19:22:17
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 8440) 0x0000000a
Program: /usr/local/mysqlCluster/mysql/bin/ndbmtd
Pid: 734
Trace: /DB/mysql/data/ndb_3_trace.log.2 /DB/mysql/data/ndb_3_trace.log.2_t1 /DB/mysql/data/ndb_3_trace.log.2_t2 /DB/mysql/data/ndb_3_trace.log.2_t3 /DB/mysql/data

Time: Wednesday 6 January 2010 - 18:26:11
Status: Temporary error, restart node
Message: Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 5423) 0x0000000a
Program: /usr/local/mysqlCluster/mysql/bin/ndbmtd
Pid: 2971
Trace: /DB/mysql/data/ndb_4_trace.log.1 /DB/mysql/data/ndb_4_trace.log.1_t1 /DB/mysql

Time: Wednesday 6 January 2010 - 18:23:56
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 8440) 0x0000000a
Program: /usr/local/mysqlCluster/mysql/bin/ndbmtd
Pid: 2636
Trace: /DB/mysql/data/ndb_6_trace.log.1 /DB/mysql/data/ndb_6_trace.log.1_t1 /DB/mysql/data/ndb_6_trace.log.1_t2 /DB/mysql/data/ndb_6_trace.log.1_t3 /DB/mysql/dat

Time: Wednesday 6 January 2010 - 18:23:01
Status: Temporary error, restart node
Message: Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 5423) 0x0000000a
Program: /usr/local/mysqlCluster/mysql/bin/ndbmtd
Pid: 2563
Trace: /DB/mysql/data/ndb_7_trace.log.1 /DB/mysql/data/ndb_7_trace.log.1_t1 /DB/mysql

Time: Wednesday 6 January 2010 - 18:22:38
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 8440) 0x0000000a
Program: /usr/local/mysqlCluster/mysql/bin/ndbmtd
Pid: 2763
Trace: /DB/mysql/data/ndb_8_trace.log.1 /DB/mysql/data/ndb_8_trace.log.1_t1 /DB/mysql/data/ndb_8_trace.log.1_t2 /DB/mysql/data/ndb_8_trace.log.1_t3 /DB/mysql/dat

Thank You for your support.
Regards Stefan

How to repeat:
No idea. It happened only once.

Please upload the ndb_X_trace files too
(or use ndb_error_reporter, which will gather all logs).

I've added files to the FTP upload folder, with the bug number in their names:

bug-data-50118.README

has some more information on the files.

Regards Stefan

The uploaded ndb_error_report file is empty.

Yes, but for that reason I uploaded the files via FTP, as described in your how-tos.

Did you find these files?

Thanks

hillbilly% cat bug-data-50118.split.a* > ndb_error_report_20100106200738.tar.bz2

and unpacking it worked fine.

/Gustaf
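For readers without a Unix shell, the reassembly step above can be sketched in Python. This is a minimal sketch that assumes the split parts sit in the current directory and sort into the right order by name, as the shell glob in the `cat` command does. `reassemble_and_list` is a hypothetical helper, not part of any MySQL tool.

```python
import glob
import shutil
import tarfile


def reassemble_and_list(pattern, output_path):
    """Hypothetical helper: join split parts and list the tar.bz2 members.

    Parts are concatenated in sorted name order (aa, ab, ac, ...),
    mirroring `cat bug-data-50118.split.a* > ...`.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no parts match {pattern!r}")
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    # Reading every member header forces a full decompression pass, so a
    # missing or misordered part will normally raise an exception here.
    with tarfile.open(output_path, "r:bz2") as tar:
        return tar.getnames()
```

Run from the upload directory as `reassemble_and_list("bug-data-50118.split.a*", "ndb_error_report_20100106200738.tar.bz2")`, it would write the joined archive and return its member names. Listing the members doubles as the "unpacking worked" check.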

Reproduced.
The problem is related to an index being dropped while transactions are ongoing.
This is likely a regression from 6.3 (but I haven't tested that yet).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97258

3081 Jonas Oreland	2010-01-18
      ndb - bug#50118 commit testcase to 6.3

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97263

3342 Jonas Oreland	2010-01-18
      ndb - bug#50118 - make sure that index op is rejected when index is dropped
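The commit title above is the only description of the fix in this thread. Purely as an illustration, here is a hypothetical Python toy model of that idea. It is not NDB code: `IndexRegistry`, `index_op`, and the `(name, version)` handle scheme are invented for this sketch, and it does not model DBDIH or the "Pointer too large" (error 2306) failure. An open transaction holds a handle captured before a concurrent drop. The operation is validated against the current index and rejected with a catchable error, rather than acting on a stale reference.

```python
class IndexDroppedError(Exception):
    """Raised when an operation targets an index that no longer exists."""


class IndexRegistry:
    """Hypothetical toy dictionary of unique indexes keyed by name.

    Each create() assigns a fresh version; open transactions keep the
    (name, version) handle they captured when they started.
    """

    def __init__(self):
        self._indexes = {}  # name -> (version, {key: value})
        self._next_version = 1

    def create(self, name):
        version = self._next_version
        self._next_version += 1
        self._indexes[name] = (version, {})
        return version

    def drop(self, name):
        # Simulates a DROP INDEX that runs while transactions still hold handles.
        del self._indexes[name]

    def index_op(self, name, version, key, value):
        # Validate the handle first: reject if the index was dropped,
        # or dropped and recreated under the same name (new version).
        current = self._indexes.get(name)
        if current is None or current[0] != version:
            raise IndexDroppedError(
                f"index {name!r} (version {version}) has been dropped")
        entries = current[1]
        if key in entries:
            raise KeyError(f"duplicate key {key!r} in unique index {name!r}")
        entries[key] = value


if __name__ == "__main__":
    registry = IndexRegistry()
    handle = registry.create("uk_acct")    # captured by an open transaction
    registry.index_op("uk_acct", handle, "s1", 1)
    registry.drop("uk_acct")               # dropped while the transaction runs
    try:
        registry.index_op("uk_acct", handle, "s2", 2)
    except IndexDroppedError as err:
        print("rejected:", err)
```

The version number is the design choice that matters in this toy. A name-only check would wrongly accept an old handle after the index is dropped and recreated with the same name.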

Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20100118144630-9wodo8gd5m0ss3bl) (version source revid:jonas@mysql.com-20100118144630-9wodo8gd5m0ss3bl) (merge vers: 5.1.41-ndb-7.0.11) (pib:16)

Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20100118144759-0o14v1e8cwx70cj4) (version source revid:jonas@mysql.com-20100118144759-0o14v1e8cwx70cj4) (merge vers: 5.1.41-ndb-7.1.0) (pib:16)

Pushed to 7.0.11 (and the test program also to 6.3).
Docs: dropping unique indexes in parallel with using them could
  cause a node (cluster) crash.

Documented bugfix in the NDB-6.3.31 and 7.0.11 changelogs as follows:

        Dropping unique indexes in parallel while they were in use could
        cause node and cluster failures.

Closed.