Bug #12608 Cluster shuts down during simultaneous recovery/commit
Submitted: 16 Aug 2005 17:31 Modified: 8 Sep 2005 9:00
Reporter: Partha Dutta
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: 4.1.13 OS: Linux (RedHat ES 4)
Assigned to: Jonas Oreland CPU Architecture: Any

[16 Aug 2005 17:31] Partha Dutta
Description:
If an ndbd node is restarted while a transaction is still uncommitted, and an API node then commits that transaction while the restarting ndbd node is in Phase 5 recovery, the cluster crashes.

How to repeat:
The setup includes 4 ndbd nodes, 1 mgm node, and 2 mysql API nodes.
1) Create a table and insert 30,000 records into the table (any way is fine).
2) Start a transaction on an API node.
3) delete from table limit 20000;
4) When the delete is finished (but uncommitted), perform a pkill -9 ndbd on any data node.
5) Restart the failed node.
6) When the node goes into phase 5 recovery, commit the transaction.

The server hangs for about 3-5 minutes, then finally gets an error:
ERROR 1297 (HY000): Got temporary error 4010 'Node failure caused abort of transaction' from ndbcluster.

All of the ndbd nodes shut down at the same time.
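
The reproduction steps above can be laid out as an ordered plan. The sketch below is only an illustration, not a tested harness: the table name `t1`, the function name `repro_plan`, and the use of a bare `ndbd` command to restart the killed node are assumptions that do not appear in the report. All steps marked "sql" have to run in the same client session on one API node, so that the transaction stays open across the node restart.

```python
def repro_plan(table="t1", delete_limit=20000):
    """Return the reproduction steps as (where, action) pairs, in order.

    `table` is a hypothetical name; the report does not name the table.
    """
    return [
        ("sql", "START TRANSACTION"),
        ("sql", f"DELETE FROM {table} LIMIT {delete_limit}"),
        # Kill the data node process on one data node host.
        ("shell", "pkill -9 ndbd"),
        # Assumption: the node is brought back by starting ndbd again.
        ("shell", "ndbd"),
        # The report gives no command for this; it is a manual check here.
        ("manual", "wait until the restarted node reports start phase 5"),
        ("sql", "COMMIT"),
    ]


if __name__ == "__main__":
    for where, action in repro_plan():
        print(f"{where:>6}: {action}")
```

The start phase of the restarting node can be watched from the management client (for example with its `ALL STATUS` command) before issuing the final commit.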

Suggested fix:
Provide a mechanism that either waits until recovery is fully complete before applying the yet-to-be-committed transaction, or aborts the current transaction. Pausing would be preferable.
[16 Aug 2005 17:34] Partha Dutta
cluster configuration

Attachment: config.ini (application/octet-stream, text), 543 bytes.

[16 Aug 2005 18:11] Kai Voigt
I could reproduce the error on a single machine running MacOSX and MySQL 5.0.10-beta. Here's my stripped down config file.

[ndbd default]
datadir=/usr/local/mysql/cluster/   # data directory
DataMemory=50M
noofreplicas=2                    # number of replicas
[ndb_mgmd]
datadir=/usr/local/mysql/cluster/   # data directory
hostname=127.0.0.1
[ndbd]
[ndbd]
[ndbd]
[ndbd]
[mysqld]
[mysqld]
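
For context on why losing one data node should be survivable with this configuration: the number of node groups is the number of data nodes divided by the replica count, so 4 ndbd sections with noofreplicas=2 give 2 node groups, each holding two copies of its data. A minimal sketch of that arithmetic follows. The helper names `count_sections` and `node_groups` are hypothetical, and the embedded text is a shortened copy of the config above. A plain line scan is used because the file repeats the `[ndbd]` header.

```python
def count_sections(ini_text, name):
    """Count how often the section header [name] appears (case-insensitive)."""
    header = f"[{name.lower()}]"
    count = 0
    for line in ini_text.splitlines():
        # Drop trailing '#' comments and surrounding whitespace.
        stripped = line.split("#", 1)[0].strip().lower()
        if stripped == header:
            count += 1
    return count


def node_groups(data_nodes, replicas):
    """Node groups = data nodes / replicas; the division must be exact."""
    if replicas <= 0 or data_nodes % replicas != 0:
        raise ValueError("data node count must be a multiple of the replica count")
    return data_nodes // replicas


# Shortened copy of the config from this comment.
CONFIG = """\
[ndbd default]
noofreplicas=2
[ndb_mgmd]
hostname=127.0.0.1
[ndbd]
[ndbd]
[ndbd]
[ndbd]
[mysqld]
[mysqld]
"""

if __name__ == "__main__":
    nodes = count_sections(CONFIG, "ndbd")
    print(nodes, "data nodes,", node_groups(nodes, 2), "node groups")
```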

The following table helped to reproduce the bug.

create table partha (id int primary key not null auto_increment, value char(255)) engine=ndb;

The table was populated by issuing the following command 30,000 times.

insert into partha values (NULL, 'foo');
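
Rather than issuing 30,000 single-row statements, the same rows could be generated in batches. This is a sketch under assumptions, not what was actually run for this report: the function name `insert_batches` and the batch size of 1000 are arbitrary choices, and the value is inlined without escaping, which is only safe for a fixed literal such as 'foo'.

```python
def insert_batches(total=30000, batch_size=1000, table="partha", value="foo"):
    """Yield multi-row INSERT statements that together add `total` rows.

    NULL in the first column lets auto_increment assign the id.
    """
    remaining = total
    while remaining > 0:
        n = min(batch_size, remaining)
        rows = ", ".join(f"(NULL, '{value}')" for _ in range(n))
        yield f"INSERT INTO {table} VALUES {rows};"
        remaining -= n


if __name__ == "__main__":
    # Print the statements so they can be piped into a mysql client.
    for stmt in insert_batches():
        print(stmt)
```

The printed statements can be piped into a mysql client connected to one of the API nodes.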
[1 Sep 2005 20:25] Jonas Oreland
Hi,

Can you please supply error log and trace file from crash?

/Jonas
[2 Sep 2005 9:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/29234
[5 Sep 2005 4:48] Jonas Oreland
Pushed into 4.1.15 and 5.0.13
[8 Sep 2005 9:00] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix for 4.1.15 and 5.0.13. Closed.