Bug #19743 temporary error 4010 'Cause cluster to crash'
Submitted: 11 May 2006 21:06
Modified: 17 May 2006 20:33
Reporter: Jonathan Miller
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 5.1.11
OS: Linux (Linux 32 Bit OS)
Assigned to:
CPU Architecture: Any

[11 May 2006 21:06] Jonathan Miller
Description:
Loading DBT2 into a database with 2 data nodes and 2 replicas, using Disk Data, produced the following errors:

Loading of DBT2 dataset located in /space/var/ to database dbt2.

DB_ENGINE:      NDBDD
DB_SCHEME:      ORIG
DB_HOST:        localhost
DB_USER:        root
DB_SOCKET:      /tmp/mysql.sock

Creating table STOCK
Creating table ITEM
Creating table ORDER_LINE
Creating table ORDERS
Creating table NEW_ORDER
Creating table HISTORY
Creating table CUSTOMER
Creating table DISTRICT
Creating table WAREHOUSE

Loading table customer
ERROR 1297 (HY000) at line 1: Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER
ERROR: rc=1
SCRIPT INTERRUPTED

This caused the other data node to die as well. The odd issue here is that the data node giving the error never died, but it was no longer connected to the management server.

Processes on the system with the temporary error:
15299 ?        00:00:09 ndb_mgmd
15314 ?        00:00:00 ndbd
15315 ?        00:09:35 ndbd

Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from 08) <- system with temp error
id=3    @10.100.1.92  (Version: 5.1.11, starting, Nodegroup: 0) <- Second system that I restarted
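
For anyone trying to spot the same state (ndbd process still alive, but the node listed as not connected), a minimal Python sketch that scans the management client's show output might look like the following. The function name disconnected_nodes and the regular expression are assumptions based only on the lines pasted above; they are not part of any MySQL tool, and the output format may differ in other versions.

```python
import re

# Matches node lines as pasted in this report, for example:
#   id=2 (not connected, accepting connect from 08)
#   id=3    @10.100.1.92  (Version: 5.1.11, starting, Nodegroup: 0)
# The pattern is an assumption derived from that output, not an official format.
NODE_LINE = re.compile(r"^id=(\d+)\s+(@\S+\s+)?\((.*)\)")


def disconnected_nodes(show_output):
    """Return the ids of nodes reported as 'not connected' (hypothetical helper)."""
    missing = []
    for line in show_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match and match.group(3).startswith("not connected"):
            missing.append(int(match.group(1)))
    return missing


# Sample text copied from the cluster configuration listing in this report.
SAMPLE = """\
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from 08)
id=3    @10.100.1.92  (Version: 5.1.11, starting, Nodegroup: 0)
"""

print(disconnected_nodes(SAMPLE))  # prints [2]
```

Comparing the returned ids against the ndbd processes still visible in ps on each host would flag the case described here, where the process exists but the management server no longer sees the node.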

So I killed the ndbd processes on 08 and restarted them, and then the data node on 07 died again. When I restarted the data node on 07, the data node on 08 died. After I restarted the data node on 08, the cluster came up again.

How to repeat:
Not sure.

[12 May 2006 18:54] Jonathan Miller
I was loading an in-memory version of DBT2 today and got the following:

ERROR 1297 (HY000) at line 1: Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER
ERROR: rc=1
SCRIPT INTERRUPTED

Looking at the processes, ndbd still appears to be running, but it is not connected to the management server:

32051 ?        00:00:00 ndbd
32052 ?        00:35:10 ndbd

Connected to Management Server at: 08:14000
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from 08)
id=3 (not connected, accepting connect from 07)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @ndb08  (Version: 5.1.11)

The data node on 07 is gone. I will upload the error tar file.

[17 May 2006 20:33] Jonas Oreland
it's the same