MySQL Bugs: #4162: Node crashes when two instances update the same table and one of them aborts.

Bug #4162	Node crashes when two instances update the same table and one of them aborts.
Submitted:	16 Jun 2004 11:44	Modified:	20 Aug 2004 9:55
Reporter:	Lars Torstensson
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	mysqlcluster-4.1.2-3.4.9-alpha-pc-linux-	OS:	Linux (Redhat AS)
Assigned to:	Pekka Nousiainen	CPU Architecture:	Any

Description:
Node 1 went down when I ran this SQL update while one of our servers (C++) was doing a scan for all services where num_ip=<leased_num_ip.
That scan was aborted (see B2 log).

SQL> update services set num_ip=5, leased_num_ip=5 where pop='upp1.se.bredband.com';
Operation failed
[MySQL][ODBC driver][NDB Cluster]NDB-01100266 Time-out in NDB, probably caused by deadlock - at execute without commit (in SQLExecDirect)

Cluster log:
Jun 16 11:23:09 na-gw NDB[1727]: [MgmSrvr] Node 1: Local checkpoint 34393 started. Keep GCI = 146565 oldest restorable GCI = 146565
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 5: Node 1 Disconnected
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Lost connection to node 1
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 3: Arbitration check won - node group majority
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 3: President restarts arbitration thread [state=6]
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 3: GCP Take over started
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 3: GCP Take over completed
Jun 16 11:23:26 na-gw NDB[1727]: [MgmSrvr] Node 3: LCP Take over started

Error log:
Date/Time: Wednesday 16 June 2004 - 11:23:14
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: DbtupIndex.cpp
Object of reference: DBTUP (Line: 259) 0x0000000a
ProgramName: NDB Kernel
ProcessID: 1806
TraceFile: NDB_TraceFile_5.trace

node1.out:
2004-06-14 14:37:13 [NDB] INFO     -- Node restart completed copying the fragments to Node 2
2 - endTakeOver
Error handler shutting down system
Error handler shutdown completed - exiting
2004-06-16 11:24:08 [NDB] INFO     -- Ndb has terminated (pid 1806) restarting
2004-06-16 11:24:08 [NDB] INFO     -- Angel pid: 1782 ndb pid: 9644
2004-06-16 11:24:08 [NDB] INFO     -- NDB Cluster -- DB node 1
2004-06-16 11:24:08 [NDB] INFO     -- Version 3.4.9 (alpha) --

B2 log:
Jun 16 11:23:06 nl-fe2 nexus[30771]: DBmaint-5-DB: db_check_services start
Jun 16 11:23:06 nl-fe2 nexus[30771]: DBmaint-5-DB: db_scan_services start building vector
Jun 16 11:23:26 nl-fe2 nexus[30771]: DBmaint-5-SYSTEM: Got SIG-TERM, shutting down...
Jun 16 11:23:26 nl-fe2 nexus[30771]: DBmaint-5-DB: db_scan_services done vector size 5199
Jun 16 11:23:26 nl-fe2 nexus[30771]: DBmaint-0-SYSTEM: System SHUTDOWN
Jun 16 11:23:26 nl-fe2 nexus[30771]: DBmaint-5-SYSTEM: Free memory
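
To read the cluster log and the B2 log side by side, a minimal sketch like the one below can merge a few of the lines above into one timeline. It assumes the clocks on na-gw and nl-fe2 were roughly in sync, which nothing in this report confirms, and it takes the year 2004 from the error log. The names parse_line and merged_timeline are hypothetical helpers written for this illustration, not part of any NDB tool.

```python
from datetime import datetime

# Selected lines copied verbatim from the cluster log (na-gw) and the B2 log (nl-fe2).
LOG_LINES = [
    "Jun 16 11:23:09 na-gw NDB[1727]: [MgmSrvr] Node 1: Local checkpoint 34393 started. Keep GCI = 146565 oldest restorable GCI = 146565",
    "Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 5: Node 1 Disconnected",
    "Jun 16 11:23:06 nl-fe2 nexus[30771]: DBmaint-5-DB: db_scan_services start building vector",
    "Jun 16 11:23:26 nl-fe2 nexus[30771]: DBmaint-5-SYSTEM: Got SIG-TERM, shutting down...",
]


def parse_line(line, year=2004):
    """Split a syslog-style line into (timestamp, host, rest of line)."""
    # Layout: "Mon DD HH:MM:SS host tag: message"; syslog omits the year,
    # so it is supplied by the caller (assumption: 2004, per the error log).
    month, day, clock, host, rest = line.split(" ", 4)
    stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    return stamp, host, rest


def merged_timeline(lines):
    """Return the parsed events sorted by timestamp (assumes synced host clocks)."""
    return sorted((parse_line(line) for line in lines), key=lambda event: event[0])


if __name__ == "__main__":
    events = merged_timeline(LOG_LINES)
    first = events[0][0]
    for stamp, host, message in events:
        offset = int((stamp - first).total_seconds())
        print(f"+{offset:2d}s {host:7s} {message}")
```

Under that clock assumption, the scan on nl-fe2 starts 8 seconds before node 1 is reported disconnected, and the SIG-TERM is logged 12 seconds after the disconnect.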

How to repeat:
See the description above.

Tracefile from node 1

Attachment: NDB_TraceFile_5.trace.tar.gz (application/x-gzip-compressed, text), 52.39 KiB.

1) scan update
2) scan

abort 2)

Unable to reproduce; seems to be gone in 4.1.4.