Bug #4162 Node crashes when two instances update the same table and one of them aborts.
Submitted: 16 Jun 2004 11:44
Modified: 20 Aug 2004 9:55
Reporter: Lars Torstensson
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysqlcluster-4.1.2-3.4.9-alpha-pc-linux-
OS: Linux (Redhat AS)
Assigned to: Pekka Nousiainen
CPU Architecture: Any

[16 Jun 2004 11:44] Lars Torstensson
Description:
Node 1 went down when I ran this SQL update while one of our servers (C++) was doing a scan for all services where num_ip=<leased_num_ip.
That scan was aborted (see the B2 log).

SQL> update services set num_ip=5, leased_num_ip=5 where pop='upp1.se.bredband.com';                                              
Operation failed
[MySQL][ODBC driver][NDB Cluster]NDB-01100266 Time-out in NDB, probably caused by deadlock - at execute without commit (in SQLExecDirect)

Cluster log:
Jun 16 11:23:09 na-gw NDB[1727]: [MgmSrvr] Node 1: Local checkpoint 34393 started. Keep GCI = 146565 oldest restorable GCI = 146565
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 5: Node 1 Disconnected
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Lost connection to node 1
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 3: Arbitration check won - node group majority
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 3: President restarts arbitration thread [state=6]
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 3: GCP Take over started
Jun 16 11:23:14 na-gw NDB[1727]: [MgmSrvr] Node 3: GCP Take over completed
Jun 16 11:23:26 na-gw NDB[1727]: [MgmSrvr] Node 3: LCP Take over started

Error log:
Date/Time: Wednesday 16 June 2004 - 11:23:14
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: DbtupIndex.cpp
Object of reference: DBTUP (Line: 259) 0x0000000a
ProgramName: NDB Kernel
ProcessID: 1806
TraceFile: NDB_TraceFile_5.trace

node1.out:
2004-06-14 14:37:13 [NDB] INFO     -- Node restart completed copying the fragments to Node 2
2 - endTakeOver
Error handler shutting down system
Error handler shutdown completed - exiting
2004-06-16 11:24:08 [NDB] INFO     -- Ndb has terminated (pid 1806) restarting
2004-06-16 11:24:08 [NDB] INFO     -- Angel pid: 1782 ndb pid: 9644
2004-06-16 11:24:08 [NDB] INFO     -- NDB Cluster -- DB node 1
2004-06-16 11:24:08 [NDB] INFO     -- Version 3.4.9 (alpha) --

B2 log:
Jun 16 11:23:06 nl-fe2 nexus[30771]: DBmaint-5-DB: db_check_services start
Jun 16 11:23:06 nl-fe2 nexus[30771]: DBmaint-5-DB: db_scan_services start building vector
Jun 16 11:23:26 nl-fe2 nexus[30771]: DBmaint-5-SYSTEM: Got SIG-TERM, shutting down...
Jun 16 11:23:26 nl-fe2 nexus[30771]: DBmaint-5-DB: db_scan_services done vector size 5199
Jun 16 11:23:26 nl-fe2 nexus[30771]: DBmaint-0-SYSTEM: System SHUTDOWN
Jun 16 11:23:26 nl-fe2 nexus[30771]: DBmaint-5-SYSTEM: Free memory

How to repeat:
See the description above.

[16 Jun 2004 11:45] Lars Torstensson
Tracefile from node 1

Attachment: NDB_TraceFile_5.trace.tar.gz (application/x-gzip-compressed, text), 52.39 KiB.

[16 Jun 2004 11:51] Jonas Oreland
1) scan update
2) scan

abort 2)

[20 Aug 2004 9:55] Martin Skold
Unable to reproduce; seems to be gone in 4.1.4.