Bug #28175 Slave Host losing network connection w/ Slave Batching can cause NDBD failure
Submitted: 30 Apr 2007 21:57
Modified: 30 Apr 2007 21:57
Reporter: Jonathan Miller
Status: Verified
Category: MySQL Cluster: Replication
Severity: S2 (Serious)
Version: mysql-5.1-telco-6.2
OS: Linux (32 Bit)
Assigned to:
CPU Architecture: Any
Triage: Triaged: D2 (Serious)

[30 Apr 2007 21:57] Jonathan Miller
Description:
Hi,

Another scenario: when the slave mysqld host loses its network connection to the slave cluster while slave batching is in use, an NDBD failure results.

MySQLD Log:

070430 21:07:55 [ERROR] NDB Binlog: ndbevent->execute failed for REPL$mysql/ndb_schema; 1421 Partially connected API in NdbOperation::execute()
070430 21:07:55 [ERROR] NDB Binlog:FAILED CREATE (DISCOVER) EVENT OPERATIONS Event: REPL$mysql/ndb_schema
2007-04-30 21:07:57 [NdbApi] ERROR    -- dropped GSN_SUB_TABLE_DATA due to wrong magic number
070430 21:07:59 [Note] Slave SQL thread initialized, starting replication in log 'ndb09.000003' at position 659709761, relay log './n12-relay-bin.000059' position: 1414240
070430 21:07:59 [Note] Slave I/O thread: connected to master 'rep@ndb09:3306',replication started in log 'n09.000003' at position 659709761
070430 21:07:59 [Warning] NDB Binlog: cluster has reconnected. Changes to the database that occured while disconnected will not be in the binlog
070430 21:08:04 [ERROR] Slave: Error in Write_rows event: row application failed, Error_code: 121
070430 21:08:04 [ERROR] Slave: Error in Write_rows event: error during transaction execution on table TPCB.trans, Error_code: 121
070430 21:10:24 [ERROR] Slave: Error 'Table 'history' is read only' in Write_rows event: when locking tables, Error_code: 1036
070430 21:10:24 [Warning] Slave: Table 'history' is read only Error_code: 1036
070430 21:10:24 [Warning] Slave: Unknown error Error_code: 1105
070430 21:10:24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'n09.000003' position 663411308
070430 21:10:25 [Warning] NDB Binlog: cluster has reconnected. Changes to the database that occured while disconnected will not be in the binlog

Ndbd Error:
Time: Monday 30 April 2007 - 21:08:01
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dblqh/DblqhMain.cpp
Error object: DBLQH (Line: 2650) 0x0000000a
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 18228
Trace: /space/run/ndb_2_trace.log.1
Version: mysql-5.1.15 ndb-6.1.7-beta
***EOM***

--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 769272267 gsn: 316 "LQHKEYREQ" prio: 1
s.bn: 245 "DBTC", s.proc: 2, s.sigId: 769272165 length: 18 trace: 1 #sec: 0 fragInf: 0
 ClientPtr = H'0001919b hashValue = H'38660df6 tcBlockRef = H'00f50002
 transId1 = H'0001395d transId2 = H'00600500 savePointId = H'00000000
 Op: 4 Lock: 0 Flags: CommitAckMarker NoDisk ScanInfo/noFiredTriggers: H'23d
 AttrLen: 4 (4 in this) KeyLen: 1 TableId: 7 SchemaVer: 1
 FragId: 0 ReplicaNo: 0 LastReplica: 1 NextNodeId: 3
 ApiRef: H'80060005 ApiOpRef: H'00000340
 KeyInfo: H'0000047a
 AttrInfo: H'00000004 H'0000047a H'00010004 H'008d1780
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, r.sigId: 769272266 gsn: 12 "TCKEYREQ" prio: 1
s.bn: 32774 "API", s.proc: 5, s.sigId: 0 length: 18 trace: 1 #sec: 0 fragInf: 0
 apiConnectPtr: H'00000023, apiOperationPtr: H'0000083c
 Operation: Write, Flags: NoDisk IgnoreError
 keyLen: 5, attrLen: 13, AI in this: 5, tableId: 13, tableSchemaVer: 1, API Ver: 263
 transId(1, 2): (H'0001395d, H'00600500)
 -- Variable Data --
 H'3062646e H'20203139 H'20202020 H'20202020 H'20202020 H'00000014 H'3062646e
 H'20203139 H'20202020 H'20202020

DBTUP   008150
DBLQH   002582 002650
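
For anyone reading the signal dump above, the hex words can be decoded by hand. The following is a minimal sketch, not part of any NDB tooling: the helper names are hypothetical, the little-endian word order is an assumption (32-bit Linux host), and the 16/16 split of a block reference into block number and node id is an assumption that happens to agree with the "s.bn: 245 "DBTC", s.proc: 2" line in the dump.

```python
import struct

# Words copied from the "-- Variable Data --" section of the TCKEYREQ signal.
WORDS = [
    0x3062646e, 0x20203139, 0x20202020, 0x20202020, 0x20202020,
    0x00000014, 0x3062646e, 0x20203139, 0x20202020, 0x20202020,
]


def words_to_bytes(words):
    """Pack 32-bit words as little-endian bytes (assumed host byte order)."""
    return struct.pack("<%dI" % len(words), *words)


def printable(data):
    """Render bytes as ASCII, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


def split_block_ref(ref):
    """Split a block reference into (block number, node id).

    Assumes the upper 16 bits hold the block and the lower 16 the node.
    """
    return ref >> 16, ref & 0xFFFF


if __name__ == "__main__":
    print(repr(printable(words_to_bytes(WORDS))))
    print(split_block_ref(0x00F50002))  # tcBlockRef from the LQHKEYREQ signal
```

Under these assumptions the first five words decode to "ndb091" padded with spaces to 20 bytes, and tcBlockRef H'00f50002 splits into block 245 on node 2.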

How to repeat:
See above
[28 Nov 2007 23:50] Trudy Pelzer
D2/I4, lowering priority