Description:
Both data nodes on the slave cluster crash after intense database activity.
One node crashes with:
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtc/DbtcMain.cpp
Error object: DBTC (Line: 8545) 0x0000000e
Program: /usr/mysql/libexec/ndbd
The other node crashes with:
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dblqh/DblqhMain.cpp
Error object: DBLQH (Line: 7010) 0x0000000e
Program: /usr/mysql/libexec/ndbd
storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp:7010
/**
* Only primary replica can get ZTUPLE_ALREADY_EXIST || ZNO_TUPLE_FOUND
*
* Unless it's a simple or dirty read
*
* NOT TRUE!
* 1) op1 - primary insert ok
* 2) op1 - backup insert fail (log full or what ever)
* 3) op1 - delete ok @ primary
* 4) op1 - delete fail @ backup
*
* -> ZNO_TUPLE_FOUND is possible
*/
ndbrequire
(tcPtr->seqNoReplica == 0 ||
errCode != ZTUPLE_ALREADY_EXIST ||
(tcPtr->operation == ZREAD && (tcPtr->dirtyOp || tcPtr->opSimple))); //7010
tcPtr->abortState = TcConnectionrec::ABORT_FROM_LQH;
abortCommonLab(signal);
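To make it easier to see which case trips the check at line 7010, here is a minimal standalone sketch of the same condition. The struct `OpState`, the function `requireHolds`, and the numeric values of the constants are placeholders for illustration, not the kernel's definitions. Since the node failed the ndbrequire, the condition was false: a backup replica (`seqNoReplica != 0`) received `ZTUPLE_ALREADY_EXIST` for an operation that was not a dirty or simple read.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder values for illustration only; the real constants are
// defined in the NDB kernel headers.
constexpr std::uint32_t ZREAD = 0;
constexpr std::uint32_t ZINSERT = 2;
constexpr std::uint32_t ZDELETE = 3;
constexpr std::uint32_t ZNO_TUPLE_FOUND = 626;
constexpr std::uint32_t ZTUPLE_ALREADY_EXIST = 630;

// Hypothetical cut-down stand-in for the TcConnectionrec fields used here.
struct OpState {
  std::uint32_t seqNoReplica;  // 0 = primary replica, > 0 = backup replica
  std::uint32_t operation;
  bool dirtyOp;
  bool opSimple;
};

// Returns true when the condition passed to ndbrequire holds (no crash),
// false when the real code would stop the node.
constexpr bool requireHolds(const OpState& op, std::uint32_t errCode) {
  return op.seqNoReplica == 0 ||
         errCode != ZTUPLE_ALREADY_EXIST ||
         (op.operation == ZREAD && (op.dirtyOp || op.opSimple));
}
```

With these definitions, `requireHolds(OpState{1, ZDELETE, false, false}, ZNO_TUPLE_FOUND)` is true (the case the comment describes), while `requireHolds(OpState{1, ZINSERT, false, false}, ZTUPLE_ALREADY_EXIST)` is false, which is the shape of the failure reported here.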
storage/ndb/src/kernel/blocks/dbtc/DbtcMain.cpp:8545
const Uint32 noOfLqhs = tmp.p->noOfLqhs;
ndbrequire(noOfLqhs < MAX_REPLICAS); //8545
tmp.p->lqhNodeId[noOfLqhs] = tnodeid;
tmp.p->noOfLqhs = (noOfLqhs + 1);
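The check at line 8545 guards a bounded append into a fixed-size array of node ids. A minimal sketch of that pattern follows; `MarkerRecord` and `addLqhNode` are hypothetical names, and the value of `MAX_REPLICAS` is assumed for illustration. Failing the ndbrequire means the record already held `MAX_REPLICAS` entries when one more node id was about to be stored.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value for illustration; the real constant comes from the NDB headers.
constexpr std::uint32_t MAX_REPLICAS = 4;

// Hypothetical stand-in for the record that tmp.p points to.
struct MarkerRecord {
  std::uint32_t noOfLqhs = 0;
  std::uint32_t lqhNodeId[MAX_REPLICAS] = {};
};

// Appends nodeId to the record. Returns false in the situation where the
// kernel code would fail ndbrequire (the array is already full).
bool addLqhNode(MarkerRecord& rec, std::uint32_t nodeId) {
  const std::uint32_t noOfLqhs = rec.noOfLqhs;
  if (!(noOfLqhs < MAX_REPLICAS)) {
    return false;  // the real code stops the node here
  }
  rec.lqhNodeId[noOfLqhs] = nodeId;
  rec.noOfLqhs = noOfLqhs + 1;
  return true;
}
```

Calling `addLqhNode` `MAX_REPLICAS` times on a fresh record succeeds; the next call returns false and leaves the record unchanged.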
How to repeat:
Configuration:
One 64-bit PC runs VMware with 4 virtual machines:
2 run ndb_mgm, mysqld and our application
2 run the ndb data nodes
This machine is the "master".
A separate PC is the "slave", with the same configuration.
There are 2 replication flows.
I want to test replication with this table and procedure:
create table if not exists loadreptable ( nid INTEGER NOT NULL, nom CHAR(255), prenom CHAR(255), abc CHAR(255), wkz CHAR(255),xyz CHAR(255),
PRIMARY KEY USING HASH (nid) )
engine=ndb PARTITION BY KEY (nid);
delimiter //
CREATE PROCEDURE loadreplication (in p1 INT)
BEGIN
label1: LOOP
SET p1 = p1 - 1;
IF p1 < 0 THEN LEAVE label1;
END IF;
DELETE FROM loadreptable WHERE nid > 2;
UPDATE loadreptable SET nid=nid+1 ORDER BY nid DESC;
UPDATE loadreptable SET nom="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
INSERT INTO loadreptable VALUES(1,"wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww",
"tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt",
"yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy",
"kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk",
"bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb");
END LOOP label1;
END;
//
delimiter ;
When I call loadreplication with 20, everything works correctly.
When I call loadreplication with 200, both data nodes on the slave side crash.
Suggested fix:
n/a