Bug #56929 Error 901 'Inconsistent ordered index' crashes cluster
Submitted: 22 Sep 2010 12:33 Modified: 22 Sep 2010 14:02
Reporter: Joffrey MICHAIE Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:mysql-5.1-telco-7.1, 7.3.3 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: 7.1.5, 7.3.3, cluster, ordered index
Triage: Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[22 Sep 2010 12:33] Joffrey MICHAIE
Description:
Hi Team,

while performing many REPLACE statement on the SAME row of a small table within a transaction, all data nodes are crashing.

This has been tested on 6.3.27, 7.1.3 and 7.1.5, running on MaxOsX 10.5, CentOs 5, and Debian (all 64 bits versions) with ndbd and ndbmtd

*** MySQL will get error :

ERROR 1296 (HY000) at line 4: Got error 901 'Inconsistent ordered index. The index needs to be dropped and recreated' FROM NDBCLUSTER

*** Data node 1 will get error :

2010-09-22 14:30:16 [ndbd] INFO     -- dbtup/DbtupTrigger.cpp
2010-09-22 14:30:16 [ndbd] INFO     -- DBTUP (Line: 1623) 0x00000002
2010-09-22 14:30:16 [ndbd] INFO     -- Error handler shutting down system
2010-09-22 14:30:16 [ndbd] INFO     -- Error handler shutdown completed - exiting
2010-09-22 14:30:17 [ndbd] ALERT    -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

*** Data node 2 will get error :

2010-09-22 14:30:11 [ndbd] INFO     -- dbtup/DbtupTrigger.cpp
2010-09-22 14:30:11 [ndbd] INFO     -- DBTUP (Line: 1623) 0x00000002
2010-09-22 14:30:11 [ndbd] INFO     -- Error handler shutting down system
2010-09-22 14:30:11 [ndbd] INFO     -- Error handler shutdown completed - exiting
2010-09-22 14:30:12 [ndbd] ALERT    -- Node 4: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

BR,
Joffrey

How to repeat:
Run cluster with default settings + 
DataMemory = 64M
IndexMemory = 16M
MaxNoOfConcurrentOperations=200000

Run mysqld with default settings

CREATE TABLE `sequences` (
  `seq_id` int(11) NOT NULL,
  `seq_no` bigint(30) NOT NULL,
  PRIMARY KEY (`seq_id`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;

DELIMITER //
CREATE FUNCTION foo(sequence BIGINT) RETURNS TINYINT
BEGIN
WHILE sequence > 0 
DO REPLACE INTO sequences (seq_id,seq_no) VALUES (1,sequence);
SET sequence = sequence - 1; END WHILE; 
RETURN 1;
END //
DELIMITER ;

START TRANSACTION;
SELECT foo(100000);

Suggested fix:
Removing the ordered index on the primary key solves the problem :

Restart crashed nodes

ALTER TABLE sequences DROP PRIMARY KEY, ADD PRIMARY KEY (seq_id) USING HASH;

START TRANSACTION;
SELECT foo(100000);
[22 Sep 2010 14:45] Jonas Oreland
Hi Joffrey

I can already now see that the problem is that our ordered index implementation
has a limitation on 65535 updates to same row within one transaction.
(or it might be 32768...not 100% sure)

Apparently the code that checks this is faulty.
So the fix will be to fix to check of the condition and
emit a proper error code.

To remove the limitation is a big undertaking, and will likely
not be done shortly.

You're most welcome to try foo(32767), foo(65534) and see if that works.

/Jonas
[23 Sep 2010 8:11] Joffrey MICHAIE
Hi Jonas,

thank you for the explaination,

the problem appears more or less at 32k updated records with this REPLACE.

/Joffrey
[12 Sep 2012 10:19] Hartmut Holzgraefe
> Apparently the code that checks this is faulty.
> So the fix will be to fix to check of the condition and
> emit a proper error code.

Nodes are still crashing in the very same way now two years later in 7.2.7 so the faulty error handling still seems to be in place?
[14 Jan 2013 9:31] Umesh Shastry
http://bugs.mysql.com/bug.php?id=60579 marked as duplicate of this one.
[14 Jan 2013 9:49] Umesh Shastry
Still repeatable on latest GA (7.1.24) with the initial test case:

[ushastry@cluster-repo mysql-cluster-com-7_1_24]$ bin/mysql -u root -p test
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.63-ndb-7.1.24-cluster-com MySQL Cluster Server (Commercial)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
mysql>
mysql> CREATE TABLE `sequences` (
    ->   `seq_id` int(11) NOT NULL,
    ->   `seq_no` bigint(30) NOT NULL,
    ->   PRIMARY KEY (`seq_id`)
    -> ) ENGINE=ndbcluster DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.87 sec)

mysql>
mysql> DELIMITER //
mysql> CREATE FUNCTION foo(sequence BIGINT) RETURNS TINYINT
    -> BEGIN
    -> WHILE sequence > 0
    -> DO REPLACE INTO sequences (seq_id,seq_no) VALUES (1,sequence);
    -> SET sequence = sequence - 1; END WHILE;
    -> RETURN 1;
    -> END //
Query OK, 0 rows affected (0.00 sec)

mysql> DELIMITER ;
mysql>
mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT foo(100000);
ERROR 1296 (HY000): Got error 901 'Inconsistent ordered index. The index needs to be dropped and recreated' from NDBCLUSTER
mysql>

## 

ndb_mgm> Node 5: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

ndb_mgm> Node 4: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

### out.log

2013-01-15 07:48:45 [ndbd] INFO     -- Watchdog: User time: 770  System time: 248
2013-01-15 07:48:45 [ndbd] INFO     -- dbtup/DbtupTrigger.cpp
2013-01-15 07:48:45 [ndbd] INFO     -- DBTUP (Line: 2036) 0x00000006
2013-01-15 07:48:45 [ndbd] INFO     -- Error handler shutting down system
2013-01-15 07:48:45 [ndbd] INFO     -- Error handler shutdown completed - exiting
2013-01-15 07:48:45 [ndbd] ALERT    -- Node 5: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming e
rror or missing error message, please report a bug). Temporary error, restart node'.

Current byte-offset of file-pointer is: 568

Time: Tuesday 15 January 2013 - 07:48:45
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtupTrigger.cpp
Error object: DBTUP (Line: 2036) 0x00000006
Program: ndbd
Pid: 19658
Version: mysql-5.1.63 ndb-7.1.24
Trace: /data/ushastry/cluster/mysql-cluster-com-7_1_24/cluster-data/ndb_5_trace.log.1 [t1..t1]
***EOM***

## traces

DBTUP   032034 002036

--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 5, r.sigId: 1201866 gsn: 195 "ABORT" prio: 1
s.bn: 245 "DBTC", s.proc: 5, s.sigId: 1201865 length: 4 trace: 1 #sec: 0 fragInf: 0
 H'00031548 H'00f50005 H'00000005 H'00700600
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 5, r.sigId: 1201865 gsn: 15 "TCROLLBACKREQ" prio: 1
s.bn: 32775 "API", s.proc: 6, s.sigId: 0 length: 3 trace: 1 #sec: 0 fragInf: 0
 H'00000023 H'00000005 H'00700600
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 5, r.sigId: 1201864 gsn: 315 "LQHKEYREF" prio: 1
s.bn: 247 "DBLQH", s.proc: 5, s.sigId: 1201863 length: 5 trace: 1 #sec: 0 fragInf: 0
Signal data: H'00039548 H'00039548 H'00000385 H'00000005 H'00700600
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 5, r.sigId: 1201863 gsn: 435 "TUPKEYREF" prio: 1
s.bn: 249 "DBTUP", s.proc: 5, s.sigId: 1201862 length: 2 trace: 1 #sec: 0 fragInf: 0
Signal data: H'00008000 H'00000385
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 5, r.sigId: 1201862 gsn: 316 "LQHKEYREQ" prio: 1
s.bn: 245 "DBTC", s.proc: 5, s.sigId: 1201861 length: 13 trace: 1 #sec: 2 fragInf: 0
 ClientPtr = H'00039548 hashValue = H'f07e5de3 tcBlockRef = H'00f50005
 transId1 = H'00000005 transId2 = H'00700600 savePointId = H'00008000
 Op: 4 Lock: 0 Flags: NoDisk ScanInfo/noFiredTriggers: H'8
 AttrLen: 0 (0 in this) KeyLen: 0 TableId: 7 SchemaVer: 1
 FragId: 1 ReplicaNo: 0 LastReplica: 1 NextNodeId: 4
 ApiRef: H'80070006 ApiOpRef: H'00000014
 AttrInfo:
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 5, r.sigId: 1201861 gsn: 12 "TCKEYREQ" prio: 1
s.bn: 32775 "API", s.proc: 6, s.sigId: 0 length: 8 trace: 1 #sec: 2 fragInf: 0
 apiConnectPtr: H'00000023, apiOperationPtr: H'00000014
 Operation: Write, Flags: Execute NoDisk IgnoreError
 keyLen: 0, attrLen: 0, AI in this: 0, tableId: 7, tableSchemaVer: 1, API Ver: 280
 transId(1, 2): (H'00000005, H'00700600)
 -- Variable Data --
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 5, r.sigId: 1201860 gsn: 342 "PACKED_SIGNAL" prio: 1
s.bn: 247 "DBLQH", s.proc: 4, s.sigId: -1 length: 7 trace: 1 #sec: 0 fragInf: 0
Signal data: H'40007fff H'00039547 H'00f70004 H'00000000 H'00000005 H'00700600 H'00000000
--------- Begin Packed Signals --------
--------------- Signal ----------------
r.bn: 245 "DBTC", length: 7 "LQHKEYCONF"
Signal data: H'40007fff H'00039547 H'00f70004 H'00000000 H'00000005 H'00700600 H'00000000
--------- End Packed Signals ----------
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 5, r.sigId: 1201859 gsn: 316 "LQHKEYREQ" prio: 1
s.bn: 245 "DBTC", s.proc: 5, s.sigId: 1201857 length: 13 trace: 1 #sec: 2 fragInf: 0
 ClientPtr = H'00039547 hashValue = H'f07e5de3 tcBlockRef = H'00f50005
[31 Oct 2013 10:37] Hartmut Holzgraefe
Still reproducible in 7.2.13 and 7.3.2 :(
[20 Nov 2013 1:52] 李 源錫
Still was reproduced in mysql5.6.14-ndb7.3.3
[21 Nov 2013 12:32] Umesh Shastry
Bug #70955 marked as duplicate of this one.
[21 Nov 2013 12:33] Umesh Shastry
Bug #70435 marked as duplicate of this one.
[8 Dec 2014 7:39] Umesh Shastry
Bug #73030 marked as duplicate of this
[22 Dec 2014 8:28] Сергей Кукуев
Hi!

The same thing with mysql-5.6.21 ndb-7.3.7
[22 Dec 2014 8:29] Сергей Кукуев
Error report from 7.3.7

Attachment: ndb_error_report_20141222112238.tar.bz2 (application/octet-stream, text), 1.19 MiB.

[22 Dec 2014 8:59] Сергей Кукуев
Probably this bug related to bug #2107
http://bugs.mysql.com/bug.php?id=2107

After node restart after node crush after error:

Error: 2341
Error data: DbtupTrigger.cpp
Error object: DBTUP (Line: 2184) 0x00000002

I tried drop database:

DROP DATABASE BS_SANITY;
show warnings;
show errors;

# Level, Code, Message
Error, 1051, Unknown table 'BS_SANITY.BL_PAnumbers,BS_SANITY.BL_TariffPlansDict,BS_SANITY.TL_Factors,BS_SANITY.SE_Entities,BS_SA'
[22 Dec 2014 9:01] Сергей Кукуев
Sorry in previous comment I specified wrong bug

RIGHT one is bug #21078