Bug #77701 NDB ArrayPool<T>::getPtr Error 2301 Assertion
Submitted: 13 Jul 2015 12:38
Modified: 3 Dec 2015 18:40
Reporter: Carl Krumins
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 7.3.9
OS: Linux
Assigned to: MySQL Verification Team
CPU Architecture: Any
Tags: NDB 2301 Assertion ArrayPool getPtr

[13 Jul 2015 12:38] Carl Krumins
Description:
Four nodes running 7.3.9.
Both nodes in the same node group crash at the same time with the same message:

Time: Monday 13 July 2015 - 12:38:12
Status: Temporary error, restart node
Message: Assertion (Internal error, programming error or missing error message, please report a bug)
Error: 2301
Error data: ArrayPool<T>::getPtr
Error object: /export/home/pb2/build/sb_0-14884623-1427967983.18/mysql-cluster-gpl-7.3.9/storage/ndb/src/kernel/vm/ArrayPool.hpp line: 493
Program: ndbmtd
Pid: 860088 thr: 10
Version: mysql-5.6.24 ndb-7.3.9
Trace: /data/mysqlcluster//ndb_4_trace.log.15 [t1..t17]
***EOM***
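
For anyone comparing several of these error-log entries, the "Key: value" layout above can be split into fields with a few lines of Python. This is only a minimal sketch: the function name parse_error_report is made up for illustration, and it assumes every field sits on one line and the entry ends at the ***EOM*** marker, as in the excerpt above.

```python
def parse_error_report(text):
    """Split one NDB error-log entry into a dict of field name -> value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "***EOM***":  # end-of-message marker closes the entry
            break
        key, sep, value = line.partition(": ")
        if sep:  # ignore lines that do not look like "Key: value"
            fields[key] = value
    return fields


# A subset of the fields from the entry quoted in this report.
entry = """\
Time: Monday 13 July 2015 - 12:38:12
Status: Temporary error, restart node
Error: 2301
Error data: ArrayPool<T>::getPtr
Program: ndbmtd
Version: mysql-5.6.24 ndb-7.3.9
***EOM***
"""

report = parse_error_report(entry)
print(report["Error"], report["Error data"])
```

Splitting on the first ": " only keeps values such as the timestamp or the "Error object" path (which contain further colons) intact.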

The other node group then shuts down because too few nodes remain to maintain a complete cluster.

One Nodegroup
2015-07-13 13:28:15 [MgmtSrvr] ALERT    -- Node 4: Forced node shutdown completed. Caused by error 2301: 'Assertion(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

Other Nodegroup
2015-07-13 13:29:20 [MgmtSrvr] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
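
The cascade in these two log lines matches the general NDB rule that a cluster can keep running only while every node group has at least one live data node. A minimal sketch of that rule follows; the node IDs and their assignment to groups are hypothetical (the report only shows that node 4 and node 2 are in different node groups), and cluster_can_survive is an illustrative name, not an NDB API.

```python
def cluster_can_survive(node_groups, failed_nodes):
    """Return True if every node group still has at least one live node."""
    return all(
        any(node not in failed_nodes for node in members)
        for members in node_groups.values()
    )


# Hypothetical layout: two node groups with two data nodes each.
node_groups = {0: [3, 4], 1: [1, 2]}

# Losing one replica of a group leaves the cluster usable.
print(cluster_can_survive(node_groups, {4}))

# Losing both nodes of the same group means part of the data is gone,
# so the surviving group has to shut down as well.
print(cluster_can_survive(node_groups, {3, 4}))
```

This is why a simultaneous crash of both nodes in one node group takes the whole cluster down, even though the other group hit no assertion itself.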

How to repeat:
Not sure

The nodes crash again every 18-36 hours with no traceable cause at this stage. The cluster has gone down many times, always with the same node group and the same error.

I previously reported a bug that was fixed in 7.3.6, described in the changelog as:
Employing a CHAR column that used the UTF8 character set as a table's primary key column led to node failure when restarting data nodes. Attempting to restore a table with such a primary key also caused ndb_restore to fail. (Bug #16895311, Bug #68893)

Some unresolved reports of this ArrayPool<T>::getPtr assertion from many years ago point to UTF8 UNIQUE KEY indexes, but our schema has no UNIQUE UTF8 keys matching those old reports. Those reports were against very old versions, so this report is against the latest released version.

Suggested fix:
Not sure
Will upload the ndb_error_reporter output in a minute.
[13 Jul 2015 12:50] Carl Krumins
The file generated by ndb_error_reporter is named:
mysql-bug-data-77701.tar.bz2
It has been uploaded to sftp.oracle.com as per the instructions.
[13 Jul 2015 13:54] MySQL Verification Team
Hi,

Thanks for the report and the logs.

kind regards
Bogdan Kecman
[30 Jul 2015 3:45] Carl Krumins
I have reproduced the same bug in the latest 7.4.7; a separate report with trace files is in bug #77864.
[30 Jul 2015 10:07] MySQL Verification Team
Bug #77864 marked as duplicate of this
[1 Sep 2015 19:30] Carl Krumins
Do you have an approximate release date for the next version, 7.4.8?
[1 Sep 2015 19:48] MySQL Verification Team
Oracle policy does not allow any "future predictions" about when a release will happen.
[26 Oct 2015 15:29] Carl Krumins
Changelog for MySQL 5.6.27 says: 
"An assertion could be raised due to incorrect error handling if a SELECT ... FOR UPDATE subquery resulted in deadlock and caused a rollback. (Bug #21096444)"

Is this fix from the 5.6.27 changelog related to bug #77701 (and its duplicate, bug #77864) described above, so that this report can be closed? Or is this still unfixed?
[3 Nov 2015 18:40] MySQL Verification Team
Hi,
Yes, that bug is fixed in 5.6.27 and 5.7.9, but I can't say for sure whether it is related to this problem. Since you are able to reproduce this problem, upgrading to a 5.6.27+ based system would easily confirm whether that patch solves this bug too.

kind regards
Bogdan Kecman
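
Before retesting, it may help to confirm that a data node is really running on a 5.6.27+ server base. The sketch below reads the "Version:" line format seen in the error log earlier in this report and compares the mysql-X.Y.Z part against 5.6.27. The function names are illustrative only, and the check says nothing about whether that patch actually addresses this assertion.

```python
import re

FIXED_IN = (5, 6, 27)  # server version named in the changelog entry quoted above


def server_version(version_line):
    """Extract the MySQL server version tuple from an NDB 'Version:' line."""
    match = re.search(r"mysql-(\d+)\.(\d+)\.(\d+)", version_line)
    if match is None:
        raise ValueError("no mysql-X.Y.Z version found in: %r" % version_line)
    return tuple(int(part) for part in match.groups())


def has_fix(version_line):
    """True if the server base version is at least FIXED_IN."""
    return server_version(version_line) >= FIXED_IN


# Version line taken from the error log in the original report.
print(has_fix("Version: mysql-5.6.24 ndb-7.3.9"))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "5.6.9" sorting after "5.6.27".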
[4 Dec 2015 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".