MySQL Bugs: #25389: Crash on ArrayPool<T>::getPtr

Bug #25389	Crash on ArrayPool<T>::getPtr
Submitted:	3 Jan 2007 15:04	Modified:	21 Sep 2007 22:51
Reporter:	Kris Buytaert (Candidate Quality Contributor)	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Cluster: Disk Data	Severity:	S3 (Non-critical)
Version:	5.1.14, 5.1.12	OS:	Linux (Linux)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
Active cluster running with disk based storage enabled.

Suddenly node crashes

2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2007-01-03 14:31:13 [MgmSrvr] ALERT    -- Node 4: Node 3 Disconnected
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: Communication to Node 3 closed
2007-01-03 14:31:13 [MgmSrvr] ALERT    -- Node 4: Network partitioning - arbitration required
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: President restarts arbitration thread [state=7]
2007-01-03 14:31:13 [MgmSrvr] ALERT    -- Node 4: Arbitration won - positive reply from node 1
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: GCP Take over started
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: DICT: lock bs: 0 ops: 0 poll: 0 cnt: 0 queue:
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: LCP Take over started
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: ParticipatingDIH = 0000000000000000
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: ParticipatingLQH = 0000000000000000
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=0 0000000000000000]
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=0 0000000000000000]
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_From_Master_Received = 1
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: GCP Take over completed
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: LCP Take over completed (state = 4)
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: ParticipatingDIH = 0000000000000000
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: ParticipatingLQH = 0000000000000000
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=0 0000000000000000]
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=0 0000000000000000]
2007-01-03 14:31:13 [MgmSrvr] INFO     -- Node 4: m_LCP_COMPLETE_REP_From_Master_Received = 1
2007-01-03 14:31:13 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Initiated by signal 0. Caused by error 2301: 'Assertion
(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2007-01-03 14:31:14 [MgmSrvr] INFO     -- Node 4: Started arbitrator node 1 [ticket=3e3e0002e85ef2db]
2007-01-03 14:31:32 [MgmSrvr] INFO     -- Node 4: Communication to Node 3 opened
2007-01-03 14:31:35 [MgmSrvr] INFO     -- Node 1: Node 4 Connected
2007-01-03 14:31:35 [MgmSrvr] ALERT    -- Node 4: Forced node shutdown completed. Initiated by signal 0. Caused by error 2301: 'Assertion
(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

Node error: 

Current byte-offset of file-pointer is: 568                       

Time: Wednesday 3 January 2007 - 14:31:12
Status: Temporary error, restart node
Message: Assertion (Internal error, programming error or missing error message, please report a bug)
Error: 2301
Error data: ArrayPool<T>::getPtr
Error object: ../../../../../storage/ndb/src/kernel/vm/ArrayPool.hpp line: 333 (block: DBTUP)
Program: ndbd
Pid: 26537
Trace: /var/lib/mysql/mysql-cluster//ndb_3_trace.log.1
Version: Version 5.1.12 (beta)
***EOM***

Shortly afterwards second node crashes with identical error.

Tracelogs attached.

How to repeat:
not reproduced yet

Trace of an ndbd crash

Attachment: traces.log.tar.gz (application/x-gzip, text), 125.18 KiB.

Hi,

A number of fixes has been made in 5.1.14

Is it possible for you to test this version ?

/Jonas

Please, try to repeat with a newer version, 5.1.14, and inform about the results.

I am able to reproduce this using 5.1.14, but using TEXT columns. As my test setups were in chroot jails, ndb_error_reporter doesn't work. However, the following was the error in the node's errorlog:

Time: Monday 29 January 2007 - 15:13:46
Status: Temporary error, restart node
Message: Assertion (Internal error, programming error or missing error message, please report a bug)
Error: 2301
Error data: ArrayPool<T>::getPtr
Error object: ../../../../../storage/ndb/src/kernel/vm/ArrayPool.hpp line: 396 (block: DBLQH)
Program: ./ndbd
Pid: 28725
Trace: /var/lib/mysql-cluster/ndb_2_trace.log.1
Version: Version 5.1.14 (beta)
***EOM***

This occurs right at startup. If I changed my one TEXT column to VARCHAR(2000), I could not replicate. This also only occurs if there is actual data in the TEXT column. If the table is empty, then there is no problem.

I'm seeing similar behavior.  I too had data in the table that I was trying to update.  Inserts work fine.  I'll attach the output from ndb_error_reporter.

The statement that caused the crash: update IndexEntry set text = "Paris Bluegreen" where id=34225

The DDL for the table:
CREATE 
    TABLE IndexEntry 
    ( 
        id bigint(20) NOT NULL AUTO_INCREMENT, 
        text varchar(255) NOT NULL, 
        deleted bit(1) NOT NULL, 
        modified datetime, 
        sortText varchar(255), 
        PRIMARY KEY USING BTREE (id) 
    ) 
    ENGINE= NDBCLUSTER DEFAULT CHARSET= utf8 COMMENT= 'number_of_replicas: 2'

NDB Error Report

Attachment: ndb_error_report_20070313125749.tar.bz2 (application/octet-stream, text), 136.31 KiB.

Also, I am on 5.1.16-beta.

Hi Kris, Rob & Nathan

Do any one of you have this repeatable.
I.e a set of schema+data+sql that will reproduce problem with
  some likelihood

/Jonas

I wasn't able to reproduce this either using 
basic setup (2 data nodes, 2 replica):

  CREATE LOGFILE GROUP lg_1    
  ADD UNDOFILE 'undo_1.dat'     
  INITIAL_SIZE 16M     
  UNDO_BUFFER_SIZE 2M     
  ENGINE NDB;

  ALTER LOGFILE GROUP lg_1
  ADD UNDOFILE 'undo_2.dat'
  INITIAL_SIZE 12M
  ENGINE NDB;

  CREATE 
  TABLE IndexEntry 
  ( 
    id bigint(20) NOT NULL AUTO_INCREMENT, 
    text varchar(255) NOT NULL, 
    deleted bit(1) NOT NULL, 
    modified datetime, 
    sortText varchar(255), 
    PRIMARY KEY USING BTREE (id) 
  ) 
  ENGINE= NDBCLUSTER DEFAULT CHARSET= utf8 
  COMMENT='number_of_replicas: 2';

  insert into IndexEntry
  set id = 1, text="foobar", deleted=1, modified=now(), sortText="barfoo";

  update IndexEntry set text = "Paris Bluegreen" where id=34225;  

so we really need some self contained test case
to reproduce this ...

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".