Bug #24763 Pointer too large: Please report a bug - Cluster - logs+traces included
Submitted: 1 Dec 2006 21:44
Modified: 20 Aug 2009 9:10
Reporter: Alex Davies
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: mysql-5.1
OS: Linux (RHEL 4.x)
Assigned to: Jonas Oreland
CPU Architecture: Any
Tags: 5.0.38, 5.1.22, Cluster; Pointer too large; DBTUP; DbtupPagMan.cpp; 2306; ndbd

[1 Dec 2006 21:44] Alex Davies
Description:
I have had a cluster crash with an error telling me to report a bug ["Pointer too large (Internal error, programming error or missing error message, please report a bug)"]. All other nodes shut down on the instruction of the Arbitrator. The error occurred on two nodes, which happen to be both nodes of a single nodegroup.

The cluster consists of 6 storage nodes (with NoOfReplicas=2) and 1 management node, with all 7 machines also running as SQL nodes.

Hardware is dual-processor with 8GB RAM, running RHEL 4 with kernel 2.6.9-34.0.1.ELsmp (x86_64).

This has happened before on this cluster, and is causing us big problems with the production use of this software - so any quick fix measure that I can use until a full patch is available would be greatly appreciated.

Cluster setup as follows:

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     6 node(s)
id=2    @10.0.1.1  (Version: 5.0.27, Nodegroup: 0, Master)
id=3    @10.0.1.2  (Version: 5.0.27, Nodegroup: 0)
id=4    @10.0.1.3  (Version: 5.0.27, Nodegroup: 1)
id=5    @10.0.1.4  (Version: 5.0.27, Nodegroup: 1)
id=6    @10.0.1.5  (Version: 5.0.27, Nodegroup: 2)
id=7    @10.0.1.6  (Version: 5.0.27, Nodegroup: 2)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.1.7  (Version: 5.0.27)

[mysqld(API)]   13 node(s)
id=8    @10.0.1.1  (Version: 5.0.27)
id=9    @10.0.1.2  (Version: 5.0.27)
id=10   @10.0.1.3  (Version: 5.0.27)
id=11   @10.0.1.4  (Version: 5.0.27)
id=12   @10.0.1.5  (Version: 5.0.27)
id=13   @10.0.1.6  (Version: 5.0.27)
id=14   @10.0.1.7  (Version: 5.0.27)
id=15 (not connected, accepting connect from 10.0.1.1)
id=16 (not connected, accepting connect from 10.0.1.2)
id=17 (not connected, accepting connect from 10.0.1.3)
id=18 (not connected, accepting connect from 10.0.1.4)
id=19 (not connected, accepting connect from 10.0.1.5)
id=20 (not connected, accepting connect from 10.0.1.6)
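
A minimal sketch of why losing nodes 6 and 7 together stops the whole cluster: with NoOfReplicas=2, the six data nodes form three node groups, and the cluster can only stay up while every node group keeps at least one live node. The node IDs and group numbers are copied from the `show` output above; the function names are hypothetical, and the model is simplified (it ignores arbitration and network partitioning).

```python
# Simplified model of NDB node groups, using the topology from `show` above.
# Helper names are illustrative and not part of any MySQL tool.

NO_OF_REPLICAS = 2

# node id -> node group, as reported by `ndb_mgm> show`
NODE_GROUPS = {2: 0, 3: 0, 4: 1, 5: 1, 6: 2, 7: 2}


def number_of_node_groups(data_nodes: int, replicas: int) -> int:
    """Each node group holds `replicas` data nodes."""
    if data_nodes % replicas != 0:
        raise ValueError("data node count must be a multiple of NoOfReplicas")
    return data_nodes // replicas


def cluster_can_survive(failed_nodes: set) -> bool:
    """True if every node group still has at least one live node."""
    for group in set(NODE_GROUPS.values()):
        members = {n for n, g in NODE_GROUPS.items() if g == group}
        if members <= failed_nodes:  # the whole group is down
            return False
    return True


if __name__ == "__main__":
    print(number_of_node_groups(len(NODE_GROUPS), NO_OF_REPLICAS))
    print(cluster_can_survive({6}))     # one node of group 2 lost
    print(cluster_can_survive({6, 7}))  # all of group 2 lost
```

In this model a single failure in any group is tolerated, while the simultaneous loss of nodes 6 and 7 leaves node group 2 with no replica, which matches the shutdown described in this report.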

config.ini:

# MySQL Cluster Config file

# Created: 22/6/06
# Updated 18/11/06
# Alex Davies <alex@davz.net>

#
# Define MGM node
#

[NDB_MGMD]
HostName=10.0.1.7
DataDir=/var/lib/mysql-cluster

#
# Define Storage nodes
#

[NDBD DEFAULT]
RedoBuffer=16MB
UndoDataBuffer=32MB
UndoIndexBuffer=3MB
BackupDataDir=/var/lib/mysql-cluster-backups

NoOfReplicas=2
DataDir= /var/lib/mysql-cluster
DataMemory=5500M
IndexMemory=1500M
TimeBetweenLocalCheckpoints=27
MaxNoOfOrderedIndexes=2048
MaxNoOfUniqueHashIndexes=1024
MaxNoOfTables=512
MaxNoOfAttributes=5000
MaxNoOfTriggers=2000
MaxNoOfConcurrentOperations=170000
# Double default
NoOfFragmentLogFiles=16

[NDBD]
HostName=10.0.1.1

[NDBD]
HostName=10.0.1.2

[NDBD]
HostName=10.0.1.3

[NDBD]
HostName=10.0.1.4

[NDBD]
HostName=10.0.1.5

[NDBD]
HostName=10.0.1.6

#
# Define SQL Nodes
#

# Each node in twice to allow for backups to be restored

[MYSQLD]
HostName=10.0.1.1

[MYSQLD]
HostName=10.0.1.2

[MYSQLD]
HostName=10.0.1.3

[MYSQLD]
HostName=10.0.1.4

[MYSQLD]
HostName=10.0.1.5

[MYSQLD]
HostName=10.0.1.6

[MYSQLD]
HostName=10.0.1.7

[MYSQLD]
HostName=10.0.1.1

[MYSQLD]
HostName=10.0.1.2

[MYSQLD]
HostName=10.0.1.3

[MYSQLD]
HostName=10.0.1.4

[MYSQLD]
HostName=10.0.1.5

[MYSQLD]
HostName=10.0.1.6
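
As a rough sanity check on the [NDBD DEFAULT] values above, the sketch below adds up the configured memory and buffer sizes and compares them with the 8GB of RAM mentioned in the description. It is only arithmetic on the posted numbers: the size parser (including its acceptance of an "MB" suffix) is a hypothetical helper, ndbd allocates memory beyond these settings, and each host also runs mysqld, so the real footprint is presumably higher.

```python
import re

# Values copied from the [NDBD DEFAULT] section above.
NDBD_DEFAULT = {
    "DataMemory": "5500M",
    "IndexMemory": "1500M",
    "RedoBuffer": "16MB",
    "UndoDataBuffer": "32MB",
    "UndoIndexBuffer": "3MB",
}

PHYSICAL_RAM_MB = 8 * 1024  # "8GB RAM" from the description

_UNITS_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def size_to_mb(text: str) -> float:
    """Convert strings such as '5500M' or '16MB' to megabytes (illustrative parser)."""
    match = re.fullmatch(r"(\d+)\s*([KMG])B?", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS_MB[match.group(2).upper()]


def configured_total_mb(settings: dict) -> float:
    """Sum of all listed settings, in megabytes."""
    return sum(size_to_mb(value) for value in settings.values())


if __name__ == "__main__":
    total = configured_total_mb(NDBD_DEFAULT)
    share = 100 * total / PHYSICAL_RAM_MB
    print(f"configured: {total:.0f} MB of {PHYSICAL_RAM_MB} MB ({share:.1f}%)")
```

Nothing in this thread links the crash to memory pressure; the calculation only shows how much of each host's RAM these five settings reserve.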

Error logs (ndb_x_error.log) from storage hosts 5 and 6 (node IDs 6 and 7):

[root@cl2s5 mysql-cluster]# tail -25 ndb_6_error.log
Current byte-offset of file-pointer is: 568

Time: Friday 1 December 2006 - 18:38:34
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: DbtupPagMan.cpp
Error object: DBTUP (Line: 342) 0x0000000e
Program: ndbd
Pid: 9505
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.1
Version: Version 5.0.27
***EOM***

[root@cl2s6 mysql-cluster]# tail -25 ndb_7_error.log
Current byte-offset of file-pointer is: 568

Time: Friday 1 December 2006 - 18:38:59
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: DbtupPagMan.cpp
Error object: DBTUP (Line: 342) 0x0000000e
Program: ndbd
Pid: 10908
Trace: /var/lib/mysql-cluster/ndb_7_trace.log.1
Version: Version 5.0.27
***EOM***
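
To compare entries like the two above without reading them by eye, a small parser can split each entry into fields. This is a sketch that assumes the "Key: value" layout and the `***EOM***` terminator seen in these excerpts; the function names are hypothetical and it is not an official tool.

```python
# Sample entry copied from ndb_6_error.log above.
SAMPLE = """\
Time: Friday 1 December 2006 - 18:38:34
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: DbtupPagMan.cpp
Error object: DBTUP (Line: 342) 0x0000000e
Program: ndbd
Pid: 9505
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.1
Version: Version 5.0.27
***EOM***
"""


def parse_entry(text: str) -> dict:
    """Split one error-log entry into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip() == "***EOM***":  # end-of-message marker
            break
        key, sep, value = line.partition(": ")
        if sep:  # skip blank lines and lines without a "Key: value" shape
            fields[key.strip()] = value.strip()
    return fields


def same_failure(a: dict, b: dict) -> bool:
    """True if two entries share error code, source file and error object."""
    keys = ("Error", "Error data", "Error object")
    return all(a.get(k) == b.get(k) for k in keys)


if __name__ == "__main__":
    entry = parse_entry(SAMPLE)
    print(entry["Error"], entry["Error data"], entry["Error object"])
```

Applied to both entries above, `same_failure` would report a match: error 2306 in DbtupPagMan.cpp at DBTUP line 342 on both nodes, 25 seconds apart.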

Trace files:  Attached.

Please let me know if there is any further information that I can provide to be helpful.

The cluster started successfully after the crash with just "ndbd" run on all nodes (no management node restart or --initial required).

How to repeat:
I would imagine a big dataset with a high query rate, but I am not sure exactly what caused it!

Suggested fix:
Patch?

config.ini change?
[1 Dec 2006 21:48] Alex Davies
ndb_6_trace.log.1.txt (cut to fit upload restriction)

Attachment: ndb_6_trace.log.1-snip.txt (text/plain), 48.03 KiB.

[1 Dec 2006 21:48] Alex Davies
ndb_7_trace.log.1.txt (cut to fit upload restriction)

Attachment: ndb_7_trace.log.1-snip.txt (text/plain), 43.56 KiB.

[1 Dec 2006 21:51] Alex Davies
Full trace file will not upload; I have attached the first part to this bug; rest is at the FTP site with filenames

bug-data-24763___ndb_6_trace.log.1
bug-data-24763___ndb_7_trace.log.1

Alex
[1 Dec 2006 23:43] Alex Davies
Sorry, the filenames on the FTP site are

bug-data-24763___ndb_6_trace.log.1.txt
bug-data-24763___ndb_7_trace.log.1.txt
[2 Dec 2006 0:54] MySQL Verification Team
Updating Category to Cluster.
[2 Dec 2006 7:45] Jonas Oreland
Hi,

I cannot find anything directly while looking at the trace files.

Can you reproduce this?
Do you have a test system where this can be repeated?
Would it be possible to add some debug code or use a debug build to try to
  find the error (maybe in combination with the --core option)?

/jonas
[3 Dec 2006 19:42] Alex Davies
Hi,

I am not sure. There was a cluster crash about 10 days ago, but before I was able to investigate, their admins had deleted all logs by starting ndbd with --initial!

If it happens again, I'll let you know.

I'm not currently using a distribution compiled from source, but if I need to I can compile the servers to run in debug mode. I'll do this if there is another crash.

Are there any config.ini parameters I can set to reduce the chance of the "Pointer" getting "too large"?

Many thanks,

Alex

Kind regards,

Alex
[20 Aug 2009 8:53] Jonas Oreland
This has been fixed in >= telco-6.2 since forever now.
Setting this to need-back to see if it's still active.
[20 Aug 2009 9:05] Alex Davies
As far as I am aware, this issue has been resolved - I've not seen it since forever either!!

Cheers.
[20 Aug 2009 9:10] Jonas Oreland
closing