MySQL Bugs: #24763: Pointer too large: Please report a bug - Cluster

Bug #24763	Pointer too large: Please report a bug - Cluster - logs+traces included
Submitted:	1 Dec 2006 21:44	Modified:	20 Aug 2009 9:10
Reporter:	Alex Davies
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	mysql-5.1	OS:	Linux (RHEL 4.x)
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	5.0.38, 5.1.22, Cluster; Pointer too large; DBTUP; DbtupPagMan.cpp; 2306; ndbd

Description:
I have had a cluster crash with an error telling me to report a bug ["Pointer too large (Internal error, programming error or missing error message, please report a bug)"]. All other nodes shut down on the instruction of the Arbitrator. This error occurred on two nodes, which happen to be both of the nodes in one node group.

The cluster consists of 6 storage nodes (with NoOfReplicas=2) and 1 management node, with all 7 machines also running as SQL nodes.

Hardware is dual-processor with 8GB RAM, running RHEL 4 with kernel 2.6.9-34.0.1.ELsmp (x86_64).

This has happened before on this cluster, and is causing us big problems with the production use of this software - so any quick fix measure that I can use until a full patch is available would be greatly appreciated.

Cluster setup as follows:

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     6 node(s)
id=2    @10.0.1.1  (Version: 5.0.27, Nodegroup: 0, Master)
id=3    @10.0.1.2  (Version: 5.0.27, Nodegroup: 0)
id=4    @10.0.1.3  (Version: 5.0.27, Nodegroup: 1)
id=5    @10.0.1.4  (Version: 5.0.27, Nodegroup: 1)
id=6    @10.0.1.5  (Version: 5.0.27, Nodegroup: 2)
id=7    @10.0.1.6  (Version: 5.0.27, Nodegroup: 2)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.1.7  (Version: 5.0.27)

[mysqld(API)]   13 node(s)
id=8    @10.0.1.1  (Version: 5.0.27)
id=9    @10.0.1.2  (Version: 5.0.27)
id=10   @10.0.1.3  (Version: 5.0.27)
id=11   @10.0.1.4  (Version: 5.0.27)
id=12   @10.0.1.5  (Version: 5.0.27)
id=13   @10.0.1.6  (Version: 5.0.27)
id=14   @10.0.1.7  (Version: 5.0.27)
id=15 (not connected, accepting connect from 10.0.1.1)
id=16 (not connected, accepting connect from 10.0.1.2)
id=17 (not connected, accepting connect from 10.0.1.3)
id=18 (not connected, accepting connect from 10.0.1.4)
id=19 (not connected, accepting connect from 10.0.1.5)
id=20 (not connected, accepting connect from 10.0.1.6)
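
The node group column in the output above is what matters for this crash: the two error logs in this report come from node IDs 6 and 7. The following is a minimal sketch, not part of any MySQL tool (the helper names `group_nodes` and `lost_groups` are made up for illustration), that groups the data nodes by node group and checks whether a set of failed nodes takes out a whole group. With NoOfReplicas=2, losing both nodes of a group leaves no copy of that group's data, which is consistent with the rest of the cluster shutting down.

```python
import re
from collections import defaultdict

# Data-node lines copied from the "ndb_mgm> show" output in this report.
SHOW_OUTPUT = """\
id=2    @10.0.1.1  (Version: 5.0.27, Nodegroup: 0, Master)
id=3    @10.0.1.2  (Version: 5.0.27, Nodegroup: 0)
id=4    @10.0.1.3  (Version: 5.0.27, Nodegroup: 1)
id=5    @10.0.1.4  (Version: 5.0.27, Nodegroup: 1)
id=6    @10.0.1.5  (Version: 5.0.27, Nodegroup: 2)
id=7    @10.0.1.6  (Version: 5.0.27, Nodegroup: 2)
"""

# Matches only lines that carry a "Nodegroup:" field, i.e. data nodes.
NODE_RE = re.compile(r"id=(\d+)\s+@\S+\s+\(Version: [^,]+, Nodegroup: (\d+)")


def group_nodes(show_text):
    """Return {node group: [node ids]} parsed from 'show' output."""
    groups = defaultdict(list)
    for match in NODE_RE.finditer(show_text):
        node_id, group = int(match.group(1)), int(match.group(2))
        groups[group].append(node_id)
    return dict(groups)


def lost_groups(groups, failed_ids):
    """Return the node groups in which every member has failed."""
    failed = set(failed_ids)
    return [g for g, ids in sorted(groups.items()) if set(ids) <= failed]


groups = group_nodes(SHOW_OUTPUT)
print(groups)
print(lost_groups(groups, [6, 7]))  # node IDs taken from the two error logs
```

Run against the output above, nodes 6 and 7 together make up node group 2, so their failure removes that group entirely; a failure of node 6 alone would not.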

config.ini:

# MySQL Cluster Config file

# Created: 22/6/06
# Updated 18/11/06
# Alex Davies <alex@davz.net>

#
# Define MGM node
#

[NDB_MGMD]
HostName=10.0.1.7
DataDir=/var/lib/mysql-cluster

#
# Define Storage nodes
#

[NDBD DEFAULT]
RedoBuffer=16MB
UndoDataBuffer=32MB
UndoIndexBuffer=3MB
BackupDataDir=/var/lib/mysql-cluster-backups

NoOfReplicas=2
DataDir= /var/lib/mysql-cluster
DataMemory=5500M
IndexMemory=1500M
TimeBetweenLocalCheckpoints=27
MaxNoOfOrderedIndexes=2048
MaxNoOfUniqueHashIndexes=1024
MaxNoOfTables=512
MaxNoOfAttributes=5000
MaxNoOfTriggers=2000
MaxNoOfConcurrentOperations=170000
# Double default
NoOfFragmentLogFiles=16

[NDBD]
HostName=10.0.1.1

[NDBD]
HostName=10.0.1.2

[NDBD]
HostName=10.0.1.3

[NDBD]
HostName=10.0.1.4

[NDBD]
HostName=10.0.1.5

[NDBD]
HostName=10.0.1.6

#
# Define SQL Nodes
#

# Each node in twice to allow for backups to be restored

[MYSQLD]
HostName=10.0.1.1

[MYSQLD]
HostName=10.0.1.2

[MYSQLD]
HostName=10.0.1.3

[MYSQLD]
HostName=10.0.1.4

[MYSQLD]
HostName=10.0.1.5

[MYSQLD]
HostName=10.0.1.6

[MYSQLD]
HostName=10.0.1.7

[MYSQLD]
HostName=10.0.1.1

[MYSQLD]
HostName=10.0.1.2

[MYSQLD]
HostName=10.0.1.3

[MYSQLD]
HostName=10.0.1.4

[MYSQLD]
HostName=10.0.1.5

[MYSQLD]
HostName=10.0.1.6
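
As a cross-check of the config.ini above against the `show` output, here is a small sketch that counts the sections and derives the number of node groups as data nodes divided by NoOfReplicas. It is only an illustration: `summarize` is a hypothetical helper, and the config text is rebuilt in code with just the section headers, host names and three of the [NDBD DEFAULT] values rather than the full file. A hand-written parser is used because the file repeats [NDBD] and [MYSQLD] sections, which Python's `configparser` rejects in its default strict mode.

```python
from collections import Counter

# Rebuild the section layout of the config.ini in this report.
data_hosts = [f"10.0.1.{i}" for i in range(1, 7)]              # six [NDBD] sections
sql_hosts = [f"10.0.1.{i}" for i in range(1, 8)] + data_hosts  # thirteen [MYSQLD] sections

CONFIG_TEXT = "\n".join(
    ["[NDB_MGMD]", "HostName=10.0.1.7", "",
     "[NDBD DEFAULT]", "NoOfReplicas=2", "DataMemory=5500M", "IndexMemory=1500M", ""]
    + [line for h in data_hosts for line in ("[NDBD]", f"HostName={h}", "")]
    + [line for h in sql_hosts for line in ("[MYSQLD]", f"HostName={h}", "")]
)


def summarize(config_text):
    """Count sections by name and collect the [NDBD DEFAULT] key/value pairs."""
    counts = Counter()
    defaults = {}
    section = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].upper()
            counts[section] += 1
        elif "=" in line and section == "NDBD DEFAULT":
            key, value = line.split("=", 1)
            defaults[key.strip()] = value.strip()
    return counts, defaults


counts, defaults = summarize(CONFIG_TEXT)
replicas = int(defaults["NoOfReplicas"])
node_groups = counts["NDBD"] // replicas

print("data nodes: ", counts["NDBD"])
print("API slots:  ", counts["MYSQLD"])
print("node groups:", node_groups)
```

The counts agree with the `show` output: 6 data nodes in 3 node groups of 2, and 13 API slots.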

Error logs (ndb_x_error.log) from storage nodes 5 and 6, which have node IDs 6 and 7:

[root@cl2s5 mysql-cluster]# tail -25 ndb_6_error.log
Current byte-offset of file-pointer is: 568

Time: Friday 1 December 2006 - 18:38:34
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: DbtupPagMan.cpp
Error object: DBTUP (Line: 342) 0x0000000e
Program: ndbd
Pid: 9505
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.1
Version: Version 5.0.27
***EOM***

[root@cl2s6 mysql-cluster]# tail -25 ndb_7_error.log
Current byte-offset of file-pointer is: 568

Time: Friday 1 December 2006 - 18:38:59
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: DbtupPagMan.cpp
Error object: DBTUP (Line: 342) 0x0000000e
Program: ndbd
Pid: 10908
Trace: /var/lib/mysql-cluster/ndb_7_trace.log.1
Version: Version 5.0.27
***EOM***
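
The two entries above can be compared mechanically. The sketch below is an illustration only (`parse_entry` and `signature` are hypothetical helpers, not an NDB utility): it splits each entry into key/value pairs, checks that both nodes report the same error code, source file and error object, and computes how far apart the two failures were. It assumes English day and month names, as in the logs.

```python
from datetime import datetime

# The two error-log entries from this report, copied verbatim.
ENTRY_NODE_6 = """\
Time: Friday 1 December 2006 - 18:38:34
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: DbtupPagMan.cpp
Error object: DBTUP (Line: 342) 0x0000000e
Program: ndbd
Pid: 9505
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.1
Version: Version 5.0.27
***EOM***
"""

ENTRY_NODE_7 = """\
Time: Friday 1 December 2006 - 18:38:59
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: DbtupPagMan.cpp
Error object: DBTUP (Line: 342) 0x0000000e
Program: ndbd
Pid: 10908
Trace: /var/lib/mysql-cluster/ndb_7_trace.log.1
Version: Version 5.0.27
***EOM***
"""


def parse_entry(text):
    """Turn one 'Key: value' error-log entry into a dict, stopping at ***EOM***."""
    entry = {}
    for line in text.splitlines():
        if line.startswith("***EOM***"):
            break
        key, sep, value = line.partition(": ")
        if sep:
            entry[key] = value
    return entry


def signature(entry):
    """Fields that identify where the failure happened."""
    return (entry["Error"], entry["Error data"], entry["Error object"])


def failure_time(entry):
    return datetime.strptime(entry["Time"], "%A %d %B %Y - %H:%M:%S")


node6, node7 = parse_entry(ENTRY_NODE_6), parse_entry(ENTRY_NODE_7)
same_failure = signature(node6) == signature(node7)
gap_seconds = (failure_time(node7) - failure_time(node6)).total_seconds()

print("same failure signature:", same_failure)
print("seconds between failures:", gap_seconds)
```

Both nodes hit error 2306 at the same place (DbtupPagMan.cpp, DBTUP line 342), 25 seconds apart.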

Trace files:  Attached.

Please let me know if there is any further information that I can provide to be helpful.

The cluster started successfully after the crash with just "ndbd" run on all nodes (no management node restart or --initial required).

How to repeat:
I would imagine a big dataset with a high query rate, but I am not sure exactly what caused it!

Suggested fix:
Patch?

config.ini change?

ndb_6_trace.log.1.txt (cut to fit upload restriction)

Attachment: ndb_6_trace.log.1-snip.txt (text/plain), 48.03 KiB.

ndb_7_trace.log.1.txt (cut to fit upload restriction)

Attachment: ndb_7_trace.log.1-snip.txt (text/plain), 43.56 KiB.

The full trace files will not upload; I have attached the first part of each to this bug, and the rest is on the FTP site under the filenames

bug-data-24763___ndb_6_trace.log.1
bug-data-24763___ndb_7_trace.log.1

Alex

Sorry, the filenames on the FTP site are

bug-data-24763___ndb_6_trace.log.1.txt
bug-data-24763___ndb_7_trace.log.1.txt

Updating Category to Cluster.

Hi,

I cannot find anything directly from looking at the trace files.

Can you reproduce this repeatably?
Do you have a test system where this can be repeated?
Would it be possible to add some debug code or use a debug build to try to
  find the error (maybe in combination with the --core option)?

/jonas

Hi,

I am not sure. There was a cluster crash about 10 days ago, but before I was able to investigate, their admins had deleted all the logs by starting ndbd with --initial!

If it happens again, I'll let you know.

I'm not currently using a distribution compiled from source, but if I need to I can compile the servers to run in debug mode. I'll do this if there is another crash.

Are there any config.ini parameters I can set to reduce the chance of the "Pointer" getting "too large"?

Many thanks,

Alex

Kind regards,

Alex

This has been fixed in >= telco-6.2 since forever now.
Setting this to need-back to see if it's still active.

So far as I am aware, this issue has been resolved - I've not seen it since forever either!!

Cheers.

closing