Bug #20892 dbtup/DbtupPageMap.cpp Error: 2306 Pointer too large
Submitted: 6 Jul 2006 22:12 Modified: 6 Sep 2006 11:00
Reporter: Jonathan Miller Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:4.1,5.0,5.1 OS:Linux (Linux 32 Bit OS)
Assigned to: Jonas Oreland CPU Architecture:Any

[6 Jul 2006 22:12] Jonathan Miller
Description:
In running hugoLoad on T8 using 1,000,000 (2GB table) I recieved:

ERROR: 1221 REDO buffers overloaded, consult online manual (increase RedoBuffer)
           Status: Temporary error, Classification: Overload error
           File: HugoTransactions.cpp (Line: 609)
Record 578773 could not be inserted, has retried 5 times
ERROR: 1221 REDO buffers overloaded, consult online manual (increase RedoBuffer)
           Status: Temporary error, Classification: Overload error
           File: HugoTransactions.cpp (Line: 609)
ERROR: 1221 REDO buffers overloaded, consult online manual (increase RedoBuffer)
           Status: Temporary error, Classification: Overload error
           File: HugoTransactions.cpp (Line: 609)
ERROR: 827 Out of memory in Ndb Kernel, table data (increase DataMemory)
           Status: Permanent error, Classification: Insufficient space
           File: HugoTransactions.cpp (Line: 633)

Which cause one of the data nodes to crash.

Time: Thursday 6 July 2006 - 23:32:14
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbtup/DbtupPageMap.cpp
Error object: DBTUP (Line: 317) 0x0000000e
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 2816
Trace: /space/run/ndb_3_trace.log.2
Version: Version 5.1.12 (beta)
***EOM***

DBLQH   002479 002489
DBACC   008525 003208 003227

--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 1224823220 gsn: 465 "DUMP_STATE_ORD" prio: 1
s.bn: 247 "DBLQH", s.proc: 3, s.sigId: 1224823207 length: 2 trace: 1 #sec: 0 fragInf: 0
 H'000008fe H'00000001
--------------- Signal ----------------
r.bn: 254 "CMVMI", r.proc: 3, r.sigId: 1224823219 gsn: 247 "EVENT_REP" prio: 1
s.bn: 247 "DBLQH", s.proc: 3, s.sigId: 0 length: 11 trace: 1 #sec: 0 fragInf: 0
 H'00000031 H'3a48514c H'6d754420 H'63612070 H'65766974 H'61635320 H'6365526e
 H'2064726f H'6973202d H'203a657a H'00313032
--------------- Signal ----------------
r.bn: 261 "PGMAN", r.proc: 3, r.sigId: 1224823218 gsn: 465 "DUMP_STATE_ORD" prio: 1
s.bn: 254 "CMVMI", s.proc: 3, s.sigId: 1224823201 length: 1 trace: 1 #sec: 0 fragInf: 0
 H'000008fe
--------------- Signal ----------------
r.bn: 259 "TSMAN", r.proc: 3, r.sigId: 1224823217 gsn: 465 "DUMP_STATE_ORD" prio: 1
s.bn: 254 "CMVMI", s.proc: 3, s.sigId: 1224823201 length: 1 trace: 1 #sec: 0 fragInf: 0
 H'000008fe

How to repeat:
create_tab T8
hugoLoad -r 1000000 T8
[6 Jul 2006 22:48] Jonathan Miller
I paste from wrong trace log. here is correction
DBTUP   024156 024232 021082 016212 016212 016214
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 3, r.sigId: 255699984 gsn: 164 "CONTINUEB" prio: 1
s.bn: 249 "DBTUP", s.proc: 3, s.sigId: 255699983 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'0000000d H'00000005 H'00000010
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 3, r.sigId: 255699983 gsn: 164 "CONTINUEB" prio: 1
s.bn: 249 "DBTUP", s.proc: 3, s.sigId: 255699982 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'0000000d H'00000005 H'00000010
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 3, r.sigId: 255699982 gsn: 164 "CONTINUEB" prio: 1
s.bn: 249 "DBTUP", s.proc: 3, s.sigId: 255699981 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'0000000d H'00000005 H'00000010
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 3, r.sigId: 255699981 gsn: 164 "CONTINUEB" prio: 1
s.bn: 249 "DBTUP", s.proc: 3, s.sigId: 255699980 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'0000000d H'00000005 H'00000010
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 3, r.sigId: 255699980 gsn: 164 "CONTINUEB" prio: 1
s.bn: 249 "DBTUP", s.proc: 3, s.sigId: 255699979 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'0000000d H'00000005 H'00000010
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 3, r.sigId: 255699979 gsn: 164 "CONTINUEB" prio: 1
s.bn: 249 "DBTUP", s.proc: 3, s.sigId: 255699978 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'0000000d H'00000005 H'00000010
[7 Jul 2006 13:12] Jonathan Miller
Hi, Hit this again on a different table and a different line of code.

Time: Friday 7 July 2006 - 14:48:33
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: DbtupPageMap.cpp
Error object: DBTUP (Line: 297) 0x0000000e
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 6170
Trace: /space/run/ndb_3_trace.log.3
Version: Version 5.0.23

There seems to be no recovery from this unless I take down the cluster and restart with --initial

DBTC    000172 000278 012381 012416 012418

--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 3, r.sigId: 538495503 gsn: 164 "CONTINUEB" prio: 1
s.bn: 249 "DBTUP", s.proc: 3, s.sigId: 538495501 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000007 H'00000003
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 538495502 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 538495500 length: 3 trace: 1 #sec: 0 fragInf: 0
 H'00000011 H'00000021 H'00f70003
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 3, r.sigId: 538495501 gsn: 202 "DROP_TAB_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: 538495499 length: 4 trace: 0 #sec: 0 fragInf: 0
 senderRef: fa0003 senderData: 36 TableId: 3 requestType: 0
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 538495500 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 538495498 length: 3 trace: 1 #sec: 0 fragInf: 0
 H'00000011 H'00000021 H'00f70003
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 538495499 gsn: 204 "DROP_TAB_CONF" prio: 1
s.bn: 248 "DBACC", s.proc: 3, s.sigId: 538495495 length: 3 trace: 0 #sec: 0 fragInf: 0
 senderRef: f80003 senderData: 36 TableId: 3
--------------- Signal ----------------
[7 Jul 2006 13:23] Jonathan Miller
When Trying to restart data node 3, data node 2 failes

       ?016217 014255 014262 014267 014255 000299 016215 016215

r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 550288783 gsn: 164 "CONTINUEB" prio: 1
s.bn: 249 "DBTUP", s.proc: 2, s.sigId: 550288782 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000007 H'00000002
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 550288782 gsn: 164 "CONTINUEB" prio: 1
s.bn: 249 "DBTUP", s.proc: 2, s.sigId: 550288781 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000007 H'00000002
--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 2, r.sigId: 550288781 gsn: 202 "DROP_TAB_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 550288780 length: 4 trace: 0 #sec: 0 fragInf: 0
 senderRef: fa0002 senderData: 119 TableId: 2 requestType: 0
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 2, r.sigId: 550288780 gsn: 204 "DROP_TAB_CONF" prio: 1
s.bn: 248 "DBACC", s.proc: 2, s.sigId: 550288776 length: 3 trace: 0 #sec: 0 fragInf: 0
 senderRef: f80002 senderData: 119 TableId: 2
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, r.sigId: 550288779 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 2, s.sigId: 550288775 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, r.sigId: 550288778 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 2, s.sigId: 550288775 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004

Time: Friday 7 July 2006 - 14:04:44
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: DbtupPageMap.cpp
Error object: DBTUP (Line: 297) 0x0000000e
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 9131
Trace: /space/run/ndb_2_trace.log.2
Version: Version 5.0.23
***EOM***
[7 Jul 2006 14:29] Jonas Oreland
Hi,

I tried all day to reproduce this...unfortunatly wo/ success.
I did however find 2 bugs while searching, and fixed one...
I found http://bugs.mysql.com/bug.php?id=20904
    and http://bugs.mysql.com/bug.php?id=20905

Your most welcome to retest 20905 if you like..cause I think it's so weird
  that I'm uncertain to wheather i'm loosing it...
[7 Jul 2006 14:39] Jonathan Miller
Here is the script that I used to produce this

Attachment: load.sh (application/x-shellscript, text), 1.98 KiB.

[7 Jul 2006 15:59] Jonathan Miller
Time: Friday 7 July 2006 - 17:26:33
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: DbtupPageMap.cpp
Error object: DBTUP (Line: 297) 0x0000000a
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 17815
Trace: /space/run/ndb_2_trace.log.3
Version: Version 5.0.23
***EOM***

I keep hitting it. I was able to recover last time. But the cluster will not allow and DDL operations with only one data node up.

Please try the shell script I attached.

Thanks
[3 Aug 2006 7:37] Jonas Oreland
Hi,

I finally manage to get this...but in 5.0 (still havent succeeded(?) in 5.1)
while retesting bug#20905

/Jonas
[3 Aug 2006 12:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/10004

ChangeSet@1.2537, 2006-08-03 14:01:37+02:00, jonas@perch.ndb.mysql.com +1 -0
  ndb - bug#20892
    Fix bug in tup buddy allocator, which made it make invalid access to cfreepagelist[16] (which is not defined)
[3 Aug 2006 12:50] Jonas Oreland
This patch is also needed in 5.0/5.1

http://lists.mysql.com/commits/10005
[28 Aug 2006 9:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/10918

ChangeSet@1.2037, 2006-08-28 11:27:11+02:00, mskold@mysql.com +2 -0
  ndb - bug#20892 Fix bug in tup buddy allocator, which made it make invalid access to cfreepagelist[16] (which is not defined)
[1 Sep 2006 8:05] Jonas Oreland
pushed to 5.1.12
[1 Sep 2006 19:33] Jonas Oreland
pushed to 5.0.25
[6 Sep 2006 7:13] Jonas Oreland
pushed into 4.1.22
[6 Sep 2006 11:00] Jon Stephens
Documented in 4.1.22, 5.0.25, and 5.1.12 changelogs.
[4 Apr 2008 21:06] Sveta Smirnova
Bug #35795 was marked as duplicated of this one.