MySQL Bugs: #38628: crash in NdbScanOperation::takeOverScanOp

Bug #38628	crash in NdbScanOperation::takeOverScanOp
Submitted:	7 Aug 2008 12:10	Modified:	5 Oct 2008 16:32
Reporter:	Gunn Olaussen
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	mysql-5.1-telco-6.3	OS:	Solaris (SunOS 5.10 Generic_120011-14 sun4u sparc SUNW,Sun-Fire-V210)
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
When running daily-basic on the Solaris SPARC rig in Trondheim, there were 130 core files, and all the ones I looked at were the same. Doing the same manually:

[ndbdev@techra35]~/autotest2/run/run-manual/run/ndb_mgmd.1: testBasic -n DeleteRead
random seed: 3582926507
testBasic started [2008-08-07 07:51:17]
| -  T1
- DeleteRead started  [2008-08-07 07:51:18]
- DeleteRead PASSED  [2008-08-07 07:51:18]
| -  T2
- DeleteRead started [2008-08-07 07:51:20]
Bus Error (core dumped)

Core was generated by `testBasic -n DeleteRead'.
Program terminated with signal 10, Bus error.
#0  0xff17e4bc in NdbScanOperation::takeOverScanOp (this=0x6dfa38,
    opType=NdbOperation::DeleteRequest, pTrans=0x5f4750) at NdbScanOperation.cpp:2167
2167    NdbScanOperation.cpp: No such file or directory.
        in NdbScanOperation.cpp
(gdb) where
#0  0xff17e4bc in NdbScanOperation::takeOverScanOp (this=0x6dfa38,
 opType=NdbOperation::DeleteRequest, pTrans=0x5f4750)
     at NdbScanOperation.cpp:2167
#1  0x0003be54 in UtilTransactions::clearTable (this=0xffbff5d0, pNdb=0x5f4228, flags=0, records=327680, parallelism=240) 
  at ../../../../storage/ndb/include/ndbapi/NdbScanOperation.hpp:596
#2  0x0001fb3c in runClearTable2 (ctx=0xe82d0, step=0xffbff5d0) at testBasic.cpp:362
#3  0x0002ab58 in NDBT_Step::execute (this=0xaecc0, ctx=0xe82d0) at NDBT_Test.cpp:291
#4  0x0002acd0 in NDBT_TestCaseImpl1::runFinal (this=0xc1fec,
 ctx=0xe82d0) at NDBT_Test.cpp:726
#5  0x0002a924 in NDBT_TestCase::execute (this=0xc1f50, ctx=0xe82d0)
 at NDBT_Test.cpp:654
#6  0x00028a78 in NDBT_TestSuite::execute (this=0x62fb0,
 con=@0xffbffa98, ndb=0xffbff888, pTab=0x63ad0,
     _testname=0xffbffba5 "DeleteRead") at NDBT_Test.cpp:1097
#7  0x00029d64 in NDBT_TestSuite::executeAll (this=0x62fb0,
 con=@0xffbffa28, _testname=0xffbffba5 "DeleteRead",
     _testname=0xffbffc40 "Fill") at NDBT_Test.cpp:849
#8  0x0002a61c in NDBT_TestSuite::execute (this=0x62fb0, argc=0,
 argv=0xe) at NDBT_Test.cpp:1376
#9  0x0001e764 in _start ()

How to repeat:
The problem can be reproduced by starting the cluster manually and running any of the failing test cases (one example is shown under Description). The my.cnf contained:

[atrt]
basedir = /home/ndbdev/autotest2/run/run-manual/run
baseport = 14000
clusters = .2node

[ndb_mgmd]

[mysqld]
skip-innodb
skip-bdb

[cluster_config.2node]
ndb_mgmd = techra35
ndbd = techra36,techra37
ndbapi= techra35,techra35,techra35

NoOfReplicas = 2
IndexMemory = 100M 
DataMemory = 300M
BackupMemory = 64M
MaxNoOfConcurrentScans = 100
MaxNoOfSavedMessages= 1000
SendBufferMemory = 2M
NoOfFragmentLogFiles = 4
FragmentLogFileSize = 64M
CompressedLCP=1
CompressedBackup=1
ODirect=1

#
# Generated by atrt
# Thu Aug  7 09:26:13 2008

[mysql_cluster.2node]
ndb-connectstring= techra35:14000

[cluster_config.ndb_mgmd.1.2node]
PortNumber= 14000

[cluster_config.ndbd.1.2node]
FileSystemPath= /home/ndbdev/autotest2/run/run-manual/run/

[cluster_config.ndbd.2.2node]
FileSystemPath= /home/ndbdev/autotest2/run/run-manual/run/

proposed fix

Attachment: bug38628.patch (text/x-patch), 1.87 KiB.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51566

2649 Jonas Oreland	2008-08-13
      ndb - bug#38628 - Fix invalid memory access in takeOverScanOp
        (causes bus-error on i.e sparc)
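
The patch itself is not reproduced in this report, so the following is only a generic illustration of one common way an invalid memory access becomes a bus error (signal 10) on SPARC while going unnoticed on x86: dereferencing a pointer that is not aligned for the type being read. Whether this is the exact defect in takeOverScanOp is an assumption. The helper names `load_u32` and `store_u32` are hypothetical and are not part of the NDB API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not NDB API): read a 32-bit word from an arbitrary
// byte offset. A cast such as *(const uint32_t*)(buf + offset) would be
// undefined behaviour for a misaligned address and typically traps with
// SIGBUS on SPARC; std::memcpy copies byte-wise and has no such requirement.
inline std::uint32_t load_u32(const unsigned char* buf, std::size_t len,
                              std::size_t offset) {
  // Guard against reading past the end of the buffer.
  assert(offset <= len && len - offset >= sizeof(std::uint32_t));
  std::uint32_t value;
  std::memcpy(&value, buf + offset, sizeof value);
  return value;
}

// Hypothetical helper (not NDB API): write a 32-bit word to an arbitrary
// byte offset, again without assuming any alignment of the destination.
inline void store_u32(unsigned char* buf, std::size_t len,
                      std::size_t offset, std::uint32_t value) {
  // Guard against writing past the end of the buffer.
  assert(offset <= len && len - offset >= sizeof(std::uint32_t));
  std::memcpy(buf + offset, &value, sizeof value);
}
```

Calling `store_u32(buf, sizeof buf, 1, v)` followed by `load_u32(buf, sizeof buf, 1)` round-trips the value even though offset 1 is not 4-byte aligned, which is the property a direct pointer cast cannot guarantee on strict-alignment CPUs.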

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51574

2649 Jonas Oreland	2008-08-13
      ndb - bug#38628 - Fix invalid memory access in takeOverScanOp
        (causes bus-error on i.e sparc)

Pushed to 6.2, 6.3, and 6.4.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51912

2650 Tomas Ulin	2008-08-19
      continued fix for reset master

Documented in the NDB 6.2.16 and 6.3.17 changelogs as follows:

        An invalid memory access caused the management server to crash on
        Solaris Sparc platforms.

Already documented; closed.

Pushed into 6.0.7-alpha  (revid:jonas@mysql.com-20080813200401-ly735jw1t0j4eawe) (version source revid:tomas.ulin@sun.com-20080902154454-pvi3xa61d2wtxtbg) (pib:5)