Bug #76301 ClusterJ test crashes with a core dump in Java 1.7 on Solaris/OEL
Submitted: 12 Mar 2015 17:21
Modified: 7 Jul 2015 14:42
Reporter: Lakshmi Narayanan Sreethar
Status: Closed
Category: MySQL Cluster: Cluster/J
Severity: S3 (Non-critical)
Version: 7.4
OS: Any
Assigned to:
CPU Architecture: Any

[12 Mar 2015 17:21] Lakshmi Narayanan Sreethar
Description:
The ndb.clusterj test fails in a few branches in PB2 with a JVM core dump.
The core dump occurs every time the test QueryBlobIndexScanTest runs.
The test fails on Solaris and Oracle Enterprise Linux.

The core dump:

Last '200' lines of output from command:
SIGTERM: [libjvm.so+0xaeba70], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIG39: [libjvm.so+0xaf0718], sa_mask[0]=0x00000000, sa_flags=0x00000008
SIG40: [libjvm.so+0x289e30], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
Test schema failed (normal) select id from t_basic where id = 9999
Successfully initialized schema.

2: testsuite.clusterj.AutoPKTest.test running...testsuite.clusterj.AutoPKTest

3: testsuite.clusterj.BigIntegerTypesTest.testWriteJDBCReadNDB running...testsuite.clusterj.BigIntegerTypesTest

<snip>

75: testsuite.clusterj.QueryAllPrimitivesTest.test running...testsuite.clusterj.QueryAllPrimitivesTest

76: testsuite.clusterj.QueryBigIntegerTypesTest.test running...testsuite.clusterj.QueryBigIntegerTypesTest

77: testsuite.clusterj.QueryBlobIndexScanTest.test running...testsuite.clusterj.QueryBlobIndexScanTest
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xffffffff7ed61b5c, pid=10762, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (7.0_55-b14) (build 1.7.0_55-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# C  [libc.so.1+0x61b5c]Warning: SIGSEGV handler expected:libjvm.so+0x289e30  found:libjvm.so+0xc9b080
Signal Handlers:
SIGSEGV: [libjvm.so+0xc9b080], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGBUS: [libjvm.so+0xc9b080], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGFPE: [libjvm.so+0x289e30], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGPIPE: SIG_IGN, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGXFSZ: [libjvm.so+0x289e30], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGILL: [libjvm.so+0x289e30], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGUSR1: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGUSR2: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGQUIT: [libjvm.so+0xaeba70], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIGHUP: [libjvm.so+0xaeba70], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIGINT: [libjvm.so+0xaeba70], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIGTERM: [libjvm.so+0xaeba70], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIG39: [libjvm.so+0xaf0718], sa_mask[0]=0x00000000, sa_flags=0x00000008
SIG40: [libjvm.so+0x289e30], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
Warning: SIGBUS handler expected:libjvm.so+0x289e30  found:libjvm.so+0xc9b080
Signal Handlers:
SIGSEGV: [libjvm.so+0xc9b080], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGBUS: [libjvm.so+0xc9b080], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGFPE: [libjvm.so+0x289e30], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGPIPE: SIG_IGN, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGXFSZ: [libjvm.so+0x289e30], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGILL: [libjvm.so+0x289e30], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGUSR1: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGUSR2: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGQUIT: [libjvm.so+0xaeba70], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIGHUP: [libjvm.so+0xaeba70], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIGINT: [libjvm.so+0xaeba70], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIGTERM: [libjvm.so+0xaeba70], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIG39: [libjvm.so+0xaf0718], sa_mask[0]=0x00000000, sa_flags=0x00000008
SIG40: [libjvm.so+0x289e30], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
# [ timer expired, abort... ]
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
ndb.clusterj                             w8 [ fail ]

How to repeat:
Run './mtr cluster' and observe the crash.

[13 May 2015 23:16] Craig Russell
Posted by developer:
 
The root cause is an incorrect calculation of the space needed for an NdbRecord.

In NdbDictionary::createRecord, the space for each blob column is calculated as a 16-byte blob header plus a 256-byte inline part, although only sizeof(Blob*) is actually needed. When scanning tables that have blob columns, the method nextResultCopyOut (used only by ClusterJ scans) copies this larger number of bytes and can overwrite the memory immediately following the buffer.
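
To make the size gap concrete, here is a rough arithmetic sketch. It is not ClusterJ or NDB API code: the class, the method names and the example table (one 4-byte key plus one blob column) are hypothetical, sizeof(Blob*) is assumed to be 8 bytes on a 64-bit build, and alignment and null bits are ignored.

```java
public class BlobRecordSize {
    // Figures quoted in this report: 16-byte blob header plus 256-byte inline part.
    static final int BLOB_HEADER_BYTES = 16;
    static final int BLOB_INLINE_BYTES = 256;
    // Assumption: 64-bit process, so a Blob* pointer occupies 8 bytes.
    static final int BLOB_POINTER_BYTES = 8;

    /** Row size when every blob column is counted as header plus inline part. */
    static int sizeWithInlineBlobs(int fixedBytes, int blobColumns) {
        return fixedBytes + blobColumns * (BLOB_HEADER_BYTES + BLOB_INLINE_BYTES);
    }

    /** Row size when every blob column is counted as a single pointer slot. */
    static int sizeWithBlobPointers(int fixedBytes, int blobColumns) {
        return fixedBytes + blobColumns * BLOB_POINTER_BYTES;
    }

    public static void main(String[] args) {
        int fixedBytes = 4;  // hypothetical: one INT primary key
        int blobColumns = 1; // hypothetical: one BLOB column
        int larger = sizeWithInlineBlobs(fixedBytes, blobColumns);
        int smaller = sizeWithBlobPointers(fixedBytes, blobColumns);
        System.out.println("header + inline size: " + larger);
        System.out.println("pointer-only size:    " + smaller);
        System.out.println("bytes copied past a pointer-sized buffer: " + (larger - smaller));
    }
}
```

Under these assumptions each blob column adds 264 bytes that a pointer-sized buffer has no room for.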

In Java 6 and earlier, the buffers were allocated on page boundaries (typically 4K bytes), so the extra data was unlikely to cause a problem. In Java 7 and later, buffers are allocated one after the other, so the extra data corrupts the following buffer.
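
The difference between the two allocation patterns can be modelled with a plain byte array standing in for native memory. This is only a toy model, not the JVM's allocator: the class and method names are made up, and the sizes (a 12-byte buffer receiving a 276-byte copy) are illustrative values assuming one 4-byte key and one blob column on a 64-bit build.

```java
import java.util.Arrays;

public class AdjacentBufferOverrun {
    /**
     * Models two buffers of bufferSize bytes inside one arena. The first starts at
     * offset 0, the second at offset stride. An unchecked copy of copySize bytes is
     * then written into the first buffer. Returns true if any byte that belongs to
     * the second buffer was overwritten.
     */
    static boolean neighborCorrupted(int bufferSize, int stride, int copySize) {
        byte[] arena = new byte[Math.max(stride + bufferSize, copySize)];
        // The copy does not know the real buffer size and writes copySize bytes.
        Arrays.fill(arena, 0, copySize, (byte) 0x7f);
        for (int i = stride; i < stride + bufferSize; i++) {
            if (arena[i] != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int bufferSize = 12; // illustrative: size the caller allocated
        int copySize = 276;  // illustrative: size the copy actually writes
        // Buffers placed on 4K page boundaries: the overrun lands in unused padding.
        System.out.println("page-aligned: " + neighborCorrupted(bufferSize, 4096, copySize));
        // Buffers placed back to back: the overrun lands in the next buffer.
        System.out.println("packed:       " + neighborCorrupted(bufferSize, bufferSize, copySize));
    }
}
```

In the page-aligned case the bug is still present, it simply goes unnoticed because nothing reads the padding.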

The solution is to check the buffer size calculated by NdbDictionary::createRecord and, if it is bigger than the internally calculated buffer size, to use the larger size.
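
The idea behind the fix can be sketched as follows. This is not the actual ClusterJ patch: the class, method and parameter names are hypothetical, and how ClusterJ obtains the size computed by NdbDictionary::createRecord is not shown in this report, so both sizes are simply passed in as integers.

```java
import java.nio.ByteBuffer;

public class ScanBufferSizing {
    /** Picks the larger of the two candidate sizes so the copy-out cannot overrun. */
    static int safeBufferSize(int clusterjComputedSize, int ndbRecordSize) {
        return Math.max(clusterjComputedSize, ndbRecordSize);
    }

    /** Allocates a direct buffer that is large enough for either size calculation. */
    static ByteBuffer allocateRowBuffer(int clusterjComputedSize, int ndbRecordSize) {
        return ByteBuffer.allocateDirect(safeBufferSize(clusterjComputedSize, ndbRecordSize));
    }

    public static void main(String[] args) {
        // Illustrative values: 12 bytes computed internally, 276 bytes reported for the record.
        ByteBuffer row = allocateRowBuffer(12, 276);
        System.out.println("row buffer capacity: " + row.capacity());
    }
}
```

Taking the maximum wastes some memory per row buffer when blob columns are present, but it keeps the buffer consistent with the number of bytes the scan copies out.
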
[7 Jul 2015 14:42] Daniel So
Posted by developer:
 
Added the following entry to the MySQL Cluster 7.4.7 and 7.3.10 changelogs:

"When used with Java 1.7 or higher, ClusterJ might cause the Java VM to crash when querying tables with BLOB columns, because NdbDictionary::createRecord calculates the wrong size needed for the record. Subsequently, when ClusterJ called NdbScanOperation::nextRecordCopyOut, the data overran the allocated buffer space. With this fix, ClusterJ checks the size calculated by NdbDictionary::createRecord and uses the value for the buffer size, if it is larger than the value ClusterJ itself calculates."