Bug #92826 ndb_import segfaults with --opbatch=10000
Submitted: 17 Oct 2018 12:08  Modified: 19 Oct 2018 0:51
Reporter: Daniël van Eeden (OCA)
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine  Severity: S3 (Non-critical)
Version: 7.6.7  OS: Any
Assigned to:  CPU Architecture: Any

[17 Oct 2018 12:08] Daniël van Eeden
Description:
ndb_import segfaults with --opbatch=10000

How to repeat:
--opbatch=1026 crashes
--opbatch=1025 does not
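For reference, a hypothetical invocation along these lines (the exact command and CSV file are not included in this report; the database name is a placeholder):

# crashes
ndb_import --opbatch=1026 mydb host_file_stats_201608.csv --fields-terminated-by=','
# does not crash
ndb_import --opbatch=1025 mydb host_file_stats_201608.csv --fields-terminated-by=','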
[17 Oct 2018 12:09] Daniël van Eeden
*** stack smashing detected ***: /bin/ndb_import terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7ffff6ee36e7]
/lib64/libc.so.6(+0x1186a2)[0x7ffff6ee36a2]
/bin/ndb_import[0x464f55]
/bin/ndb_import[0x465048]
/bin/ndb_import[0x41c2c5]
/bin/ndb_import[0x412c19]
/bin/ndb_import[0x412f2e]
/bin/ndb_import[0x4e421f]
/lib64/libpthread.so.0(+0x7e25)[0x7ffff7bc6e25]
/lib64/libc.so.6(clone+0x6d)[0x7ffff6ec9bad]
======= Memory map: ========
...
[18 Oct 2018 17:53] MySQL Verification Team
Hi,

Can you share the full cluster config and some general info about the table you are importing?
 - what is the CREATE TABLE statement for the table you are importing?
 - how many records are you importing per table (wc filename.csv)?

I'm running a test now with --opbatch=10000 and it has been running for 2+ hours without a crash (the same data imported within a few minutes without --opbatch=10000), so for now I am not able to reproduce the issue.

all best
Bogdan
[18 Oct 2018 18:07] Daniël van Eeden
# anonymized config:

[system]
Name=foobar101

[ndbd default]
NoOfReplicas= 2
DataDir= /var/lib/mysql-cluster
LockPagesInMainMemory=1
DataMemory=80G
ODirect=1
NoOfFragmentLogFiles=300
NoOfFragmentLogParts=10
ThreadConfig=ldm={count=10,cpubind=0-4,12-16,thread_prio=9,spintime=200},tc={count=4,cpuset=6-7,18-19,thread_prio=8},send={count=1,cpuset=8},recv={count=1,cpuset=20},main={count=1,cpuset=9,21},rep={count=1,cpuset=9,21},io={count=1,cpuset=9,21,thread_prio=8},watchdog={count=1,cpuset=9,21,thread_prio=9}
MaxNoOfConcurrentOperations=300000
SchedulerSpinTimer=400
SchedulerExecutionTimer=100
TimeBetweenGlobalCheckpoints=1000
TimeBetweenEpochs=200
TransactionDeadlockDetectionTimeout=10000
RedoBuffer=128M
CompressedBackup=1
MaxNoOfTables=1024
MaxNoOfAttributes=10000
MaxNoOfOrderedIndexes=768
MaxNoOfFiredTriggers=8000
SharedGlobalMemory=1G
DiskPageBufferMemory=4G
DiskIOThreadPool=4
MinDiskWriteSpeed=20M
MaxDiskWriteSpeed=300M
MaxDiskWriteSpeedOtherNodeRestart=200M
MaxDiskWriteSpeedOwnRestart=400M
BackupDiskWriteSpeedPct=60
IndexStatAutoCreate=1
IndexStatAutoUpdate=1
StartupStatusReportFrequency=120

[tcp default]
SendBufferMemory=32M

[ndb_mgmd]
NodeId= 49
HostName= foobarmgmt-1001.example.com

[ndbd]
NodeId= 1
HostName= foobar-1001.example.com
[ndbd]
NodeId= 2
HostName= foobar-1002.example.com
[ndbd]
NodeId= 3
HostName= foobar-1003.example.com
[ndbd]
NodeId= 4
HostName= foobar-1004.example.com

[mysqld]
NodeId= 51
[mysqld]
NodeId= 52
[mysqld]
NodeId= 53
[mysqld]
NodeId= 54
[mysqld]
NodeId= 55
[mysqld]
NodeId= 56
[mysqld]
NodeId= 57
[mysqld]
NodeId= 58
[mysqld]
NodeId= 59
[mysqld]
NodeId= 60
[mysqld]
NodeId= 61
[mysqld]
NodeId= 62
[mysqld]
NodeId= 63
[mysqld]
NodeId= 64
[mysqld]
NodeId= 65
[mysqld]
NodeId= 66
[mysqld]
NodeId= 67
[mysqld]
NodeId= 68
[mysqld]
NodeId= 69
[mysqld]
NodeId= 70
[mysqld]
NodeId= 71
[mysqld]
NodeId= 72
[mysqld]
NodeId= 73
[mysqld]
NodeId= 74
[mysqld]
NodeId= 75
[mysqld]
NodeId= 76
[mysqld]
NodeId= 77
[mysqld]
NodeId= 78
[18 Oct 2018 18:15] Daniël van Eeden
My original file had 183011 entries, but I reproduced the crash with just the first 18000 entries.

The table is used for archiving/combining some old P_S data. 
The table looks like this:
CREATE TABLE `host_file_stats_201608` (
  `sample_id` bigint(20) unsigned NOT NULL,
  `TABLE_NAME` char(64) NOT NULL,
  `COUNT_STAR` bigint(20) unsigned NOT NULL,
  `SUM_TIMER_WAIT` bigint(20) unsigned NOT NULL,
  `MIN_TIMER_WAIT` bigint(20) unsigned NOT NULL,
  `AVG_TIMER_WAIT` bigint(20) unsigned NOT NULL,
  `MAX_TIMER_WAIT` bigint(20) unsigned NOT NULL,
  `COUNT_READ` bigint(20) unsigned NOT NULL,
  `SUM_TIMER_READ` bigint(20) unsigned NOT NULL,
  `MIN_TIMER_READ` bigint(20) unsigned NOT NULL,
  `AVG_TIMER_READ` bigint(20) unsigned NOT NULL,
  `MAX_TIMER_READ` bigint(20) unsigned NOT NULL,
  `SUM_NUMBER_OF_BYTES_READ` bigint(20) unsigned NOT NULL,
  `COUNT_WRITE` bigint(20) unsigned NOT NULL,
  `SUM_TIMER_WRITE` bigint(20) unsigned NOT NULL,
  `MIN_TIMER_WRITE` bigint(20) unsigned NOT NULL,
  `AVG_TIMER_WRITE` bigint(20) unsigned NOT NULL,
  `MAX_TIMER_WRITE` bigint(20) unsigned NOT NULL,
  `SUM_NUMBER_OF_BYTES_WRITE` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`sample_id`,`TABLE_NAME`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC

-- host_file_stats_201608 --
Version: 8
Fragment type: HashMapPartition
K Value: 6
Min load factor: 78
Max load factor: 80
Temporary table: no
Number of attributes: 19
Number of primary keys: 2
Length of frm data: 599
Max Rows: 0
Row Checksum: 1
Row GCI: 1
SingleUserMode: 0
ForceVarPart: 1
PartitionCount: 40
FragmentCount: 40
PartitionBalance: FOR_RP_BY_LDM
ExtraRowGciBits: 0
ExtraRowAuthorBits: 0
TableStatus: Retrieved
Table options:
HashMap: DEFAULT-HASHMAP-3840-40
-- Attributes --
sample_id Bigunsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY DYNAMIC
TABLE_NAME Char(64;latin1_swedish_ci) PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY DYNAMIC
COUNT_STAR Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
SUM_TIMER_WAIT Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
MIN_TIMER_WAIT Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
AVG_TIMER_WAIT Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
MAX_TIMER_WAIT Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
COUNT_READ Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
SUM_TIMER_READ Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
MIN_TIMER_READ Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
AVG_TIMER_READ Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
MAX_TIMER_READ Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
SUM_NUMBER_OF_BYTES_READ Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
COUNT_WRITE Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
SUM_TIMER_WRITE Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
MIN_TIMER_WRITE Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
AVG_TIMER_WRITE Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
MAX_TIMER_WRITE Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
SUM_NUMBER_OF_BYTES_WRITE Bigunsigned NOT NULL AT=FIXED ST=MEMORY DYNAMIC
-- Indexes -- 
PRIMARY KEY(sample_id, TABLE_NAME) - UniqueHashIndex
PRIMARY(sample_id, TABLE_NAME) - OrderedIndex
[19 Oct 2018 0:51] MySQL Verification Team
Hi,

I verified the bug with your table structure. It looks to be related to row size. In any case, the bug is real, and the workaround is the one you already found (use a smaller --opbatch).

Thanks for the report and the data.
Bogdan
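A rough back-of-envelope row-size estimate from the ndb_desc output above (my own arithmetic, not part of the verification): 18 Bigunsigned columns at 8 bytes plus one Char(64) give 18*8 + 64 = 208 bytes of fixed row data, so --opbatch=1026 corresponds to roughly 1026 * 208 ≈ 213 KB of row data per batch.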
[19 Oct 2018 0:53] MySQL Verification Team
mysql> CREATE TABLE `host_file_stats_201608` (
    ->   `sample_id` bigint(20) unsigned NOT NULL,
    ->   `TABLE_NAME` char(64) NOT NULL,
    ->   `COUNT_STAR` bigint(20) unsigned NOT NULL,
    ->   `SUM_TIMER_WAIT` bigint(20) unsigned NOT NULL,
    ->   `MIN_TIMER_WAIT` bigint(20) unsigned NOT NULL,
    ->   `AVG_TIMER_WAIT` bigint(20) unsigned NOT NULL,
    ->   `MAX_TIMER_WAIT` bigint(20) unsigned NOT NULL,
    ->   `COUNT_READ` bigint(20) unsigned NOT NULL,
    ->   `SUM_TIMER_READ` bigint(20) unsigned NOT NULL,
    ->   `MIN_TIMER_READ` bigint(20) unsigned NOT NULL,
    ->   `AVG_TIMER_READ` bigint(20) unsigned NOT NULL,
    ->   `MAX_TIMER_READ` bigint(20) unsigned NOT NULL,
    ->   `SUM_NUMBER_OF_BYTES_READ` bigint(20) unsigned NOT NULL,
    ->   `COUNT_WRITE` bigint(20) unsigned NOT NULL,
    ->   `SUM_TIMER_WRITE` bigint(20) unsigned NOT NULL,
    ->   `MIN_TIMER_WRITE` bigint(20) unsigned NOT NULL,
    ->   `AVG_TIMER_WRITE` bigint(20) unsigned NOT NULL,
    ->   `MAX_TIMER_WRITE` bigint(20) unsigned NOT NULL,
    ->   `SUM_NUMBER_OF_BYTES_WRITE` bigint(20) unsigned NOT NULL,
    ->   PRIMARY KEY (`sample_id`,`TABLE_NAME`)
    -> ) ENGINE=ndbcluster DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
Query OK, 0 rows affected (0.19 sec)

mysql> insert into host_file_stats_201608 select id, ' ', 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 from t1 limit 20000;
Query OK, 20000 rows affected (0.50 sec)
Records: 20000  Duplicates: 0  Warnings: 0

mysql>  select * INTO OUTFILE '/tmp/t4.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n' FROM host_file_stats_201608;
Query OK, 20000 rows affected (0.07 sec)

mysql> create table t4 like host_file_stats_201608;
Query OK, 0 rows affected (0.19 sec)

[root@localhost mysql]# bin/ndb_import --opbatch=10000 test /tmp/t4.csv --fields-optionally-enclosed-by='"' --fields-terminated-by="," --fields-escaped-by='\\'
job-1 import test.t4 from /tmp/t4.csv
job-1 [starting] import test.t4 from /tmp/t4.csv
job-1 [running] import test.t4 from /tmp/t4.csv
Segmentation fault (core dumped)
[root@localhost mysql]#