Description:
Restoring a modyfied dump (changed engine via regex from Innodb|myisam to ndbcluster and added tablespace line) into mysql-cluster causes data loss and logs full of:
WARNING -- thr: 7: Overslept 1507 ms, expected ~10ms
WARNING -- thr: 4: Overslept 1512 ms, expected ~10ms
WARNING -- thr: 6: Overslept 2559 ms, expected ~10ms
and
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
after that data memory usage goes down from like 4% to 2% and then 0% and starting again to fill up but very slowly the dump does not abort.
datanodes hardware:
24cpu(HT)
128GB RAM
2.5TB raid5
2x 1GB LACP Ethernet
Management / sql nodes(2x KVM):
4 vCPUS
4 GB RAM
my config :
NoOfReplicas=2
DataMemory=90000M
IndexMemory=5000M
DiskPageBufferMemory=4000M
CompressedBackup=true
datadir=/var/lib/mysql-cluster
NoOfFragmentLogParts=10
MaxNoOfConcurrentOperations=10000000
MaxNoOfAttributes=100000
NoOfFragmentLogFiles=32
#Alle 64MB Lokalen Snapshot erstellen 20=4MB 21=8MB 22=16MB
TimeBetweenLocalCheckpoints=26
#Alle 8 Sekunden(8000) Globalen Snapshot erstellen
TimeBetweenGlobalCheckpoints=10000
MaxDiskWriteSpeed=600M
MinDiskWriteSpeed=200M
MaxDiskWriteSpeedOwnRestart=300M
MaxNoOfExecutionThreads=20
#RealtimeScheduler=1
TransactionDeadlockDetectionTimeout=3000
StopOnError=0
ODirect=1
#ThreadConfig=ldm={count=10,cpubind=0-4,12-16},tc={count=4,cpubind=6-7,18-19},send={count=1,cpubind=8},recv={count=1,cpubind=20},main={count=1,cpubind=9,21},rep={count=1,cpubind=9,21},io={count=1,cpubind=9,21},watchdog={count=1,cpubind=9,21}
#ThreadConfig=ldm={count=10,cpubind=0-4,12-16,thread_prio=9,spintime=200}, \
#tc={count=4,cpuset=6-7,18-19,thread_prio=8},send={count=1,cpuset=8}, \
#recv={count=1,cpuset=20},main={count=1,cpuset=9,21},rep={count=1,cpuset=9,21}, \
#io={count=1,cpuset=9,21,thread_prio=8},watchdog={count=1,cpuset=9,21,thread_prio=9}
[tcp default]
SendBufferMemory=64M
ReceiveBufferMemory=64M
[ndb_mgmd]
NodeId=1
hostname=172.16.17.11
datadir=/var/lib/mysql-cluster
[ndb_mgmd]
NodeId=2
hostname=172.16.17.12
datadir=/var/lib/mysql-cluster
[ndbd]
hostname=172.16.17.1
[ndbd]
hostname=172.16.17.2
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
the dump itself is 33GB.
ndb_mgm output:
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @172.16.17.1 (mysql-5.7.17 ndb-7.5.5, Nodegroup: 0, *)
id=4 @172.16.17.2 (mysql-5.7.17 ndb-7.5.5, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5)
id=2 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5)
[mysqld(API)] 20 node(s)
id=5 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5)
id=6 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5)
id=7 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5)
id=8 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5)
id=9 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5)
id=10 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5)
id=11 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5)
id=12 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5)
id=13 (not connected, accepting connect from any host)
id=14 (not connected, accepting connect from any host)
id=15 (not connected, accepting connect from any host)
id=16 (not connected, accepting connect from any host)
id=17 (not connected, accepting connect from any host)
id=18 (not connected, accepting connect from any host)
id=19 (not connected, accepting connect from any host)
id=20 (not connected, accepting connect from any host)
id=21 (not connected, accepting connect from any host)
id=22 (not connected, accepting connect from any host)
id=23 (not connected, accepting connect from any host)
id=24 (not connected, accepting connect from any host)
How to repeat:
1. clean start of datanode with empty directory and --initial
2. on mysqld do :
CREATE LOGFILE GROUP lg1 ADD UNDOFILE 'undo.dat' INITIAL_SIZE = 10000M engine=NDB;
create tablespace ts_jt add datafile 'ts_01.dat' use logfile group lg1 initial_size 100G engine ndb;
create database testdb
3. mysql -u root -ppassword testdb < testdb.sql