Description:
Restoring a modyfied dump (changed engine via regex from Innodb|myisam to ndbcluster and added tablespace line) into mysql-cluster causes data loss and logs full of:
WARNING -- thr: 7: Overslept 1507 ms, expected ~10ms
WARNING -- thr: 4: Overslept 1512 ms, expected ~10ms
WARNING -- thr: 6: Overslept 2559 ms, expected ~10ms
and
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads)
after that data memory usage goes down from like 4% to 2% and then 0% and starting again to fill up but very slowly the dump does not abort.
datanodes hardware:
24cpu(HT)
128GB RAM
2.5TB raid5
2x 1GB LACP Ethernet
Management / sql nodes(2x KVM):
4 vCPUS
4 GB RAM
my config :
NoOfReplicas=2
DataMemory=90000M
IndexMemory=5000M
DiskPageBufferMemory=4000M
CompressedBackup=true
datadir=/var/lib/mysql-cluster
NoOfFragmentLogParts=10
MaxNoOfConcurrentOperations=10000000
MaxNoOfAttributes=100000
NoOfFragmentLogFiles=32
#Alle 64MB Lokalen Snapshot erstellen 20=4MB 21=8MB 22=16MB
TimeBetweenLocalCheckpoints=26
#Alle 8 Sekunden(8000) Globalen Snapshot erstellen
TimeBetweenGlobalCheckpoints=10000
MaxDiskWriteSpeed=600M
MinDiskWriteSpeed=200M
MaxDiskWriteSpeedOwnRestart=300M
MaxNoOfExecutionThreads=20
#RealtimeScheduler=1
TransactionDeadlockDetectionTimeout=3000
StopOnError=0
ODirect=1
#ThreadConfig=ldm={count=10,cpubind=0-4,12-16},tc={count=4,cpubind=6-7,18-19},send={count=1,cpubind=8},recv={count=1,cpubind=20},main={count=1,cpubind=9,21},rep={count=1,cpubind=9,21},io={count=1,cpubind=9,21},watchdog={count=1,cpubind=9,21}
#ThreadConfig=ldm={count=10,cpubind=0-4,12-16,thread_prio=9,spintime=200}, \
#tc={count=4,cpuset=6-7,18-19,thread_prio=8},send={count=1,cpuset=8}, \
#recv={count=1,cpuset=20},main={count=1,cpuset=9,21},rep={count=1,cpuset=9,21}, \
#io={count=1,cpuset=9,21,thread_prio=8},watchdog={count=1,cpuset=9,21,thread_prio=9}
[tcp default]
SendBufferMemory=64M
ReceiveBufferMemory=64M
[ndb_mgmd]
NodeId=1
hostname=172.16.17.11
datadir=/var/lib/mysql-cluster
[ndb_mgmd]
NodeId=2
hostname=172.16.17.12
datadir=/var/lib/mysql-cluster
[ndbd]
hostname=172.16.17.1
[ndbd]
hostname=172.16.17.2
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]
the dump itself is 33GB.
ndb_mgm output:
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @172.16.17.1 (mysql-5.7.17 ndb-7.5.5, Nodegroup: 0, *)
id=4 @172.16.17.2 (mysql-5.7.17 ndb-7.5.5, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5)
id=2 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5)
[mysqld(API)] 20 node(s)
id=5 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5)
id=6 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5)
id=7 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5)
id=8 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5)
id=9 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5)
id=10 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5)
id=11 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5)
id=12 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5)
id=13 (not connected, accepting connect from any host)
id=14 (not connected, accepting connect from any host)
id=15 (not connected, accepting connect from any host)
id=16 (not connected, accepting connect from any host)
id=17 (not connected, accepting connect from any host)
id=18 (not connected, accepting connect from any host)
id=19 (not connected, accepting connect from any host)
id=20 (not connected, accepting connect from any host)
id=21 (not connected, accepting connect from any host)
id=22 (not connected, accepting connect from any host)
id=23 (not connected, accepting connect from any host)
id=24 (not connected, accepting connect from any host)
How to repeat:
1. clean start of datanode with empty directory and --initial
2. on mysqld do :
CREATE LOGFILE GROUP lg1 ADD UNDOFILE 'undo.dat' INITIAL_SIZE = 10000M engine=NDB;
create tablespace ts_jt add datafile 'ts_01.dat' use logfile group lg1 initial_size 100G engine ndb;
create database testdb
3. mysql -u root -ppassword testdb < testdb.sql
Description: Restoring a modyfied dump (changed engine via regex from Innodb|myisam to ndbcluster and added tablespace line) into mysql-cluster causes data loss and logs full of: WARNING -- thr: 7: Overslept 1507 ms, expected ~10ms WARNING -- thr: 4: Overslept 1512 ms, expected ~10ms WARNING -- thr: 6: Overslept 2559 ms, expected ~10ms and thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) thr_no:15 - sleeploop 10!! (Worker thread blocked (>= 10ms) by slow consumer threads) after that data memory usage goes down from like 4% to 2% and then 0% and starting again to fill up but very slowly the dump does not abort. datanodes hardware: 24cpu(HT) 128GB RAM 2.5TB raid5 2x 1GB LACP Ethernet Management / sql nodes(2x KVM): 4 vCPUS 4 GB RAM my config : NoOfReplicas=2 DataMemory=90000M IndexMemory=5000M DiskPageBufferMemory=4000M CompressedBackup=true datadir=/var/lib/mysql-cluster NoOfFragmentLogParts=10 MaxNoOfConcurrentOperations=10000000 MaxNoOfAttributes=100000 NoOfFragmentLogFiles=32 #Alle 64MB Lokalen Snapshot erstellen 20=4MB 21=8MB 22=16MB TimeBetweenLocalCheckpoints=26 #Alle 8 Sekunden(8000) Globalen Snapshot erstellen TimeBetweenGlobalCheckpoints=10000 MaxDiskWriteSpeed=600M MinDiskWriteSpeed=200M MaxDiskWriteSpeedOwnRestart=300M MaxNoOfExecutionThreads=20 #RealtimeScheduler=1 TransactionDeadlockDetectionTimeout=3000 StopOnError=0 ODirect=1 #ThreadConfig=ldm={count=10,cpubind=0-4,12-16},tc={count=4,cpubind=6-7,18-19},send={count=1,cpubind=8},recv={count=1,cpubind=20},main={count=1,cpubind=9,21},rep={count=1,cpubind=9,21},io={count=1,cpubind=9,21},watchdog={count=1,cpubind=9,21} #ThreadConfig=ldm={count=10,cpubind=0-4,12-16,thread_prio=9,spintime=200}, \ #tc={count=4,cpuset=6-7,18-19,thread_prio=8},send={count=1,cpuset=8}, \ #recv={count=1,cpuset=20},main={count=1,cpuset=9,21},rep={count=1,cpuset=9,21}, \ #io={count=1,cpuset=9,21,thread_prio=8},watchdog={count=1,cpuset=9,21,thread_prio=9} [tcp default] SendBufferMemory=64M ReceiveBufferMemory=64M [ndb_mgmd] NodeId=1 hostname=172.16.17.11 datadir=/var/lib/mysql-cluster [ndb_mgmd] NodeId=2 hostname=172.16.17.12 datadir=/var/lib/mysql-cluster [ndbd] hostname=172.16.17.1 [ndbd] hostname=172.16.17.2 [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] [mysqld] the dump itself is 33GB. ndb_mgm output: ndb_mgm> show Cluster Configuration --------------------- [ndbd(NDB)] 2 node(s) id=3 @172.16.17.1 (mysql-5.7.17 ndb-7.5.5, Nodegroup: 0, *) id=4 @172.16.17.2 (mysql-5.7.17 ndb-7.5.5, Nodegroup: 0) [ndb_mgmd(MGM)] 2 node(s) id=1 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5) id=2 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5) [mysqld(API)] 20 node(s) id=5 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5) id=6 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5) id=7 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5) id=8 @172.16.17.11 (mysql-5.7.17 ndb-7.5.5) id=9 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5) id=10 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5) id=11 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5) id=12 @172.16.17.12 (mysql-5.7.17 ndb-7.5.5) id=13 (not connected, accepting connect from any host) id=14 (not connected, accepting connect from any host) id=15 (not connected, accepting connect from any host) id=16 (not connected, accepting connect from any host) id=17 (not connected, accepting connect from any host) id=18 (not connected, accepting connect from any host) id=19 (not connected, accepting connect from any host) id=20 (not connected, accepting connect from any host) id=21 (not connected, accepting connect from any host) id=22 (not connected, accepting connect from any host) id=23 (not connected, accepting connect from any host) id=24 (not connected, accepting connect from any host) How to repeat: 1. clean start of datanode with empty directory and --initial 2. on mysqld do : CREATE LOGFILE GROUP lg1 ADD UNDOFILE 'undo.dat' INITIAL_SIZE = 10000M engine=NDB; create tablespace ts_jt add datafile 'ts_01.dat' use logfile group lg1 initial_size 100G engine ndb; create database testdb 3. mysql -u root -ppassword testdb < testdb.sql