Bug #85015 MySQL Cluster 7.5.5 Replication Slave SQL Thread hangs with create table
Submitted: 16 Feb 2017 8:59
Modified: 22 Mar 2017 21:42
Reporter: Ivan Ma
Status: Closed
Category: MySQL Cluster: Replication
Severity: S3 (Non-critical)
Version: 7.5.5
OS: Oracle Linux (7.3)
Assigned to:
CPU Architecture: Any
Tags: regression

[16 Feb 2017 8:59] Ivan Ma
Description:
Environment :  
OS - Oracle Linux 7.

    #uname -a
    Linux virtual-24.localhost 4.1.12-61.1.27.el7uek.x86_64 #2 SMP Fri Feb 3 12:31:56 PST 2017 x86_64 x86_64 x86_64 GNU/Linux

    #cat /etc/oracle-release
    Oracle Linux Server release 7.3

Cluster (mycluster1 on VM1 ) -- replicate to ---> Cluster (mycluster2 on VM2)

mysqld on mycluster1 (3316, 3326) - VM1

mysqld on mycluster2 (3316,3326) - VM2

mysqld (3326) on mycluster1 ---> mysqld(3326) on mycluster2

Problem :
1. After a CREATE TABLE on VM1 (mycluster1), the slave I/O thread keeps running correctly, but the SQL thread hangs in the 'creating table' state.

2. On VM2 (mycluster2), a command issued from the mysql client, such as "show tables", hangs forever.

3. Attaching strace to the mysqld on the SLAVE server (strace -f -p <mysqld process of the slave>... ) and then exiting with CTRL-C releases the lock.

4. Issuing "create table engine ndb" on the MASTER again causes the same lock-up on the SLAVE.
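The symptom in points 1 and 2 can be checked from the slave's process list. The following is only a rough sketch, not part of the original report: the function name find_stuck_threads and the 60-second threshold are made up for illustration, and it assumes the tab-separated output of `mysql --batch -e 'SHOW FULL PROCESSLIST'` with the usual column order (Id, User, Host, db, Command, Time, State, Info).

```shell
# Hypothetical helper: read tab-separated SHOW FULL PROCESSLIST output on stdin
# and print every thread that has been in the 'creating table' state for at
# least the given number of seconds (default 60, an arbitrary assumption).
find_stuck_threads() {
    min_secs=${1:-60}
    awk -F '\t' -v min="$min_secs" '
        # Skip the header row; column 6 is Time, 7 is State, 8 is Info.
        NR > 1 && $7 == "creating table" && ($6 + 0) >= (min + 0) {
            printf "thread %s stuck for %ss: %s\n", $1, $6, $8
        }'
}

# Possible usage on the slave (connection options are placeholders):
#   mysql --batch -uroot -P3326 -e 'SHOW FULL PROCESSLIST' | find_stuck_threads 60
```

Since "show tables" itself hangs in this state, it may be safer to run the check from a fresh connection and to bound it with a client-side timeout.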

How to repeat:

Assuming you have the following setup

a.   2 VMs  (register both correctly in /etc/hosts so that the hostnames resolve to IP addresses)

    hostname : primary and secondary

b.  folders : for Primary

    Home directory of mysql user : /home/mysql

    /home/mysql/mcm

    /home/mysql/mcm/mcm1.4.1   --> the mcm1.4.1 folder

    /home/mysql/mcm/cluster-latest ---> the soft link to the MySQL Cluster 7.5.5 folder

    /home/mysql/demo/mcm/rep/.....   (expand the rep.tar to demo folder)

c. folders : for Secondary

      /home/mysql/mcm

    /home/mysql/mcm/mcm1.4.1   --> the mcm1.4.1 folder

    /home/mysql/mcm/cluster-latest ---> the soft link to the MySQL Cluster 7.5.5 folder

    /home/mysql/demo/mcm/rep    (just create the folder; no need to expand anything, the script will scp the required programs from PRIMARY)

d. Set up ssh trusted-host access between primary and secondary for the mysql user.

Check comm.sh to make sure its environment variables fit your environment.

***************************************

To run it ...

Run the scripts in numbered order:

./01-startmcmd.sh all

<Wait a few seconds before running 02-..; otherwise mcmd will dump core. Connecting to mcmd too early, before it has finished recovery, makes mcmd core dump (bug???).>

./02-createCluster.sh

./03-startcluster.sh

./04-createData.sh

<No need to run ./05-insert.sh because 04 already causes the problem.>

./06-startSlave.sh

**********************
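The sequence above, including the pause after the first script, could be wrapped roughly as follows. This is a sketch only: the function name run_repro and the PAUSE_SECS variable are invented here, and the 10-second default is a guess, since the report only says to wait "a few seconds".

```shell
# Hypothetical wrapper: run the numbered scripts in order from the rep folder,
# stopping at the first failure. PAUSE_SECS (default 10, an assumed value)
# gives mcmd time to finish starting before 02 connects to it.
run_repro() {
    pause_secs=${PAUSE_SECS:-10}
    ./01-startmcmd.sh all || return 1
    sleep "$pause_secs"
    # 05-insert.sh is skipped on purpose: 04 already triggers the problem.
    for step in ./02-createCluster.sh ./03-startcluster.sh \
                ./04-createData.sh ./06-startSlave.sh; do
        "$step" || { echo "step failed: $step" >&2; return 1; }
    done
}
```
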

Now you can log in to secondary and check: the SQL thread is stuck in the 'creating table' state. "show processlist" also shows the CREATE statement hanging.
[17 Feb 2017 8:33] Ivan Ma
To simplify the test scenario: use 1 VM, with no replication but with log-bin enabled.

create site --hosts=127.0.0.1 mysite;
add package --basedir=/home/mysql/demos/mcm/cluster-755 cluster755;

create cluster --package=cluster755 --processhosts=ndb_mgmd@127.0.0.1,ndbmtd@127.0.0.1,ndbmtd@127.0.0.1,mysqld@127.0.0.1 mycluster;
set server-id:mysqld:50=53316 mycluster;
set binlog-format:mysqld=ROW mycluster;
set log-bin:mysqld:50=binlog mycluster;

start cluster mycluster;
show status -r mycluster;

*********

#!/bin/sh

mysql -uroot -h127.0.0.1 -P53316 << EOC
create database if not exists test;
create table test.t1 (f1 int not null auto_increment primary key, f2 varchar(20)) engine=ndbcluster;
EOC

The create table statement hangs.

Please also find attached the gdb log of "thread apply all bt", taken while it was hung.
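For anyone wanting to capture the same kind of log, a backtrace of all threads can be taken non-interactively along these lines. This is a sketch and not necessarily how the attached log was produced; the function name dump_backtraces and the GDB override variable are hypothetical, while -p, -batch and -ex are standard gdb options.

```shell
# Hypothetical helper: attach gdb to a running process, print the backtrace of
# every thread, then detach. GDB can be set to use a different gdb binary.
dump_backtraces() {
    pid=$1
    [ -n "$pid" ] || { echo "usage: dump_backtraces <pid>" >&2; return 2; }
    "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt"
}

# Possible usage (needs permission to attach to the hung mysqld):
#   dump_backtraces <mysqld pid> > mysqld-bt.log 2>&1
```
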
[17 Feb 2017 8:50] Ivan Ma
A simple test script that reproduces the hang in my environment.

Attachment: bug85015.test (application/octet-stream, text), 656 bytes.

[2 Mar 2017 3:02] MySQL Verification Team
Hi Ivan,
I finally managed to verify this, *but* it's not something I can reproduce *on demand*. Initially I tested on bare metal and was not able to reproduce it. I moved to VMs and it again worked fine, then moved to a single VM (your last example) and it still worked fine. I finally reproduced it by:

1. using MCM to create the cluster (I normally use clusters I configure and start myself)

2. moving the VM from a dedicated ESXi host to my local machine with VirtualBox

I'm not sure which of the two helped, but it can be reproduced now.
[22 Mar 2017 21:42] Jon Stephens
Documented fix in NDB 7.5.7 and 7.6.2, as follows:

    Execution of CREATE TABLE could in some cases cause the
    replication slave SQL thread to hang.

    Regression of BUG#83676.

Closed.