MySQL Bugs: #75321: fabric commands hang indefinitely during fabric setup

Bug #75321	fabric commands hang indefinitely during fabric setup
Submitted:	28 Dec 2014 8:44	Modified:	9 Jan 2015 13:50
Reporter:	Daniel Yudelevich
Status:	Duplicate
Category:	MySQL Fabric	Severity:	S1 (Critical)
Version:	1.5.3	OS:	Linux (CentOS 7)
Assigned to:		CPU Architecture:	Any

Description:
Running on CentOS 7 with MySQL 5.6.22 from the Yum repository.

On a clean setup of Fabric 1.5.3 (mysql-utilities-1.5.3-1.el7.noarch), after starting the daemon on the backing store instance and adding two nodes (following http://dev.mysql.com/doc/mysql-utilities/1.5/en/fabric-quick-start-replication.html ), every fabric command issued after the first run of mysqlfabric group health hangs indefinitely.

Each of the group's two nodes shows an open connection from fabric (idle, in Sleep state):
mysql> select * from information_schema.processlist where Host like '12.0.0.10%';
+----+--------+-----------------+------+---------+------+-------+------+
| ID | USER   | HOST            | DB   | COMMAND | TIME | STATE | INFO |
+----+--------+-----------------+------+---------+------+-------+------+
|  5 | fabric | 12.0.0.10:55005 | NULL | Sleep   |  492 |       | NULL |
+----+--------+-----------------+------+---------+------+-------+------+
1 row in set (0.00 sec)

mysql> select * from information_schema.processlist where Host like '12.0.0.10%';
+----+--------+-----------------+------+---------+------+-------+------+
| ID | USER   | HOST            | DB   | COMMAND | TIME | STATE | INFO |
+----+--------+-----------------+------+---------+------+-------+------+
|  6 | fabric | 12.0.0.10:60704 | NULL | Sleep   |  534 |       | NULL |
+----+--------+-----------------+------+---------+------+-------+------+
1 row in set (0.00 sec)

The debug log suggests that Fabric fails to fetch the slave status:
[DEBUG] 1419754308.868386 - XML-RPC-Session-0 - Error executing function: get_slave_status.
[DEBUG] 1419754308.871567 - XML-RPC-Session-0 - Error executing function: check_slave_issues.

my.cnf (identical on all hosts except for report-host and server-id):
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# Recommended in standard MySQL setup
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES 
server_id=10
report-host=12.0.0.10
report-port=3306
binlog-format=ROW
log-slave-updates=true
gtid-mode=on
enforce-gtid-consistency=true
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1
log-bin
skip_name_resolve
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

fabric.cfg on the main node:
[DEFAULT]
prefix = 
sysconfdir = /etc
logdir = /var/log

[storage]
address = localhost:3306
user = fabric
password = fabric 
database = fabric
auth_plugin = mysql_native_password
connection_timeout = 1
connection_attempts = 2
connection_delay = 1

[servers]
user = fabric
password = fabric 
unreachable_timeout = 1

[protocol.xmlrpc]
address = localhost:32274
threads = 5
user = admin
password = fabric
disable_authentication = no
realm = MySQL Fabric
ssl_ca = 
ssl_cert = 
ssl_key = 

[protocol.mysql]
address = localhost:32275
user = admin
password = fabric
disable_authentication = no
ssl_ca = 
ssl_cert = 
ssl_key = 

[executor]
executors = 5

[logging]
level = DEBUG
url = file:///var/log/fabric.log

[sharding]
mysqldump_program = /usr/bin/mysqldump
mysqlclient_program = /usr/bin/mysql
prune_limit = 10000

[statistics]
prune_time = 3600

[failure_tracking]
notifications = 300
notification_clients = 50
notification_interval = 60
failover_interval = 0
detections = 3
detection_interval = 6
detection_timeout = 1
prune_time = 3600

[connector]
ttl = 1

[client]
password = fabric

How to repeat:
scenario a:
1. mysqlfabric manage setup
2. mysqlfabric manage start
3. mysqlfabric group create test
4. mysqlfabric group add test <host:port> (once for each host)
5. mysqlfabric group health test (works OK)
6. repeat mysqlfabric group health test (hangs)

scenario b:
1. mysqlfabric manage setup
2. mysqlfabric manage start
3. mysqlfabric group create test
4. mysqlfabric group add test <host:port> (once for each host)
5. mysqlfabric group lookup_servers test
6. mysqlfabric group promote test --slave_id=(slave id from step 5) (hangs; repeat if needed)
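Steps 3-6 of scenario a can be replayed with each mysqlfabric call bounded by a time limit, so a hang shows up as a reported result instead of a blocked terminal. This is a minimal sketch under stated assumptions, not part of the original report: the Fabric daemon is assumed to be already running (steps 1-2 done), GNU coreutils timeout is assumed to be installed, the node addresses in NODES are hypothetical placeholders, and the 30-second limit is arbitrary. The helper name run_bounded is also made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: replays steps 3-6 of scenario a with every mysqlfabric call
# bounded by a timeout. Assumes the Fabric daemon is already running and
# GNU coreutils `timeout` is available.

# run_bounded SECONDS CMD [ARGS...]  (hypothetical helper)
# Runs CMD with a time limit and prints OK, FAIL (exit N) or HANG.
run_bounded() {
    local limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU timeout exits with status 124 when the time limit is reached.
        echo "HANG: $*"
    elif [ "$rc" -ne 0 ]; then
        echo "FAIL (exit $rc): $*"
    else
        echo "OK: $*"
    fi
}

# Hypothetical node addresses; replace with the group's real servers.
NODES="node1:3306 node2:3306"

repro_scenario_a() {
    local limit=30
    run_bounded "$limit" mysqlfabric group create test
    for node in $NODES; do
        run_bounded "$limit" mysqlfabric group add test "$node"
    done
    # Per the report, the first health check succeeds...
    run_bounded "$limit" mysqlfabric group health test
    # ...and the second one hangs.
    run_bounded "$limit" mysqlfabric group health test
}
```

To use it, source the file and call repro_scenario_a. On a setup affected by this bug, the last line would be expected to read "HANG: mysqlfabric group health test". Bounding each call with timeout keeps the script usable even when the Fabric daemon stops answering.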

Debug log / work log of the hang on the second health check

Attachment: fabric_hang_health.txt (text/plain), 80.53 KiB.

Debug log of the promote hang

Attachment: fabric_hang_promote.txt (text/plain), 17.72 KiB.

Hi Daniel!

Thanks for the bug report, but it seems to be a duplicate of BUG#74555.