MySQL Bugs: #72871: mysqlfabric changes slave connection to master and breaks replication.

Bug #72871	mysqlfabric changes slave connection to master and breaks replication.
Submitted:	4 Jun 2014 12:08	Modified:	16 Jun 2014 9:42
Reporter:	Shahriyar Rzayev	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Fabric	Severity:	S4 (Feature request)
Version:	1.4.3	OS:	Linux (CentOS 6.5)
Assigned to:		CPU Architecture:	Any

Description:
Dear experts,
I am testing mysqlfabric from 1.4.3 GA.
I have installed 3 MySQL 5.6.19 servers on CentOS 6.5:
192.168.1.77 -> master.
192.168.1.88 -> slave.
192.168.1.99 -> back log for fabric.

After setup master->slave topology manually then did all steps described on official documentation.
http://dev.mysql.com/doc/mysql-utilities/1.4/en/fabric.html

Here is fabric.conf file:

[DEFAULT]
prefix = /usr/local
sysconfdir = /etc
logdir = /var/log

[statistics]
prune_time = 3600

[logging]
url = file:///var/log/fabric.log
level = DEBUG

[storage]
auth_plugin = mysql_native_password
database = fabric
user = fabric
address = localhost:3306
connection_delay = 1
connection_timeout = 6
password = 12345
connection_attempts = 6

[failure_tracking]
notification_interval = 60
notification_clients = 50
detection_timeout = 1
detection_interval = 6
notifications = 300
detections = 3
failover_interval = 0
prune_time = 3600

[servers]
password = 12345
user = fabric

[connector]
ttl = 1

[client]
password = 

[protocol.xmlrpc]
disable_authentication = no
ssl_cert = 
realm = MySQL Fabric
ssl_key = 
ssl_ca = 
threads = 5
user = admin
address = localhost:32274
password = 12345

[executor]
executors = 5

[sharding]
mysqldump_program = /usr/bin/mysqldump
mysqlclient_program = /usr/bin/mysql

Here is commands that i used to add servers to group:

[root@fabricsrv ~]# mysqlfabric group create my_group
Procedure :
{ uuid        = f834719e-0638-4c60-a9b2-a460525f998a,
  finished    = True,
  success     = True,
  return      = True,
  activities  = 
}
[root@fabricsrv ~]# mysqlfabric group add my_group 192.168.1.77:3306
Procedure :
{ uuid        = 4abb9453-9da3-4e48-ae92-5c5b079f19a5,
  finished    = True,
  success     = True,
  return      = True,
  activities  = 
}
[root@fabricsrv ~]# mysqlfabric group add my_group 192.168.1.88:3306
Procedure :
{ uuid        = b09e6b4a-9e33-4f57-b38b-2dbed71c858f,
  finished    = True,
  success     = True,
  return      = True,
  activities  = 
}

[root@fabricsrv ~]# mysqlfabric group promote my_group
Procedure :
{ uuid        = f994d518-ae1c-4c73-bbca-c0e702e2ca59,
  finished    = True,
  success     = True,
  return      = True,
  activities  = 
}

[root@fabricsrv ~]# mysqlfabric group lookup_servers my_group
Command :
{ success     = True
  return      = [{'status': 'SECONDARY', 'server_uuid': '37246eb6-ebc8-11e3-845b-080027e3e8d2', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.1.88:3306'}, {'status': 'PRIMARY', 'server_uuid': '75c5356e-ebc7-11e3-8456-08002738050e', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': '192.168.1.77:3306'}]
  activities  = 
}

[root@fabricsrv ~]# mysqlfabric group health my_group
Command :
{ success     = True
  return      = {'37246eb6-ebc8-11e3-845b-080027e3e8d2': {'status': 'SECONDARY', 'is_alive': True, 'threads': {'io_running': False, 'io_error': "error connecting to master 'fabric@192.168.1.77:3306' - retry-time: 60  retries: 1"}}, '75c5356e-ebc7-11e3-8456-08002738050e': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities  = 
}

After this error: 'io_running': False, 'io_error': "error connecting to master 'fabric@192.168.1.77:3306'

Here is status from slave:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Connecting to master
                  Master_Host: 192.168.1.77
                  Master_User: fabric
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysql-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 151
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1045
                Last_IO_Error: error connecting to master 'fabric@192.168.1.77:3306' - retry-time: 60  retries: 15
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: 75c5356e-ebc7-11e3-8456-08002738050e
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 140604 16:46:44
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 75c5356e-ebc7-11e3-8456-08002738050e:1-5
                Auto_Position: 1
1 row in set (0,00 sec)

As you see it changes slave to master connection. ofcourse there is no such user to connect to master from slave.

How to repeat:
All steps provided in Description.

Suggested fix:
As i understand from documentation the 'fabric' user is using only  for connecting to master and slave servers from back log server and not for connection from slave to master.

Really unclear why it changes master connection in slave server.

Hi Shahriyar and thank you for the bug report.

It is unclear what the error is, except that the slave server cannot connect. Could you please upload the error log from the slave server so that it is possible to see what the error is?

Hi Mats, 
Sure here is related portion of slave error log:

Version: '5.6.19-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)
2014-06-04 14:36:43 2704 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_po
s= 4, master_bind=''. New state master_host='192.168.1.77', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2014-06-04 14:36:46 2704 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore 
not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manua
l for more information.
2014-06-04 14:36:46 2704 [Note] Slave I/O thread: connected to master 'repl@192.168.1.77:3306',replication started in log 'FIRST' at position 4
2014-06-04 14:36:46 2704 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log '/var/lib/mysql/data/
mysql-relay-bin.000001' position: 4
2014-06-04 16:32:43 2704 [Note] Error reading relay log event: slave SQL thread was killed
2014-06-04 16:32:44 2704 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2014-06-04 16:32:44 2704 [Note] Slave I/O thread killed while reading event
2014-06-04 16:32:44 2704 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000001', position 1092
2014-06-04 16:32:44 2704 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='192.168.1.77', master_port= 3306, master_log_file='mysq
l-bin.000001', master_log_pos= 1092, master_bind=''. New state master_host='192.168.1.77', master_port= 3306, master_log_file='', master_log_pos
= 4, master_bind=''.
2014-06-04 16:32:44 2704 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore 
not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manua
l for more information.
2014-06-04 16:32:44 2704 [ERROR] Slave I/O: error connecting to master 'fabric@192.168.1.77:3306' - retry-time: 60  retries: 1, Error_code: 1045
2014-06-04 16:32:44 2704 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log '/var/lib/mysql/data/
mysql-relay-bin.000001' position: 4
2014-06-04 16:33:44 2704 [ERROR] Slave I/O: error connecting to master 'fabric@192.168.1.77:3306' - retry-time: 60  retries: 2, Error_code: 1045
2014-06-04 16:34:44 2704 [ERROR] Slave I/O: error connecting to master 'fabric@192.168.1.77:3306' - retry-time: 60  retries: 3, Error_code: 1045
2014-06-04 16:35:44 2704 [ERROR] Slave I/O: error connecting to master 'fabric@192.168.1.77:3306' - retry-time: 60  retries: 4, Error_code: 1045
2014-06-04 16:36:44 2704 [ERROR] Slave I/O: error connecting to master 'fabric@192.168.1.77:3306' - retry-time: 60  retries: 5, Error_code: 1045
2014-06-04 16:37:44 2704 [ERROR] Slave I/O: error connecting to master 'fabric@192.168.1.77:3306' - retry-time: 60  retries: 6, Error_code: 1045
2014-06-04 16:38:44 2704 [ERROR] Slave I/O: error connecting to master 'fabric@192.168.1.77:3306' - retry-time: 60  retries: 7, Error_code: 1045
2014-06-04 16:39:44 2704 [ERROR] Slave I/O: error connecting to master 'fabric@192.168.1.77:3306' - retry-time: 60  retries: 8, Error_code: 1045

It seems that, slave i/o and sql thread was killed by fabric tool and it is executed change master statement against slave with 'fabric' user. But as documentation stated, this user(fabric) is only for, fabric installed server(back log server) to connect to both master and slave servers but not for using as replication user between master and slave.
Am i right?

Hi Shahriyar,

There are two sections in the configuration file that affect what user is being used: "servers" and "storage".

The "servers" section is used for information about the managed servers, while "storage" is used for the backing store. The default user in "servers" is "fabric", so it should exists on both masters and slaves that are being managed.

Documentation for this can be found in

  http://dev.mysql.com/doc/mysql-utilities/1.4/en/fabric-configuration-file.html

Do you have the user defined with the appropriate password according to your configuration file?

Here is my servers:

192.168.1.77 -> master.
192.168.1.88 -> slave.
192.168.1.99 -> back log for fabric.

I have setup replication between master and slave manually and after everything is worked i have created 'fabric'@'192.168.1.99' for connection from fabric back log server on master and it automatically created on slave side ofcourse:

On both servers:

mysql> show grants for 'fabric'@'192.168.1.99'
    -> ;
+---------------------------------------------------------------------------------------------------------------------------+
| Grants for fabric@192.168.1.99                                                                                            |
+---------------------------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'fabric'@'192.168.1.99' IDENTIFIED BY PASSWORD '*00A51F3F48415C7D4E8908980D443C29C69B60C9' |
+---------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,04 sec)

And i can manually make connection from back log fabric server to these slave and master servers:

[root@fabricsrv ~]# mysql -u fabric -p --host=192.168.1.77
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.6.19-log MySQL Community Server (GPL)

[root@fabricsrv ~]# mysql -u fabric -p --host=192.168.1.88
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.6.19-log MySQL Community Server (GPL)

This 'fabric' user must be only used by back log server for connecting to these servers not running a change master statement with 'fabric' user. It will ofcourse brake slave connection to master because there is no such user 'fabric' for connecting from slave to master. This 'fabric' user is dedicated   only to back log server.

Hi Shahriyar,

You get the error message because you have not defined the user that is required to manage the servers.

MySQL Fabric manages the group of servers by using replication and promoting servers as needed so you shall not set up replication between them, just add them to the group.

You need to add the user in the "servers" section to the servers being managed with the correct password since the Fabric node need to be able to communicate with them.

Thank you for the bug report, but as it is described in the report, this is not a bug.

Please do not hesitate to file a new bug if you discover something that does not work as it should.

I have already created 'fabric' user on both master and slave server and provided in:

[servers]
password = 12345
user = fabric

Also created 'fabric' user on back log server and provided as:

[storage]
auth_plugin = mysql_native_password
database = fabric
user = fabric
address = localhost:3306
connection_delay = 1
connection_timeout = 6
password = 12345
connection_attempts = 6

Also there is no such thing in documentation that 'fabric' user must be created as replication user:

From Documentation:

http://dev.mysql.com/doc/mysql-utilities/1.4/en/fabric-create-user.html

"
Fabric uses the same user account, who must have all privileges on all databases, to access all MySQL Servers that it will manage. The user and password are defined in the configuration file as we shall see. To create this user and grant all the necessary privileges, execute the following command on all MySQL servers:

GRANT ALL ON *.* TO 'fabric'@'localhost';
SET PASSWORD FOR 'fabric'@'localhost' = PASSWORD('secret');

In the preceding example, substitute a password of your choice (replace 'secret'). Also, if you are going to run Fabric on a host other than where the managed MySQL servers reside, substitute the 'localhost' for the Fabric's host name. "

As documentation states i must replace 'localhost' with ip or hostname of back log server (for me it is 192.168.1.99) and i have created this users:
On master:

mysql> select user, host from mysql.user where user like 'fab%';
+--------+--------------+
| user   | host         |
+--------+--------------+
| fabric | 192.168.1.99 |
+--------+--------------+
1 row in set (0,03 sec)

On slave:

mysql> select user, host from mysql.user where user like 'fab%';
+--------+--------------+
| user   | host         |
+--------+--------------+
| fabric | 192.168.1.99 |
+--------+--------------+
1 row in set (0,09 sec)

But now when i read your message i undestand that the fabric user must be created as not 'fabric'@'ip_of_back_log_server' but as 'fabric'@'%' so that when it will execute 'change master' it will use this 'fabric' user as replication user. Now am i right?

So please add to documentation and explain this situation that 'fabric' user will be using as slave to master connection purposes as well.

I fixed this slave error with creating another fabric user in master server with ip of slave server:

mysql> create user 'fabric'@'192.168.1.88' identified by '12345';
Query OK, 0 rows affected (0,11 sec)

mysql> grant all on *.* to 'fabric'@'192.168.1.88';
Query OK, 0 rows affected (0,05 sec)

[root@fabricsrv ~]# mysqlfabric manage ping
Command :
{ success     = True
  return      = True
  activities  = 
}

[root@fabricsrv ~]# mysqlfabric group health my_group
Command :
{ success     = True
  return      = {'37246eb6-ebc8-11e3-845b-080027e3e8d2': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, '75c5356e-ebc7-11e3-8456-08002738050e': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities  = 
}

[root@fabricsrv ~]# mysqlfabric group lookup_servers my_group
Command :
{ success     = True
  return      = [{'status': 'SECONDARY', 'server_uuid': '37246eb6-ebc8-11e3-845b-080027e3e8d2', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.1.88:3306'}, {'status': 'PRIMARY', 'server_uuid': '75c5356e-ebc7-11e3-8456-08002738050e', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': '192.168.1.77:3306'}]
  activities  = 
}

Hi Shahryiar,

The original bug was re-opened, but it is still not a bug. I agree that there is a documentation issue here, but please file this as a separate (documentation) bug. Piggybacking on existing bugs and changing the scope of the bug mostly lead to confusion.