MySQL Bugs: #74722: MYSQL fabric should handle common scenario for failover

Bug #74722	MYSQL fabric should handle common scenario for failover
Submitted:	7 Nov 2014 1:08	Modified:	20 Dec 2014 12:58
Reporter:	Mahesh Patil	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Fabric	Severity:	S1 (Critical)
Version:	1.5.2	OS:	Linux (Centos 6.3)
Assigned to:		CPU Architecture:	Any

Description:
I followed the steps below and found a common scenario that mysqlfabric should handle.

1. I created a Fabric group with two servers: one MASTER (primary, read-write) and one slave (secondary, read-only).

2. I then stopped the MASTER (primary), and the slave became the new MASTER.

3. Later I started the original MASTER again, and its status showed as FAULTY.

4. I changed its status from FAULTY to SPARE and then to SECONDARY.

5. Replication now runs in the reverse direction: the original slave is the new MASTER, and the original MASTER is the new slave.

I ran into the following problems in this scenario:

1. Replication does not work and the new slave throws an error. The replication coordinates have to be changed manually to get the new setup (new master to new slave) working. This should work without manual intervention: mysqlfabric should know the coordinates to read from (the new master's binary log file and position).

2. How can I get my original setup (original master and slave) back into working condition without losing any data?

How to repeat:
The same steps are described here: http://forums.mysql.com/read.php?144,623272,623272#msg-623272

Hi Mahesh,

Thanks for the bug report. There are a few pieces of information missing for us to reproduce the problem.

In step 2, how do you stop the master? Do you shut it down, or do you stop it some other way? Normally, this should be done by using "mysqlfabric group demote" to demote the current master before you start working on it. It still works if you shut it down, but since replication is asynchronous, there might be transactions on the master that have not yet been sent to the slave.
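
As a rough illustration of the planned path described above, the command sequence might look like the sketch below. It is not an official procedure: the group name `my_group` is a placeholder, the helper names are hypothetical, and the script only prints the commands unless `dry_run=False` is passed. It also assumes that "group promote" without a server argument lets Fabric pick the candidate itself; check `mysqlfabric help group promote` on your version.

```python
import shlex
import subprocess


def switchover_commands(group_id):
    """Return the mysqlfabric calls for a planned change of primary.

    group_id is the Fabric group name (placeholder value in the demo below).
    """
    return [
        # Demote the current primary so it can be stopped cleanly.
        ["mysqlfabric", "group", "demote", group_id],
        # Ask Fabric to promote a secondary (assumed: Fabric picks one).
        ["mysqlfabric", "group", "promote", group_id],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+ " + " ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(switchover_commands("my_group"))
```

With the default dry run, nothing is sent to Fabric, so the sequence can be reviewed before it is executed against a real group.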

I'm unsure at what step you see the error. Is it at step 2? Also, what is the error?

Normally, you should be able to get back to your original setup by incorporating the failed master in the group and then promoting it, but if you have errors in the server, it cannot (and should not) be promoted because of the errors.
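
The way back to the original setup can be sketched in the same spirit. This is a sketch under assumptions, not a verified procedure: the group name and server UUID are placeholders, the helper name is hypothetical, and the `--slave_id` option name is given as I recall it from the Fabric 1.5 command help, so confirm it with `mysqlfabric help group promote` before use. The function only builds the command list; it does not run anything.

```python
def rejoin_and_promote_commands(group_id, server_uuid):
    """Build the mysqlfabric calls to bring a failed master back as primary.

    group_id and server_uuid are placeholders supplied by the caller.
    """
    return [
        # A FAULTY server goes to SPARE first, then to SECONDARY.
        ["mysqlfabric", "server", "set_status", server_uuid, "spare"],
        ["mysqlfabric", "server", "set_status", server_uuid, "secondary"],
        # Inspect the group before promoting; a server with replication
        # errors should not be promoted.
        ["mysqlfabric", "group", "health", group_id],
        # Promote the original master (option name is an assumption).
        ["mysqlfabric", "group", "promote", group_id,
         "--slave_id=" + server_uuid],
    ]


if __name__ == "__main__":
    for cmd in rejoin_and_promote_commands("my_group", "<server-uuid>"):
        print(" ".join(cmd))
```

The health check sits before the promote step on purpose: if it reports replication errors on the rejoined server, those need to be resolved first.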

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".