Bug #96721 Add or remove instances to InnoDB cluster fails in mult-primary mode
Submitted: 2 Sep 2019 20:00 Modified: 22 Sep 2019 19:24
Reporter: Shubhra Prakash Nandi Email Updates:
Status: Verified Impact on me:
None 
Category:Shell AdminAPI InnoDB Cluster / ReplicaSet Severity:S2 (Serious)
Version:8.0.15 OS:CentOS (7)
Assigned to: CPU Architecture:x86 (x86_64)
Tags: addinstance, admin api, group_replication_enforce_update_everywhere_checks, InnoDB Cluster, multi-primary, mysql shell, removeinstance

[2 Sep 2019 20:00] Shubhra Prakash Nandi
Description:
Hello,

I found an issue with the table `instances` in InnoDB cluster metadata schema `mysql_innodb_cluster_metadata` which have the following foreign key constraints with CASCADE clause.

CONSTRAINT `instances_ibfk_1` FOREIGN KEY (`host_id`) REFERENCES `hosts` (`host_id`) ON DELETE RESTRICT,
  CONSTRAINT `instances_ibfk_2` FOREIGN KEY (`replicaset_id`) REFERENCES `replicasets` (`replicaset_id`) ON DELETE SET NULL

If a InnoDB cluster is created in multi-primary mode and global variable group_replication_enforce_update_everywhere_checks is set to ON (which is required in multi-primary configuration) any cluster.addInstance and cluster.removeInstance API call from Mysql shell will fail to add or remove instances respectively since these API calls causes the CASCADE clause to fire in `mysql_innodb_cluster_metadata`.`instances` table and group_replication_enforce_update_everywhere_checks prevents the CASCADE clause from executing.

So to add or remove instances in multi-primary mode we need to stop all instances, set group_replication_enforce_update_everywhere_checks  = OFF in my.cnf, start the instances, reboot the cluster from complete outage and then execute cluster.addInstance or cluster.removeInstance. Then again stop all instances, set group_replication_enforce_update_everywhere_checks  = ON in my.cnf, start the instances, reboot the cluster. So the CASCADE clause basically breaks Mysql shell API cluster.addInstance and cluster.removeInstance when the cluster is configured in multi-primary mode.

How to repeat:
1. Create InnoDB cluster with 3 instances in multi-primary mode.

2. Add group_replication_enforce_update_everywhere_checks = ON in my.cnf of all instances and shutdown all instances and then start all instances and reboot cluster from complete outage from one instance and rejoin other instances to cluster.

3. Prepare a new instance to be used as a cluster instance.

4. Execute Mysql shell and try to add the new instance to cluster using cluster.addInstance OR try to remove an existing instance from cluster using cluster.removeInstance. Both these commands fail.

Suggested fix:
Remove the CASCADE clause from `mysql_innodb_cluster_metadata`.`instances` foreign key constraints as mentioned below and replace their functionality using triggers.

CONSTRAINT `instances_ibfk_1` FOREIGN KEY (`host_id`) REFERENCES `hosts` (`host_id`) ON DELETE RESTRICT,
  CONSTRAINT `instances_ibfk_2` FOREIGN KEY (`replicaset_id`) REFERENCES `replicasets` (`replicaset_id`) ON DELETE SET NULL
[2 Sep 2019 20:28] Shubhra Prakash Nandi
I think I have mentioned the CASCADE clause by mistake, there is no CASCADE clause but ON DELETE SET NULL.

Also the constraint `instances_ibfk_1` FOREIGN KEY (`host_id`) REFERENCES `hosts`
(`host_id`) ON DELETE RESTRICT can be ignored from fix since RESTRICT is the default clause.
[6 Sep 2019 14:06] MySQL Verification Team
Hi,

Thanks for the report, verified as stated.
[13 Sep 2019 10:44] Shubhra Prakash Nandi
I am providing a fix but this does not still solve the issue with adding new instances when group_replication_enforce_update_everywhere_checks = ON since the plugin tries to create the table mysql_innodb_cluster_metadata.instances with old structure. The fix works for removing existing instances and dissolving the cluster.

USE mysql_innodb_cluster_metadata;

ALTER TABLE instances DROP FOREIGN KEY instances_ibfk_2;

ALTER TABLE instances ADD CONSTRAINT instances_ibfk_2 FOREIGN KEY (replicaset_id) REFERENCES replicasets (replicaset_id);

DELIMITER //

CREATE TRIGGER bd_replicasets BEFORE DELETE ON replicasets
FOR EACH ROW
BEGIN
	UPDATE instances inst SET inst.replicaset_id = NULL WHERE inst.replicaset_id = OLD.replicaset_id;
END
//

DELIMITER ;
[22 Sep 2019 19:24] Shubhra Prakash Nandi
Well I am providing a work around for this issue which works for both -  adding a new instance and removing and existing instance without taking the cluster down. This only works with Mysql 8 and not with any 5.7 version.

-- To add or remove an instance use the below UDF function to change the cluster mode to single-primary
SELECT group_replication_switch_to_single_primary_mode();

Now you can add or remove an instance without facing issues since this will automatically set group_replication_enforce_update_everywhere_checks to OFF.

-- To restore back the cluster mode to multi-primary use the below UDF function
SELECT group_replication_switch_to_multi_primary_mode();

This will automatically set group_replication_enforce_update_everywhere_checks to ON which is mandatory in a multi-primary cluster.

But still a fix is required for the original problem since it affects both 5.7 and 8.0 versions and I would consider it as a bug.