Description:
Issue: InnoDB Cluster primary writes are impacted by addInstance() and removeInstance().
How to repeat:
Test Case:
Started an InnoDB Cluster with one node in single-primary mode.
MySQL 127.0.0.1:30308 ssl JS > cluster.status()
{
"clusterName": "testCluster",
"defaultReplicaSet": {
"name": "default",
"primary": "127.0.0.1:30308",
"ssl": "REQUIRED",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures.",
"topology": {
"127.0.0.1:30308": {
"address": "127.0.0.1:30308",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.33"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "127.0.0.1:30308"
}
### Started sysbench write-only load on the primary
$ sysbench /usr/share/sysbench/oltp_write_only.lua --mysql_storage_engine=innodb --table-size=20000 --tables=10 --mysql-host=127.0.0.1 --mysql-db=test --mysql-port=30308 --mysql-user=msandbox --mysql-password=msandbox --threads=8 --time=0 --report-interval=1 --events=0 --db-driver=mysql --db-ps-mode=disable run
###### Adding a new node to the InnoDB Cluster:
MySQL 127.0.0.1:30308 ssl JS > cluster.addInstance('icadmin@127.0.0.1:30309')
NOTE: The target instance '127.0.0.1:30309' has not been pre-provisioned (GTID set is empty). The Shell is unable to decide whether incremental state recovery can correctly provision it.
The safest and most convenient way to provision a new instance is through automatic clone provisioning, which will completely overwrite the state of '127.0.0.1:30309' with a physical snapshot from an existing cluster member. To use this method by default, set the 'recoveryMethod' option to 'clone'.
The incremental state recovery may be safely used if you are sure all updates ever executed in the cluster were done with GTIDs enabled, there are no purged transactions and the new instance contains the same GTID set as the cluster or a subset of it. To use this method by default, set the 'recoveryMethod' option to 'incremental'.
Please select a recovery method [C]lone/[I]ncremental recovery/[A]bort (default Clone):
Validating instance configuration at 127.0.0.1:30309...
This instance reports its own address as 127.0.0.1:30309
Instance configuration is suitable.
NOTE: Group Replication will communicate with other members using '127.0.0.1:30309'. Use the localAddress option to override.
* Checking connectivity and SSL configuration...
A new instance will be added to the InnoDB Cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.
Adding instance to the cluster...
Monitoring recovery process of the new cluster member. Press ^C to stop monitoring and let it continue in background.
Clone based state recovery is now in progress.
NOTE: A server restart is expected to happen as part of the clone process. If the
server does not support the RESTART command or does not come back after a
while, you may need to manually start it back.
* Waiting for clone to finish...
NOTE: 127.0.0.1:30309 is being cloned from 127.0.0.1:30308
** Stage DROP DATA: Completed
** Clone Transfer
    FILE COPY  ######################################################  100%  Completed
    PAGE COPY  ######################################################  100%  Completed
    REDO COPY  ######################################################  100%  Completed
NOTE: 127.0.0.1:30309 is shutting down...
* Waiting for server restart... ready
* 127.0.0.1:30309 has restarted, waiting for clone to finish...
** Stage RESTART: Completed
* Clone process has finished: 240.93 MB transferred in about 1 second (~240.93 MB/s)
Incremental state recovery is now in progress.
* Waiting for distributed recovery to finish...
NOTE: '127.0.0.1:30309' is being recovered from '127.0.0.1:30308'
* Distributed recovery has finished
The instance '127.0.0.1:30309' was successfully added to the cluster.
MySQL 127.0.0.1:30308 ssl JS >
As shown below, the write load on the primary was impacted during addInstance():
[ 39s ] thds: 8 tps: 284.00 qps: 1703.98 (r/w/o: 0.00/1135.99/567.99) lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 8 tps: 255.00 qps: 1530.00 (r/w/o: 0.00/1020.00/510.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 41s ] thds: 8 tps: 272.00 qps: 1632.03 (r/w/o: 0.00/1088.02/544.01) lat (ms,95%): 43.39 err/s: 0.00 reconn/s: 0.00
[ 42s ] thds: 8 tps: 257.99 qps: 1547.97 (r/w/o: 0.00/1031.98/515.99) lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00
[ 43s ] thds: 8 tps: 261.93 qps: 1571.57 (r/w/o: 0.00/1047.72/523.86) lat (ms,95%): 46.63 err/s: 0.00 reconn/s: 0.00
[ 44s ] thds: 8 tps: 69.84 qps: 419.03 (r/w/o: 0.00/279.36/139.68) lat (ms,95%): 66.84 err/s: 0.00 reconn/s: 0.00 ####
[ 45s ] thds: 8 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 1708.63 err/s: 0.00 reconn/s: 0.00 #### Write throughput on the primary dropped to zero during addInstance()
[ 46s ] thds: 8 tps: 264.63 qps: 1587.77 (r/w/o: 0.00/1058.52/529.26) lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00
[ 47s ] thds: 8 tps: 239.02 qps: 1434.11 (r/w/o: 0.00/956.08/478.04) lat (ms,95%): 47.47 err/s: 0.00 reconn/s: 0.00
[ 48s ] thds: 8 tps: 252.98 qps: 1515.87 (r/w/o: 0.00/1009.92/505.96) lat (ms,95%): 43.39 err/s: 0.00 reconn/s: 0.00
[ 49s ] thds: 8 tps: 230.01 qps: 1382.08 (r/w/o: 0.00/922.05/460.03) lat (ms,95%): 52.89 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 8 tps: 227.00 qps: 1362.02 (r/w/o: 0.00/908.01/454.01) lat (ms,95%): 61.08 err/s: 0.00 reconn/s: 0.00
[ 51s ] thds: 8 tps: 249.99 qps: 1495.92 (r/w/o: 0.00/995.95/499.97) lat (ms,95%): 45.79 err/s: 0.00 reconn/s: 0.00
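The dip can be quantified by parsing the per-second sysbench report. A minimal Python sketch (the report lines are copied from the output above; the 50%-of-baseline threshold is an arbitrary choice for illustration):

```python
import re

# Per-second report lines around the addInstance() call (copied from above).
report = """\
[ 43s ] thds: 8 tps: 261.93 qps: 1571.57 (r/w/o: 0.00/1047.72/523.86) lat (ms,95%): 46.63 err/s: 0.00 reconn/s: 0.00
[ 44s ] thds: 8 tps: 69.84 qps: 419.03 (r/w/o: 0.00/279.36/139.68) lat (ms,95%): 66.84 err/s: 0.00 reconn/s: 0.00
[ 45s ] thds: 8 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 1708.63 err/s: 0.00 reconn/s: 0.00
[ 46s ] thds: 8 tps: 264.63 qps: 1587.77 (r/w/o: 0.00/1058.52/529.26) lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00
"""

# Extract (second, tps) pairs and flag seconds where throughput fell
# below half of the first (pre-dip) sample.
samples = [(int(s), float(t))
           for s, t in re.findall(r"\[ (\d+)s \] thds: \d+ tps: (\d+\.\d+)", report)]
baseline = samples[0][1]
stalled = [(s, t) for s, t in samples if t < 0.5 * baseline]
print(f"baseline {baseline:.0f} tps, stalled seconds: {stalled}")
# → baseline 262 tps, stalled seconds: [(44, 69.84), (45, 0.0)]
```

So the primary went from ~262 tps to a full one-second stall while the clone snapshot was taken.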
MySQL 127.0.0.1:30308 ssl JS > cluster.status()
{
"clusterName": "testCluster",
"defaultReplicaSet": {
"name": "default",
"primary": "127.0.0.1:30308",
"ssl": "REQUIRED",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures.",
"topology": {
"127.0.0.1:30308": {
"address": "127.0.0.1:30308",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.33"
},
"127.0.0.1:30309": {
"address": "127.0.0.1:30309",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.33"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "127.0.0.1:30308"
}
#### Added one more node:
MySQL 127.0.0.1:30308 ssl JS > cluster.addInstance('icadmin@127.0.0.1:30310')
NOTE: The target instance '127.0.0.1:30310' has not been pre-provisioned (GTID set is empty). The Shell is unable to decide whether incremental state recovery can correctly provision it.
The safest and most convenient way to provision a new instance is through automatic clone provisioning, which will completely overwrite the state of '127.0.0.1:30310' with a physical snapshot from an existing cluster member. To use this method by default, set the 'recoveryMethod' option to 'clone'.
The incremental state recovery may be safely used if you are sure all updates ever executed in the cluster were done with GTIDs enabled, there are no purged transactions and the new instance contains the same GTID set as the cluster or a subset of it. To use this method by default, set the 'recoveryMethod' option to 'incremental'.
Please select a recovery method [C]lone/[I]ncremental recovery/[A]bort (default Clone):
Validating instance configuration at 127.0.0.1:30310...
This instance reports its own address as 127.0.0.1:30310
Instance configuration is suitable.
NOTE: Group Replication will communicate with other members using '127.0.0.1:30310'. Use the localAddress option to override.
* Checking connectivity and SSL configuration...
A new instance will be added to the InnoDB Cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.
Adding instance to the cluster...
Monitoring recovery process of the new cluster member. Press ^C to stop monitoring and let it continue in background.
Clone based state recovery is now in progress.
NOTE: A server restart is expected to happen as part of the clone process. If the
server does not support the RESTART command or does not come back after a
while, you may need to manually start it back.
* Waiting for clone to finish...
NOTE: 127.0.0.1:30310 is being cloned from 127.0.0.1:30308
** Stage DROP DATA: Completed
** Clone Transfer
    FILE COPY  ######################################################  100%  Completed
    PAGE COPY  ######################################################  100%  Completed
    REDO COPY  ######################################################  100%  Completed
NOTE: 127.0.0.1:30310 is shutting down...
* Waiting for server restart... ready
* 127.0.0.1:30310 has restarted, waiting for clone to finish...
** Stage RESTART: Completed
* Clone process has finished: 217.01 MB transferred in about 1 second (~217.01 MB/s)
Incremental state recovery is now in progress.
* Waiting for distributed recovery to finish...
NOTE: '127.0.0.1:30310' is being recovered from '127.0.0.1:30308'
* Distributed recovery has finished
The instance '127.0.0.1:30310' was successfully added to the cluster.
MySQL 127.0.0.1:30308 ssl JS >
[ 23s ] thds: 8 tps: 247.05 qps: 1482.28 (r/w/o: 0.00/988.19/494.09) lat (ms,95%): 46.63 err/s: 0.00 reconn/s: 0.00
[ 24s ] thds: 8 tps: 258.98 qps: 1553.89 (r/w/o: 0.00/1035.92/517.96) lat (ms,95%): 43.39 err/s: 0.00 reconn/s: 0.00
[ 25s ] thds: 8 tps: 239.01 qps: 1434.08 (r/w/o: 0.00/956.05/478.03) lat (ms,95%): 48.34 err/s: 0.00 reconn/s: 0.00
[ 26s ] thds: 8 tps: 232.01 qps: 1392.04 (r/w/o: 0.00/928.03/464.01) lat (ms,95%): 49.21 err/s: 0.00 reconn/s: 0.00
[ 27s ] thds: 8 tps: 82.96 qps: 497.78 (r/w/o: 0.00/331.85/165.93) lat (ms,95%): 484.44 err/s: 0.00 reconn/s: 0.00
[ 28s ] thds: 8 tps: 24.01 qps: 144.05 (r/w/o: 0.00/96.04/48.02) lat (ms,95%): 458.96 err/s: 0.00 reconn/s: 0.00 #### Write throughput on the primary dropped during addInstance()
[ 29s ] thds: 8 tps: 191.00 qps: 1146.02 (r/w/o: 0.00/764.01/382.01) lat (ms,95%): 50.11 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 8 tps: 213.99 qps: 1283.93 (r/w/o: 0.00/855.96/427.98) lat (ms,95%): 53.85 err/s: 0.00 reconn/s: 0.00
[ 31s ] thds: 8 tps: 219.01 qps: 1314.04 (r/w/o: 0.00/876.03/438.01) lat (ms,95%): 53.85 err/s: 0.00 reconn/s: 0.00
[ 32s ] thds: 8 tps: 222.98 qps: 1337.91 (r/w/o: 0.00/891.94/445.97) lat (ms,95%): 49.21 err/s: 0.00 reconn/s: 0.00
[ 33s ] thds: 8 tps: 224.01 qps: 1344.08 (r/w/o: 0.00/896.05/448.03) lat (ms,95%): 50.11 err/s: 0.00 reconn/s: 0.00
[ 34s ] thds: 8 tps: 189.91 qps: 1139.44 (r/w/o: 0.00/759.62/379.81) lat (ms,95%): 74.46 err/s: 0.00 reconn/s: 0.00
[ 35s ] thds: 8 tps: 217.01 qps: 1302.08 (r/w/o: 0.00/868.05/434.03) lat (ms,95%): 51.94 err/s: 0.00 reconn/s: 0.00
MySQL 127.0.0.1:30308 ssl JS > cluster.status()
{
"clusterName": "testCluster",
"defaultReplicaSet": {
"name": "default",
"primary": "127.0.0.1:30308",
"ssl": "REQUIRED",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"127.0.0.1:30308": {
"address": "127.0.0.1:30308",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.33"
},
"127.0.0.1:30309": {
"address": "127.0.0.1:30309",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.33"
},
"127.0.0.1:30310": {
"address": "127.0.0.1:30310",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"replicationLag": "applier_queue_applied",
"role": "HA",
"status": "ONLINE",
"version": "8.0.33"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "127.0.0.1:30308"
}
###########
removeInstance() also impacts write performance on the primary:
MySQL 127.0.0.1:30308 ssl JS > cluster.removeInstance('icadmin@127.0.0.1:30310')
The instance will be removed from the InnoDB Cluster.
* Waiting for instance '127.0.0.1:30310' to synchronize with the primary...
** Transactions replicated ############################################### 100%
* Instance '127.0.0.1:30310' is attempting to leave the cluster...
The instance '127.0.0.1:30310' was successfully removed from the cluster.
[ 34s ] thds: 8 tps: 222.01 qps: 1332.05 (r/w/o: 0.00/888.04/444.02) lat (ms,95%): 53.85 err/s: 0.00 reconn/s: 0.00
[ 35s ] thds: 8 tps: 213.98 qps: 1283.87 (r/w/o: 0.00/855.91/427.96) lat (ms,95%): 53.85 err/s: 0.00 reconn/s: 0.00
[ 36s ] thds: 8 tps: 233.01 qps: 1398.05 (r/w/o: 0.00/932.03/466.02) lat (ms,95%): 47.47 err/s: 0.00 reconn/s: 0.00
[ 37s ] thds: 8 tps: 202.98 qps: 1217.88 (r/w/o: 0.00/811.92/405.96) lat (ms,95%): 54.83 err/s: 0.00 reconn/s: 0.00
[ 38s ] thds: 8 tps: 225.04 qps: 1347.24 (r/w/o: 0.00/897.16/450.08) lat (ms,95%): 51.02 err/s: 0.00 reconn/s: 0.00
[ 39s ] thds: 8 tps: 92.00 qps: 555.02 (r/w/o: 0.00/371.02/184.01) lat (ms,95%): 580.02 err/s: 0.00 reconn/s: 0.00 #### Write throughput on the primary dropped during removeInstance()
[ 40s ] thds: 8 tps: 219.00 qps: 1313.98 (r/w/o: 0.00/875.98/437.99) lat (ms,95%): 46.63 err/s: 0.00 reconn/s: 0.00
[ 41s ] thds: 8 tps: 216.98 qps: 1301.90 (r/w/o: 0.00/867.93/433.97) lat (ms,95%): 63.32 err/s: 0.00 reconn/s: 0.00
[ 42s ] thds: 8 tps: 214.03 qps: 1284.21 (r/w/o: 0.00/856.14/428.07) lat (ms,95%): 48.34 err/s: 0.00 reconn/s: 0.00
[ 43s ] thds: 8 tps: 221.95 qps: 1331.70 (r/w/o: 0.00/887.80/443.90) lat (ms,95%): 51.02 err/s: 0.00 reconn/s: 0.00
[ 44s ] thds: 8 tps: 222.01 qps: 1332.05 (r/w/o: 0.00/888.03/444.02) lat (ms,95%): 55.82 err/s: 0.00 reconn/s: 0.00
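The removeInstance() dip can be put in perspective with quick arithmetic (the values are taken from the 38s and 39s report lines above; this is illustrative math, not part of the test case):

```python
# tps and p95 latency immediately before and during the removeInstance() dip,
# taken from the 38s and 39s sysbench report lines above.
before_tps, dip_tps = 225.04, 92.00
before_lat_ms, dip_lat_ms = 51.02, 580.02

tps_drop_pct = 100 * (1 - dip_tps / before_tps)
lat_ratio = dip_lat_ms / before_lat_ms
print(f"tps drop: {tps_drop_pct:.0f}%, p95 latency: {lat_ratio:.1f}x")
# → tps drop: 59%, p95 latency: 11.4x
```

A roughly 59% throughput drop and an ~11x p95 latency spike for one second, even though the node being removed is a secondary.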
Suggested fix:
Adding or removing a secondary instance in an InnoDB Cluster should not impact write throughput on the primary.