Bug #103421 | DDL in group replication cluster also if online cause full stop of write operati | ||
---|---|---|---|
Submitted: | 22 Apr 2021 8:22 | Modified: | 15 Jul 2021 8:46 |
Reporter: | Marco Tusa | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Group Replication | Severity: | S2 (Serious) |
Version: | OS: | CentOS | |
Assigned to: | CPU Architecture: | Any |
[22 Apr 2021 8:22]
Marco Tusa
[7 May 2021 12:53]
MySQL Verification Team
Hi Marco, Thanks for the report and the reproduction process. I went through it all and I reproduced it enough to come to a conclusion that you are on to something. To me, at first look, this did not look like a bug. Second pass through it I'm not so sure anymore so I'm verifying this for GR team to check in-depth. all best Bogdan
[25 May 2021 17:23]
Marco Tusa
Hi Bogdan, any news from the replication team? I am quite surprised to find the described behaviour in the server, and to be honest I was expecting the replication team to be able to explain if this is on purpose (if so why) or not. As indicated, have customers to accept twice the time for an alter is something difficult to propose, especially when competitors do not have that. Also potentially suffer for absence of a primary for long time is, to be nice, "weird". Please let us know because this is impacting the solution credibility. thanks
[9 Jun 2021 16:42]
Nuno Carvalho
Hi Marco, Thank you for your bug report. You present the scenario: * 3 members on a Single-Primary Group Replication * create a table and populate with data enough that an ALTER TABLE take some time * Execute ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE; concurrently with INSERTs on the same table. When this scenario is run with default configuration, that is with `group_replication_consistency = EVENTUAL`, the observed behavior is: 1) the primary is capable of running the ONLINE DDL and DML concurrently. 2) the secondaries are not capable of running the ONLINE DDL and DML concurrently, but that does not have any impact on the primary, neither on primary elections. This behavior on the secondaries is common on all replication replicas, that is, on Asynchronous and Group Replication. https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-limitations.html ``` Long running online DDL operations can cause replication lag. An online DDL operation must finish running on the source before it is run on the replica. Also, DML that was processed concurrently on the source is only processed on the replica after the DDL operation on the replica is completed. ``` There is a feature request to extend the ONLINE DDLs to replicas BUG#73196: Allow ALTER TABLE to run concurrently on master and slave When we run the above scenario with `group_replication_consistency = AFTER` we are instructing the primary to wait until each transaction is applied on the secondaries before reply back to the client. Thence we observe the behavior: 1) on the primary, despite the DDL is ONLINE, once it reaches the commit stage, all transactions that will start after this point will wait for the DDL to complete, which is waiting for its apply on the secondaries. 2) the secondaries are not capable of running the ONLINE DDL and DML concurrently, which means that the primary will wait for this sequential apply on the secondaries. This means that a ONLINE DDL on a primary will be affected by the secondaries, more precisely: a) the ONLINE DDL will allow concurrent DMLs until it reaches the commit stage. b) once the ONLINE DDL reaches the commit stage, DMLs will wait until the ONLINE DDL is complete before proceeding. Since `group_replication_consistency = AFTER` does include the behavior of `BEFORE_ON_PRIMARY_FAILOVER`, if the primary changes, only after all the ongoing transactions are complete the primary will allow new transactions. https://dev.mysql.com/doc/refman/8.0/en/group-replication-configuring-consistency-guarante... `group_replication_consistency = AFTER` is doing the needed work to ensure that no session reads outdated data on the secondaries. It is possible to get read consistency from the group, if you use `group_replication_consistency = BEFORE`. This will not block the primary. The primary will externalize the effects of the DDL (or any transaction for that matter) as soon as it is replicated successfully (but not applied). However, if you go and read from the secondary, you will still observe latency, as the DDL is only replicated once the primary finished its execution. https://dev.mysql.com/doc/refman/8.0/en/group-replication-configuring-consistency-guarante... Summary ------- Long running online DDL operations can cause replication lag. An online DDL operation must finish running on the primary before it is run on the secondaries. Also, DML that was processed concurrently on the primary is only processed on the secondaries after the DDL operation on the secondaries is completed. https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-limitations.html and BUG#73196: Allow ALTER TABLE to run concurrently on master and slave `group_replication_consistency = AFTER` is doing the needed work to ensure that no session reads outdated data on the secondaries. `group_replication_consistency = EVENTUAL` does not prevent DML on primary during online DDL. Please let us know if everything is clarified. Best regards, Nuno Carvalho
[15 Jul 2021 8:46]
Nuno Carvalho
Duplicate of BUG#73196: Allow ALTER TABLE to run concurrently on master and slave