Bug #104096 set gtid_next may block query
Submitted: 24 Jun 2021 0:44 Modified: 2 Nov 2021 20:55
Reporter: phoenix Zhang (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:8.0.25, 8.0.26 OS:Any
Assigned to: CPU Architecture:Any

[24 Jun 2021 0:44] phoenix Zhang
Description:
In group_replication cluster, if group_name is A, the next gtid_no is xx, and set gtid_next='A:xx' in one session, but other session can still take over the gtid_no.

If consistcy is BEFORE, after gtid_no=xx written, new query will blocked

How to repeat:
1. first use mtr  to start 2 servers
./mysql-test/mtr gr_abort_query_on_plugin_error --start

2. connect 13000, and start mgr
mysql> set GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.00 sec)

mysql> set persist group_replication_group_name='aabbccdd-aabb-aabb-aabb-aabbccddeeff',persist group_replication_group_seeds='127.0.0.1:33061,127.0.0.1:33062',persist group_replication_local_address='127.0.0.1:33061';set global group_replication_consistency='eventual';CHANGE MASTER TO MASTER_USER="root" FOR CHANNEL "group_replication_recovery";start group_replication;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected, 3 warnings (0.02 sec)

Query OK, 0 rows affected (2.24 sec)

mysql> set GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)

3. connect 13002, and start mgr
mysql> set persist group_replication_group_name='aabbccdd-aabb-aabb-aabb-aabbccddeeff',persist group_replication_group_seeds='127.0.0.1:33061,127.0.0.1:33062',persist group_replication_local_address='127.0.0.1:33062';set global group_replication_consistency='eventual';CHANGE MASTER TO MASTER_USER="root" FOR CHANNEL "group_replication_recovery";start group_replication;
Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected, 3 warnings (0.03 sec)

Query OK, 0 rows affected (2.44 sec)

mysql> select * from performance_schema.replication_group_members;                                                                                                                                          +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 259fd4ab-d483-11eb-a7a6-c8f7507e5048 | 127.0.0.1   |       13000 | ONLINE       | PRIMARY     | 8.0.25         |
| group_replication_applier | 25a182f4-d483-11eb-a7be-c8f7507e5048 | 127.0.0.1   |       13002 | ONLINE       | PRIMARY     | 8.0.25         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.03 sec)

4. connect 13000, init and set gtid_next
mysql> create table t1 (c1 int primary key, c2 int) engine=innodb;                                                                                                                                          Query OK, 0 rows affected (0.05 sec)

mysql> insert into t1 values (1,1);                                                                                                                                                                         Query OK, 1 row affected (0.02 sec)

mysql> show master status;                                                                                                                                                                                  +--------------------------+----------+--------------+------------------+------------------------------------------+
| File                     | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+--------------------------+----------+--------------+------------------+------------------------------------------+
| server-binary-log.000001 |     1375 |              |                  | aabbccdd-aabb-aabb-aabb-aabbccddeeff:1-4 |
+--------------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

mysql> set session gtid_next='aabbccdd-aabb-aabb-aabb-aabbccddeeff:5';
Query OK, 0 rows affected (0.00 sec)

5. use another terminal, connect 13000, do insert
mysql> show master status;                                                                                                                                                                                  +--------------------------+----------+--------------+------------------+------------------------------------------+
| File                     | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+--------------------------+----------+--------------+------------------+------------------------------------------+
| server-binary-log.000001 |     1375 |              |                  | aabbccdd-aabb-aabb-aabb-aabbccddeeff:1-4 |
+--------------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

mysql> insert into t1 values (2,2);
Query OK, 1 row affected (0.02 sec)

mysql> show master status;
+--------------------------+----------+--------------+------------------+------------------------------------------+
| File                     | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+--------------------------+----------+--------------+------------------+------------------------------------------+
| server-binary-log.000001 |     1665 |              |                  | aabbccdd-aabb-aabb-aabb-aabbccddeeff:1-5 |
+--------------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

6. and then do query, it will not return
mysql> set session group_replication_consistency='BEFORE';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1;

7. open other connect to 13000
mysql> show processlist;
+----+-----------------+-----------------+------+---------+------+--------------------------------------------------------+----------------------------------+---------+-----------+---------------+
| Id | User            | Host            | db   | Command | Time | State                                                  | Info                             | Time_ms | Rows_sent | Rows_examined |
+----+-----------------+-----------------+------+---------+------+--------------------------------------------------------+----------------------------------+---------+-----------+---------------+
|  5 | event_scheduler | localhost       | NULL | Daemon  |  706 | Waiting on empty queue                                 | NULL                             |  706067 |         0 |             0 |
| 21 | system user     |                 | NULL | Connect |  573 | waiting for handler commit                             | Group replication applier module |  572692 |         0 |             0 |
| 24 | system user     |                 | NULL | Query   |  573 | Slave has read all relay log; waiting for more updates | NULL                             |  547320 |         0 |             0 |
| 32 | root            | localhost:44478 | test | Sleep   |  512 |                                                        | NULL                             |  512441 |         0 |             0 |
| 33 | root            | localhost:44488 | test | Query   |  481 | Executing hook on transaction begin.                   | select * from t1                 |  480745 |         0 |             0 |
| 34 | root            | localhost:44502 | test | Query   |    0 | init                                                   | show processlist                 |       0 |         0 |             0 |
+----+-----------------+-----------------+------+---------+------+--------------------------------------------------------+----------------------------------+---------+-----------+---------------+
6 rows in set (0.01 sec)
[28 Jun 2021 7:04] MySQL Verification Team
Hello phoenix Zhang!

Thank you for the report and feedback.

regards,
Umesh
[15 Sep 2021 9:23] MySQL Verification Team
Hello phoenix,

I tried to repro on 8.0.26 build(current GA) but observed that issue is no longer seen in 8.0.26. Could you please check and confirm? Thank you.
I'll be joining the activity log shortly for your reference.

regards,
Umesh
[15 Sep 2021 9:24] MySQL Verification Team
MySQL 8.0.26 test results

Attachment: 104096_8.0.26.results (application/octet-stream, text), 10.63 KiB.

[19 Sep 2021 6:51] phoenix Zhang
it repeat in mysql 8.0.26 when i test

Attachment: test.result (application/octet-stream, text), 7.79 KiB.

[20 Sep 2021 4:29] MySQL Verification Team
Thank you for the details.

regards,
Umesh
[22 Sep 2021 12:23] MySQL Verification Team
MySQL 8.0.26 test results

Attachment: 104096_8.0.26_22SEP2021.results (application/octet-stream, text), 31.92 KiB.

[2 Nov 2021 20:55] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL 8.0.28:

In Group Replication, if a SET gtid_next statement is used on a group member to set the GTID for the next transaction, it is possible for the same GTID to be used for a transaction that starts concurrently on another member. If both transactions reach the commit stage, the second one in the total order is rolled back, resolving the situation. However, when the transaction consistency level for Group Replication (the group_replication_consistency system variable) was set to BEFORE or BEFORE_AND_AFTER, members could reach a deadlock with one holding ownership of a GTID in the gtid_owned set, and another waiting for ownership to be released before committing the transaction. The wait function now only considers the GTIDs for committed transactions and not the GTIDs that are owned but not committed, except where a session owns a GTID that is concurrently committed, in which case the executing session errors out.