Bug #106096 "reset master" on the master may cause a GTID gap on the slave/DR
Submitted: 7 Jan 2022 9:08 Modified: 7 Feb 2022 11:04
Reporter: YINZHOU WU
Status: No Feedback
Category:MySQL Cluster: Replication Severity:S3 (Non-critical)
Version:5.7.23 OS:CentOS (CentOS Linux release 7.6.1810)
CPU Architecture: Any

[7 Jan 2022 9:08] YINZHOU WU
Description:
We found that a newly created cluster may, with low probability, end up with a GTID gap on the slave/DR,
e.g. 9cd48d9c-6e29-11ec-92d8-98039ba567ea:1-402:404-5103, which is missing GTID 403.
The problem can be reproduced with the steps below.
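Gaps like this are easy to miss in long GTID sets. A small helper along these lines (hypothetical, not part of MySQL or any client library) parses the textual format uuid:start-end:start-end,... and lists, per UUID, the GNO ranges missing between 1 and the highest GNO present. It assumes plain UUID:interval sets as printed by 5.7; on the server side, the built-in GTID_SUBTRACT() function can compare two sets instead.

```python
from __future__ import annotations


def parse_gtid_set(gtid_set: str) -> dict[str, list[tuple[int, int]]]:
    """Parse text like 'uuid:1-402:404-5103' into {uuid: [(1, 402), (404, 5103)]}."""
    result: dict[str, list[tuple[int, int]]] = {}
    # MySQL may print multi-UUID sets as 'uuid1:...,\nuuid2:...'
    for member in gtid_set.replace("\n", "").split(","):
        member = member.strip()
        if not member:
            continue
        uuid, *ranges = member.split(":")
        intervals = result.setdefault(uuid.strip().lower(), [])
        for r in ranges:
            start, _, end = r.partition("-")
            intervals.append((int(start), int(end or start)))  # '135' means 135-135
        intervals.sort()
    return result


def missing_gnos(gtid_set: str) -> dict[str, list[tuple[int, int]]]:
    """Return, per UUID, the GNO ranges absent between 1 and the highest GNO present."""
    gaps: dict[str, list[tuple[int, int]]] = {}
    for uuid, intervals in parse_gtid_set(gtid_set).items():
        expected = 1  # next GNO we expect to see if the set is contiguous
        holes = []
        for start, end in intervals:
            if start > expected:
                holes.append((expected, start - 1))
            expected = max(expected, end + 1)
        if holes:
            gaps[uuid] = holes
    return gaps


if __name__ == "__main__":
    print(missing_gnos("9cd48d9c-6e29-11ec-92d8-98039ba567ea:1-402:404-5103"))
    # {'9cd48d9c-6e29-11ec-92d8-98039ba567ea': [(403, 403)]}
```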

How to repeat:
Step 1.
Deploy a Python script with the following content:

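import pymysql  # assumed driver; the connection parameters below are placeholders
conn = pymysql.connect(host="master_host", user="user", password="password", database="test", autocommit=True)  # autocommit: one GTID per REPLACE
cursor = conn.cursor()
heartbeat_sql = "replace into repl_heartbeat values (@@hostname, now(3));"
while True:
    cursor.execute(heartbeat_sql)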

Then start the script. (Tip: don't add anything like "time.sleep(1)" inside the loop; the statement must execute as fast as possible.)

Step 2.
Log in to the master instance of the cluster and execute "reset master", then stop the script from step 1 and run "show master status".
If the problem reproduces, the master's GTID set will look like "744c6c9a-766d-11eb-9807-fa163e6a649a:1-31:135",
and GTID 135 cannot be found in the MySQL binlog; it seems we have created a ghost GTID.
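Steps 1 and 2 interleave two sessions, which is awkward to do by hand. The sketch below drives both from Python. It assumes a DB-API connection factory `connect` (hypothetical; it should return connections with autocommit enabled), a hypothetical `warmup` delay before RESET MASTER, and the repl_heartbeat table from step 1. The race is timing-dependent, so a single run may not produce the ghost GTID.

```python
import threading
import time
from typing import Any, Callable

HEARTBEAT_SQL = "replace into repl_heartbeat values (@@hostname, now(3))"


def heartbeat_loop(conn: Any, stop: threading.Event) -> None:
    """Step 1: run the heartbeat as fast as possible (no sleep) until stopped."""
    cursor = conn.cursor()
    while not stop.is_set():
        cursor.execute(HEARTBEAT_SQL)


def reproduce(connect: Callable[[], Any], warmup: float = 1.0) -> str:
    """Step 2: RESET MASTER from a second session while the heartbeat runs,
    then stop the heartbeat and return Executed_Gtid_Set from SHOW MASTER STATUS."""
    stop = threading.Event()
    hb_conn, admin_conn = connect(), connect()  # one connection per session/thread
    worker = threading.Thread(target=heartbeat_loop, args=(hb_conn, stop))
    worker.start()
    try:
        time.sleep(warmup)  # let the heartbeat commit some transactions first
        admin = admin_conn.cursor()
        admin.execute("RESET MASTER")
    finally:
        stop.set()  # stop the heartbeat before reading the master status
        worker.join()
    admin.execute("SHOW MASTER STATUS")
    columns = [d[0] for d in admin.description]
    row = admin.fetchone()
    return row[columns.index("Executed_Gtid_Set")]
```

With PyMySQL, for example, the factory could be `lambda: pymysql.connect(host="master_host", user="user", password="password", database="test", autocommit=True)`, where all connection arguments are placeholders. A result that is not a contiguous range starting at 1 indicates the ghost GTID described above.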

Step 3.
Log in to the slave instance to finish creating the cluster: stop slave; reset master; change master to xxx; start slave; show slave status;

On the slave we then find Executed_Gtid_Set: 744c6c9a-766d-11eb-9807-fa163e6a649a:1-31, without GTID 135, because the slave cannot find GTID 135 in the master's binlog.

Step 4.
Write anything that makes the master's GTID set grow until it reaches "744c6c9a-766d-11eb-9807-fa163e6a649a:1-140"; at this point the gap on the master disappears.
But on the slave/DR, "show slave status" shows Executed_Gtid_Set: 744c6c9a-766d-11eb-9807-fa163e6a649a:1-134:136-140, so a GTID gap appears there.
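Why the gap moves from the master to the slave can be replayed with a small simulation. It assumes, as the master's contiguous 1-140 in step 4 suggests, that the master assigns each new transaction the smallest GNO not yet in its executed set; the function names are illustrative, not MySQL APIs.

```python
from __future__ import annotations


def next_automatic_gno(executed: set[int]) -> int:
    """Smallest positive GNO not yet in the executed set (assumed allocation rule)."""
    gno = 1
    while gno in executed:
        gno += 1
    return gno


def to_ranges(gnos: set[int]) -> str:
    """Format GNOs as MySQL-style intervals, e.g. {1, 2, 3, 5} -> '1-3:5'."""
    parts, ordered = [], sorted(gnos)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ":".join(parts)


# State after steps 2-3: the master's set contains the ghost GNO 135,
# but the slave only has what was actually in the master's binlog.
master = set(range(1, 32)) | {135}
slave = set(range(1, 32))

# Step 4: each new write on the master is shipped to the slave via the binlog.
while to_ranges(master) != "1-140":
    gno = next_automatic_gno(master)  # 32..134, then 136..140 (135 is "taken")
    master.add(gno)
    slave.add(gno)

print("master:", to_ranges(master))  # master: 1-140
print("slave:", to_ranges(slave))    # slave: 1-134:136-140
```

GNO 135 is never generated again on the master because its executed set already contains it, so no event for 135 ever reaches the slave.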

Question: when the heartbeat write and "reset master" are executed at the same time, the two seem to collide and a ghost GTID is created, which later causes a GTID gap on the slave/DR. We think this is a bug in the "reset master" command.
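One interleaving that would be consistent with the observed 1-31:135 can be written as a toy model. This is an assumption for illustration, not code or behavior taken from the MySQL source: it supposes that writing a transaction's event to the binlog and adding its GNO to the executed set are two separate steps, each protected by the same lock, and that RESET MASTER lands between them.

```python
from __future__ import annotations

import threading


class ToyGtidState:
    """Toy model (not MySQL code): one lock guards every individual access."""

    def __init__(self) -> None:
        self.lock = threading.Lock()  # stands in for a global lock such as global_sid_lock
        self.executed: set[int] = set()
        self.binlog: list[int] = []  # GNOs whose events are in the current binlog

    def write_event(self, gno: int) -> None:
        with self.lock:
            self.binlog.append(gno)

    def mark_executed(self, gno: int) -> None:
        with self.lock:
            self.executed.add(gno)

    def reset_master(self) -> None:
        with self.lock:
            self.executed.clear()
            self.binlog.clear()


state = ToyGtidState()
for gno in range(1, 135):  # 134 heartbeats commit normally
    state.write_event(gno)
    state.mark_executed(gno)

state.write_event(135)     # heartbeat 135: event written to the old binlog ...
state.reset_master()       # ... RESET MASTER runs in between ...
state.mark_executed(135)   # ... then 135 is added to the emptied executed set

for gno in range(1, 32):   # 31 heartbeats after the reset get GNOs 1-31
    state.write_event(gno)
    state.mark_executed(gno)

print(sorted(state.executed - set(state.binlog)))  # [135]
```

Every step holds the lock, yet the pair write_event/mark_executed is not atomic with respect to reset_master, which is the kind of gap a per-access lock alone does not close.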

Suggested fix:
We plan to run "flush tables with read lock" before "reset master", so that no insert/replace (such as the heartbeat) can execute at the same time (not yet tested; it may work).
However, the description of "reset master" we found says that it already holds a global lock:
This variable can be read and modified in four places:
    - During server startup, holding global_sid_lock.wrlock;
    - By a client thread holding global_sid_lock.wrlock (doing a RESET MASTER);
    - By a client thread calling MYSQL_BIN_LOG::write_gtid function (often the
      group commit FLUSH stage leader).

so we are confused about how this can happen.
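The proposed workaround, which the report notes is untested, could look like the sketch below. It assumes a DB-API connection `conn` (hypothetical). FLUSH TABLES WITH READ LOCK is session-scoped, so UNLOCK TABLES must be issued on the same connection; whether RESET MASTER is accepted while that session holds the global read lock should be verified on the target version, and if it is rejected, the `finally` clause still releases the lock.

```python
from typing import Any


def reset_master_with_read_lock(conn: Any) -> None:
    """Hypothetical workaround from the report: block other sessions' writes
    (e.g. the heartbeat) with a global read lock while RESET MASTER runs."""
    cursor = conn.cursor()
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        cursor.execute("RESET MASTER")
    finally:
        # Same session as the FTWRL above; released even if RESET MASTER fails.
        cursor.execute("UNLOCK TABLES")
```

Afterwards, "show master status" should report an empty Executed_Gtid_Set if no ghost GTID was created.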
[7 Jan 2022 11:04] MySQL Verification Team
Thank you for the bug report. When reporting bugs, please check against the current release: you are reporting against a fairly old version (5.7.23), while the current version is 5.7.36, and older versions are treated as unsupported. Please also provide the complete scripts used in the how-to-repeat instructions, not just partial ones. Thanks in advance.
[8 Feb 2022 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".