Description:
In general, a backup can be taken on either the source or a replica. Backup tools should therefore refrain from executing statements that write to the binlog, to avoid generating additional GTIDs on the replica.
However, I've noticed that mysqldump, with certain parameters, executes "FLUSH TABLES" instead of "FLUSH LOCAL TABLES", which creates a new GTID. If the backup is taken on a replica, this leads to a discrepancy in GTIDs between the source and the replica. Only when the --master-data option is set does mysqldump execute "FLUSH LOCAL TABLES". Relevant code:
static int do_flush_tables_read_lock(MYSQL *mysql_con) {
  /*
    We do first a FLUSH TABLES. If a long update is running, the FLUSH TABLES
    will wait but will not stall the whole mysqld, and when the long update is
    done the FLUSH TABLES WITH READ LOCK will start and succeed quickly. So,
    FLUSH TABLES is to lower the probability of a stage where both mysqldump
    and most client connections are stalled. Of course, if a second long
    update starts between the two FLUSHes, we have that bad stall.
  */
  return (mysql_query_with_error_report(
              mysql_con, nullptr,
              ((opt_master_data != 0) ? "FLUSH /*!40101 LOCAL */ TABLES"
                                      : "FLUSH TABLES")) ||
          mysql_query_with_error_report(mysql_con, nullptr,
                                        "FLUSH TABLES WITH READ LOCK"));
}
......
  if ((opt_lock_all_tables || opt_master_data ||
       (opt_single_transaction &&
        (flush_logs || server_with_gtids_and_opt_purge_not_off))) &&
      do_flush_tables_read_lock(mysql))
    goto err;
I don't see why these two cases should behave differently; always executing "FLUSH LOCAL TABLES" seems like the better choice.
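For reference, in MySQL the LOCAL keyword in FLUSH is a synonym for NO_WRITE_TO_BINLOG: the flush is performed locally but the statement is not written to the binary log, so no GTID is generated. A quick sketch of the difference (assumes a mysql client with credentials configured; hostnames are placeholders):

```shell
# Not written to the binlog; no GTID is generated on this server.
mysql --host=replica -e "FLUSH LOCAL TABLES"

# Equivalent spelling of the same statement.
mysql --host=replica -e "FLUSH NO_WRITE_TO_BINLOG TABLES"

# Written to the binlog; generates a new GTID on this server,
# which is what the report observes during the dump.
mysql --host=replica -e "FLUSH TABLES"
```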
How to repeat:
1. Take a backup on the replica, e.g.: mysqldump --single-transaction --set-gtid-purged=on --all-databases > backup.sql
2. Run "SHOW MASTER STATUS" on both servers and compare the GTID sets of the source and the replica.
3. Use mysqlbinlog to inspect the event behind the extra GTID; it shows that a "FLUSH TABLES" statement was executed.
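The steps above can be sketched as follows. This assumes a running source/replica pair; the hostnames and the replica's binlog file name (binlog.000001) are placeholders for your environment:

```shell
# 1. Take the backup on the replica.
mysqldump --host=replica --single-transaction --set-gtid-purged=on \
          --all-databases > backup.sql

# 2. Compare the executed GTID sets; the replica should show one
#    extra GTID that the source does not have.
mysql --host=source  -e "SHOW MASTER STATUS\G"
mysql --host=replica -e "SHOW MASTER STATUS\G"

# 3. Inspect the replica's binlog around the extra GTID; the
#    offending event is the FLUSH TABLES issued by mysqldump.
mysqlbinlog --host=replica --read-from-remote-server binlog.000001 \
  | grep -B 5 "FLUSH TABLES"
```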