MySQL Bugs: #115022: Using mysqldump leads to inconsistent GTIDs between source and replica.

Bug #115022	Using mysqldump leads to inconsistent GTIDs between source and replica.
Submitted:	16 May 2024 2:50	Modified:	16 May 2024 9:53
Reporter:	Jinghua Lin	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: mysqldump Command-line Client	Severity:	S4 (Feature request)
Version:	8.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
In general, during backups, it is possible to execute them on either the source or the replica. Therefore, I believe backup tools should refrain from executing statements that write to the binlog to avoid generating additional GTIDs on the replica.

However, I've noticed that mysqldump, when using certain parameters, executes "flush tables" instead of "flush local tables", resulting in the creation of a new GTID. If this backup is performed on the replica, it could lead to a discrepancy in GTIDs between the source and the replica. Only when using the --master-data parameter "flush local tables" will be executed. Relevant code:

static int do_flush_tables_read_lock(MYSQL *mysql_con) {
  /*
    We do first a FLUSH TABLES. If a long update is running, the FLUSH TABLES
    will wait but will not stall the whole mysqld, and when the long update is
    done the FLUSH TABLES WITH READ LOCK will start and succeed quickly. So,
    FLUSH TABLES is to lower the probability of a stage where both mysqldump
    and most client connections are stalled. Of course, if a second long
    update starts between the two FLUSHes, we have that bad stall.
  */
  return (mysql_query_with_error_report(
              mysql_con, nullptr,
              ((opt_master_data != 0) ? "FLUSH /*!40101 LOCAL */ TABLES"
                                      : "FLUSH TABLES")) ||
          mysql_query_with_error_report(mysql_con, nullptr,
                                        "FLUSH TABLES WITH READ LOCK"));
}

......

  if ((opt_lock_all_tables || opt_master_data ||
       (opt_single_transaction &&
        (flush_logs || server_with_gtids_and_opt_purge_not_off))) &&
      do_flush_tables_read_lock(mysql))
    goto err;

I don't quite understand why there would be two different scenarios; it seems like using "flush local tables" directly is the best choice.

How to repeat:
1. Performing backups on the replica, like: mysqldump --single-transaction --set-gtid-purged=on --all-databases > backup.sql

2. Executing "show master status" and comparing the GTID differences between the source and replica.

3. Using mysqlbinlog to parse the additional GTID reveals that a "flush tables" statement was executed.

Hi Mr. Lin,

Thank you for your report.

We consider your report a fine feature request for the environments with replication.

verified as a feature request for the versions 8.0 and higher.