| Bug #115022 | Using mysqldump leads to inconsistent GTIDs between source and replica. | ||
|---|---|---|---|
| Submitted: | 16 May 2024 2:50 | Modified: | 7 May 6:27 |
| Reporter: | Jinghua Lin | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: mysqldump Command-line Client | Severity: | S4 (Feature request) |
| Version: | 8.0 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[16 May 2024 9:53]
MySQL Verification Team
Hi Mr. Lin, Thank you for your report. We consider your report a fine feature request for the environments with replication. verified as a feature request for the versions 8.0 and higher.
[7 May 6:27]
Jinghua Lin
This issue was fixed in 8.4.1 Executing mysqldump on a replica would insert the FLUSH TABLES operation, an operation that writes to the binary log. Now FLUSH LOCAL TABLES is inserted instead to prevent GTID related issues during replication due to these binary log changes. The workaround was to set the --source-data option to 1 or 2. (Bug #35665076) References: This issue is a regression of: Bug #33630199.

Description: In general, during backups, it is possible to execute them on either the source or the replica. Therefore, I believe backup tools should refrain from executing statements that write to the binlog to avoid generating additional GTIDs on the replica. However, I've noticed that mysqldump, when using certain parameters, executes "flush tables" instead of "flush local tables", resulting in the creation of a new GTID. If this backup is performed on the replica, it could lead to a discrepancy in GTIDs between the source and the replica. Only when using the --master-data parameter "flush local tables" will be executed. Relevant code: static int do_flush_tables_read_lock(MYSQL *mysql_con) { /* We do first a FLUSH TABLES. If a long update is running, the FLUSH TABLES will wait but will not stall the whole mysqld, and when the long update is done the FLUSH TABLES WITH READ LOCK will start and succeed quickly. So, FLUSH TABLES is to lower the probability of a stage where both mysqldump and most client connections are stalled. Of course, if a second long update starts between the two FLUSHes, we have that bad stall. */ return (mysql_query_with_error_report( mysql_con, nullptr, ((opt_master_data != 0) ? "FLUSH /*!40101 LOCAL */ TABLES" : "FLUSH TABLES")) || mysql_query_with_error_report(mysql_con, nullptr, "FLUSH TABLES WITH READ LOCK")); } ...... if ((opt_lock_all_tables || opt_master_data || (opt_single_transaction && (flush_logs || server_with_gtids_and_opt_purge_not_off))) && do_flush_tables_read_lock(mysql)) goto err; I don't quite understand why there would be two different scenarios; it seems like using "flush local tables" directly is the best choice. How to repeat: 1. Performing backups on the replica, like: mysqldump --single-transaction --set-gtid-purged=on --all-databases > backup.sql 2. Executing "show master status" and comparing the GTID differences between the source and replica. 3. Using mysqlbinlog to parse the additional GTID reveals that a "flush tables" statement was executed.