| Bug #88891 | Filtered replication leaves GTID holes with create database if not exists | ||
|---|---|---|---|
| Submitted: | 13 Dec 2017 6:50 | Modified: | 26 Apr 2018 14:38 |
| Reporter: | Simon Mudd (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
| Version: | 5.7.19/20 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | filtering, GTID, holes, replication | ||
[22 Dec 2017 10:45]
MySQL Verification Team
Hi Simon, I can verify that this is how it behaves, I can't say for sure if it's a bug or not. It behaves the way I expect it to behave but maybe my mind is corrupted that way. Also, in order to preserve gtid's if you filter some out there has to be a hole don't you agree? Filtering server could rearrange them but then the gtids would change between boxes and I'm sure there are cases where that would be a problem. Anyhow I'm setting this to "verified" and will let GR engineers decide :) thanks for the report Bogdan
[24 Dec 2017 8:48]
Simon Mudd
I must admit that I would first have expected, if using filtered replication, to get a large number of gaps representing the transactions which were NOT processed (the filtered ones). The problem with this is that for many filter type operations that will leave a GTID_EXECUTED set with a huge size, a large number of holes and I'd expect that if I ran such a setup for months or years MySQL would eventually break if it couldn't actually store the huge GTID_EXECUTED set in memory. Checking for each transaction prior to executing it would likely be very slow too. This behaviour is not explicitly documented. (or not as clearly as I'd expect). So I'm assuming that to avoid these troubles the behaviour has been modified to replace the transaction upstream of the filter with an empty transaction downstream of the filter. That is the GTID_EXECUTED remains the _same_ even though the actual transaction has not been applied. What's been applied is a _filtered_ (empty) transaction. This behaviour matches what is currently seen. * the rewrite rules I show earlier correctly filter the statements for the specific tables I wanted to keep data for. That worked fine. The upstream server had more databases and more tables and all transactions relating to those other tables seemed to be filtered out, though clearly I saw that the GTID_EXECUTED value on the downstream slaves was unchanged from upstream. This behaviour works fine. * it also works fine if you do 'CREATE TABLE IF NOT EXISTS for a table that's in one of the filtered databases but not the one of the requested tables to be passed through. * what does not work is the CREATE DATABASE IF NOT EXISTS of the database that's being filtered. This statement is _ignored_ and the GTID value of the statement is not added with an empty event to the downstream server. Feel free to contact the replication developers on this. I'd be delighted to understand their thinking but I think in this case it's a statement type that they've missed. I do have, if I make a bit of effort, ways to talk to the devs directly but it struck me for something like this that the bug report was the correct way to report the issue I saw. Personally I think the filtering with empty transactions should possibly have a couple of extra things attached: * a counter of how many of this type of event are actually filtered as "empty events" * maybe this behaviour should be optional, if it is it should be good to explain the limitations that this will provoke due to the GTID implementation in MySQL * more documentation on the filtering and the limitations would be good. Clearly this is a complex topic and the filtering was designed long before MySQL had GTID so many of the concepts make sense outside of the GTID environment and are harder to apply when GTID is enabled. * I suspect that the issues described here are due to lack of "in the field" experience. With any complex bit of software like MySQL sometimes it gets used (often for quite valid reasons) in ways the original developers never anticipated. I think it's hard if you're a developer and don't get "in field experience" to understand some of these edge cases, yet for those of us out here we have problems to solve and use the tooling provided, or ask for such tooling. This seems to be something that goes along those lines. So any feedback on this would be good.
[25 Dec 2017 10:22]
MySQL Verification Team
Hi Simon, Of course you can :D talk directly to the devs :) now as for the bug reported, as I already verified it the devs are informed and will decide if/how they will proceed. We also know how to contact you directly if need be so that's covered too. take care, happy holidays Bogdan
[26 Apr 2018 14:38]
Margaret Fisher
Posted by developer: Thanks for the report. Changelog entry added for MySQL 8.0.12, 5.7.23, and 5.6.41: When GTIDs are in use for replication, replicated transactions that are filtered out on the slave are persisted. If binary logging is enabled on the slave, the filtered-out transaction is written to the binary log as a Gtid_log_event followed by an empty transaction containing only BEGIN and COMMIT statements. If binary logging is disabled, the GTID of the filtered-out transaction is written to the mysql.gtid_executed table. This process ensures that there are no gaps in the set of executed GTIDs, and that the filtered-out transactions are not retrieved again if the slave reconnects to the master. Previously, this process was not done for CREATE DATABASE, ALTER DATABASE, and DROP DATABASE statements, but it is now carried out for those statements as well as for others. - The GTIDs documentation for MySQL 8.0 was recently updated to cover this area better. Please see https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html and the first three subtopics.
[31 May 2018 7:01]
MySQL Verification Team
Bug #91086 marked as duplicate of this one.

Description: 1. Filtered replication leaves GTID holes when filtering create database if not exists statements. When setting up the replication filtering I noticed GTID gaps were being produced due to a statement 'CREATE DATABASE IF NOT EXISTS db3' [no filtering of this database was configured.] A similar CREATE TABLE IF NOT EXISTS table, in the same database does not produce GTID gaps and it looks like the replication code deliberately adds an "empty event" to ensure there are no in the binlogs generated by server2. This functionality seems to be missing for the statement CREATE DATABASE IF NOT EXISTS db3. 2. Background/Context: Splitting a database chain requires me to move selected tables in one schema/db to another one. Below the filtered box, server2, the servers have a normal replication setup and only see the tables in db2. The intention: * to replicate ONLY the following tables in db1 to db2: - table_a, table_b, table_d, table_e, table_f, table_g, table_h, table_i * to maintain replication of a heartbeat table db3.table_c through the filtering server, server2. How to repeat: 3. Servers: master: server1: 5.7.19-log - no replication filters/config intermediate_master: server2: 5.7.20-log - filters as shown, replicates from server1 master2+slaves: replicate from server2 (and only see the filtered tables in the new db2 + the db3 heartbeat table). Not shown here. 4. Observed behaviour binlog on server1 shows up as: (output from mysqlbinlog binlog.00nnnn) and this event (and GTID value) is NOT seen on server2. # at 96762167 #171211 13:08:15 server id 198087021 end_log_pos 96762232 CRC32 0xa970c2b0 GTID last_committed=196349 sequence_number=196351 rbr_only=no SET @@SESSION.GTID_NEXT= '9f58c169-d121-11e7-835b-ac162db9c048:134313416'/*!*/; # at 96762232 #171211 13:08:15 server id 198087021 end_log_pos 96762364 CRC32 0x85d2c0d6 Query thread_id=4376477 exec_time=0 error_code=0 SET TIMESTAMP=1512994095/*!*/; create database if not exists db3 /*!*/; Logic being replicated was: --------------------------- --- snip --- create database if not exists db3; <=== ERROR: GTID gap seen use db3; create table if not exists tableX ( <=== OK: no GTID gap seen ...table content unimportant... ); truncate tableX; <=== OK: no GTID gap seen --- snip --- SHOW SLAVE STATUS on server2 shows: ----------------------------------- root@server2 [(none)]> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: server1 Master_User: uuuuuuuuuu Master_Port: 3306 Connect_Retry: 60 Master_Log_File: binlog.000767 Read_Master_Log_Pos: 96520395 Relay_Log_File: relaylog.002247 Relay_Log_Pos: 96520562 Relay_Master_Log_File: binlog.000767 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: db2.table_a,db2.table_b,db3.table_c,db2.table_d,db2.table_e,db2.table_f,db2.table_g,db2.table_h,db2.table_i Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 96520395 Relay_Log_Space: 96520846 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 198087021 Master_UUID: 9f58c169-d121-11e7-835b-ac162db9c048 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 9f58c169-d121-11e7-835b-ac162db9c048:15438311-150635915 Executed_Gtid_Set: 2c5adab4-d64a-11e5-82df-ac162d72dac0:1-247743812, 9f58c169-d121-11e7-835b-ac162db9c048:1-56060985:56060987-56061175:56061177-56061224:56061226-75201528:75201530-75201755:75201757-75201983:75201985-75407550:75407552-75407604:75407606-75407661:75407663-87889848:87889850-87889935:87889937-87890042:87890044-88391955:88391957-88392125:88392127-88392245:88392247-88755771:88755773-88755826:88755828-88755921:88755923-100279047:100279049-100279126:100279128-100279247:100279249-121672430:121672432-121672503:121672505-121672524:121672526-122946019:122946021-122946291:122946293-122946469:122946471-134313284:134313286-134313415:134313417-134313648:134313650-136492728:136492730-136492784:136492786-136492904:136492906-145582402:145582404-145582439:145582441-145582463:145582465-147455222:147455224-147455262:147455264-147455277:147455279-149319049:149319051-149319261:149319263-150635915, a6d83ff6-bfcf-11e7-8c93-246e96158550:1-126618302 Auto_Position: 1 Replicate_Rewrite_DB: (db1,db2) Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) root@server2 [(none)]> Notice the GTID gaps which are being produced. These do NOT come from the fact I'm using parallel replication and some of the transactions are being executed out of order. The logic being replicated runs periodically every few hours and leaves these gaps. 5. Configuration settings ------------------------- Full my.cnf can be provided but probably the relevant settings are: server2: [mysqld] binlog_format = ROW binlog_row_image = minimal sync_binlog = 0 expire_logs_days = 5 log_slave_updates master_info_repository = TABLE relay_log_info_repository = TABLE relay_log_recovery = 1 gtid_mode = ON enforce_gtid_consistency = 1 binlog_group_commit_sync_delay = 0 slave_parallel_workers = 10 slave_parallel_type = LOGICAL_CLOCK replicate-rewrite-db = db1->db2 replicate-wild-do-table = db2.table_a replicate-wild-do-table = db2.table_b replicate-wild-do-table = db3.table_c replicate-wild-do-table = db2.table_d replicate-wild-do-table = db2.table_e replicate-wild-do-table = db2.table_f replicate-wild-do-table = db2.table_g replicate-wild-do-table = db2.table_h replicate-wild-do-table = db2.table_i Suggested fix: Executed_Gtid_Set should have NO gaps. GTID events will be created but the statement will not be passed through the filters. This is consistent with the other statements shown. I'm not sure that the documentation clearly specifies what happens with replication filtering with GTID when things are thrown away. It should be reported that the GTID set seen on the downstream boxes may be identical to the upstream boxes even though the events have NOT been replicated and thus the state of the downstream server after filtering will be different. I assume this behaviour has been setup this way to avoid gaps which would cause issues after filtering has been applied and as that's the case it would be good to indicate that this happens very clearly as this is especially important for GTID replication when filtering is being applied.