Bug #101717 Poor performance on a GTID slave when log_slave_updates is disabled
Submitted: 23 Nov 2020 12:03 Modified: 11 Dec 2020 5:23
Reporter: Venkatesh Prasad Venugopal (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:8.0, 8.0.22 OS:Any
Assigned to: CPU Architecture:Any
Tags: Clone, regression, rpl

[23 Nov 2020 12:03] Venkatesh Prasad Venugopal
Description:
This is a related to Bug#100118 and was discovered after fixing Bug#100118 in Percona Server 8.0.

Background
----------
Since the introduction of a dedicated thread for persisting GTIDs of InnoDB transactions by https://dev.mysql.com/worklog/task/?id=9211, new behavior for updating the mysql.gtid_executed table is as follows,

- If binary log is enabled, the mysql.gtid_executed table is updated on next binlog rotation (either by FLUSH LOGS or server restart).
- If binary log is disabled or log_slave_updates is disabled (for slave threads), then
    - If it is an InnoDB transaction, it leaves it to the gtid persister thread to update.
    - Otherwise, it writes its GTID into mysql.gtid_executed table during transaction commit.

Analysis
--------
For achieving the above mentioned behavior, the server makes use of several gtid sets to track and it performs several operations on them in order to update the the mysql.gtid_executed table.
Among those, the important GTID sets are

1. executed_gtids        - The set of GTIDs executed. (May include the gtids that are not in the table)
2. lost_gtids            - The set of GTIDs that existed in some previously purged binary log.
3. gtids_only_in_table   - The set of GTIDs that exists only in gtid_executed table, and not in any binary logs.
4. previous_gtids_logged - The previous GTIDs in the last binlog

Here, `previous_gtids_logged` gtid set was designed to store only the gtids from the last binlog. However, WL#9211 modified the design to store the GTIDs of InnoDB transactions (GTID persister thread) in `previous_gtids_logged` and eventually merge it with the non-InnodB in `Gtid_state::save_gtids_of_last_binlog_into_table()` (https://github.com/mysql/mysql-server/blob/mysql-8.0.22/sql/rpl_gtid_state.cc#L716).

The above logic works very well for both binlogless replica (slave) and replicas with log_slave_updates enabled but doesn't work very well on replicas with log_slave_updates disabled (binary logging is enabled).

What happens when binary logging is enabled and log_slave_updates is disabled on slave?

Before WL#9211:
1. Slave applier threads add both InnoDB and other transactions' GTIDs to the gtid_executed table during transaction commit(commit_owned_gtids()).
2. Slave applier threads add all transactions' GTIDs to the `executed_gtids`, `gtids_only_in_table` and `lost_gtids` gtid sets.
3. `previous_gtids_logged` gtid set was never updated and remained empty always.

After WL#9211:
1. Slave applier threads add InnoDB transactions' GTIDs to the flush list of the gtid persister thread and persister thread takes the responsibility to update the table.
2. Slave applier threads write other(non-InnoDB) GTIDs to the gtid_executed table during transaction commit (commit_owned_gtids()).
3. All transactions' GTIDs are added to `executed_gtids`, `gtids_only_in_table` and `lost_gtids` gtid sets.
4. However in this case, the gtid persister thread adds the InnoDB transaction' GTIDs to the `previous_gtids_logged` gtid set (https://github.com/mysql/mysql-server/blob/mysql-8.0.22/sql/rpl_gtid_state.cc#L590)

5. In other cases (when log_slave_updates is enabled), the `previous_gtids_logged` is merged with the GTIDs of the last binlog (https://github.com/mysql/mysql-server/blob/mysql-8.0.22/sql/rpl_gtid_state.cc#L716) thereby filling up the GTID intervals and resulting in a GTID set with less number of intervals. Eg: UUID:1-10001

6. But when log_slave_updates is disabled, the `previous_gtids_logged` gtid set is never merged and the server will end up storing only the GTIDs of InnoDB transactions'. 

7. When there is a mixed workload (both transactional and non-transactional DMLS), this causes the `previous_gtids_logged` to store only InnoDB GTIDs and causes it grow continuously, consuming more memory and CPU while performing set operations (https://github.com/mysql/mysql-server/blob/mysql-8.0.22/sql/rpl_gtid_state.cc#L587) because of the more number of GTID intervals.
Eg: UUID:1:3:5:7:9:11:13:15:17:19:21:23:25.............9995:9997:9999:10001.

Summarizing:
The below shall be the value stored in the previous_gtids_logged of a GTID slave in different scenarios (10000 transactions and assuming every alternate transaction on master is an InnoDB transaction).

previous_gtids_logged when binlog is disabled: "" (Empty)
previous_gtids_logged when binlog is enabled and log_slave_updates is enabled: UUID:1-10001 (will have only 1 interval)
previous_gtids_logged when binlog is enabled and log_slave_updates is disabled: UUID:1:3:5:7:9:11:13:15:17:19:21:23:25.............9995:9997:9999:10001 (Will have ~5000 intervals)

Considering the nature of the issue, it is significant that the time taken to perform GTID set operations keeps on increasing for when log-bin=1 and log-slave-updates=0 and this happens due to the design issue in the way the gtid persister thread updates the previous_gtid_logged when log_slave_updates=OFF. In addition to that, we also need to observe that this issue not only makes the persister threads slower, but also degrades slave's performance since all the worker threads will have to wait for the `global_sid_lock` while the persister thread is busy doing set operations in `update_prev_gtids()` function since the function takes write lock on the global_sid_lock, and thus results in delayed replication even with multiple worker threads(MTS).

Please note that this problem may not be seen directly on the latest MySQL (8.0.22) since the Bug#100118 is still unfixed and we still have an issue with the gtid persister thread getting stuck compressing the table resulting in the huge gtid gaps. And, this problem is more likely to affect once the underlying bug is fixed.

Even though there are less cases where the servers are configured to be running with log_slave_updates=OFF, but still this is a critical design issue and needs to fixed ASAP.

How to repeat:
1. Setup GTID based replication with log_slave_updates disabled on replica(slave) server.
2. On Slave, execute
slave> CHANGE REPLICATION FILTER REPLICATE_WILD_IGNORE_TABLE=('db2.%');

3. On Source(Master), execute
CREATE DATABASE db1;
CREATE DATABASE db2;
CREATE TABLE db1.t1(i int)engine=innodb;
CREATE TABLE db2.t1(i int)engine=innodb;

4. On Source(Master), execute

# db1.t1 is an Innodb transaction on slave and adds the GTID to the persister thread.
mysqlslap --create-schema=test --delimiter=";" --number-of-queries=10000000 --query="insert into db1.t1 values(rand() * 1000)" --concurrency=100 --silent 2>&1

# db2.t1 causes empty transactions on slave and adds the GTID directly to the table.
mysqlslap --create-schema=test --delimiter=";" --number-of-queries=10000000 --query="insert into db2.t1 values(rand() * 1000)" --concurrency=100 --silent 2>&1

5. Print the contents of `previous_gtids_logged` to replica's error log and check that its contents keep growing over time.
[23 Nov 2020 12:41] Venkatesh Prasad Venugopal
Here is a small comparision of how much time the server takes to update the `previous_gtid_logged` in `update_prev_gtids()` in different scenarios

1. Time taken by the `update_prev_gtids()` with log-bin=1 and log-slave-updates=1 (printing the average of every 1000 calls)
===BEGIN===
 [DEBUG] Average of last 1000 calls = 128 microseconds
 [DEBUG] Average of last 1000 calls = 108 microseconds
 [DEBUG] Average of last 1000 calls = 111 microseconds
 [DEBUG] Average of last 1000 calls = 108 microseconds
 [DEBUG] Average of last 1000 calls = 105 microseconds
 [DEBUG] Average of last 1000 calls = 104 microseconds
 [DEBUG] Average of last 1000 calls = 105 microseconds
 [DEBUG] Average of last 1000 calls = 106 microseconds
 [DEBUG] Average of last 1000 calls = 113 microseconds
 [DEBUG] Average of last 1000 calls = 108 microseconds
 [DEBUG] Average of last 1000 calls = 106 microseconds
 [DEBUG] Average of last 1000 calls = 104 microseconds
 [DEBUG] Average of last 1000 calls = 109 microseconds
 [DEBUG] Average of last 1000 calls = 112 microseconds
 [DEBUG] Average of last 1000 calls = 115 microseconds
 [DEBUG] Average of last 1000 calls = 112 microseconds
 [DEBUG] Average of last 1000 calls = 100 microseconds
 [DEBUG] Average of last 1000 calls = 93 microseconds
 [DEBUG] Average of last 1000 calls = 94 microseconds
 [DEBUG] Average of last 1000 calls = 98 microseconds
 [DEBUG] Average of last 1000 calls = 91 microseconds
 [DEBUG] Average of last 1000 calls = 95 microseconds
 [DEBUG] Average of last 1000 calls = 97 microseconds
 [DEBUG] Average of last 1000 calls = 97 microseconds
 [DEBUG] Average of last 1000 calls = 99 microseconds
 [DEBUG] Average of last 1000 calls = 100 microseconds
 [DEBUG] Average of last 1000 calls = 94 microseconds
 [DEBUG] Average of last 1000 calls = 102 microseconds
===END===

2. Time taken by the `update_prev_gtids()` with log-bin=1 and log-slave-updates=0 (printing the average of every 1000 calls)

===BEGIN===
 [DEBUG] Average of last 1000 calls = 205 microseconds
 [DEBUG] Average of last 1000 calls = 577 microseconds
 [DEBUG] Average of last 1000 calls = 917 microseconds
 [DEBUG] Average of last 1000 calls = 1250 microseconds
 [DEBUG] Average of last 1000 calls = 1650 microseconds
 [DEBUG] Average of last 1000 calls = 1984 microseconds
 [DEBUG] Average of last 1000 calls = 2377 microseconds
 [DEBUG] Average of last 1000 calls = 2843 microseconds
 [DEBUG] Average of last 1000 calls = 3218 microseconds
 [DEBUG] Average of last 1000 calls = 3813 microseconds
 [DEBUG] Average of last 1000 calls = 4179 microseconds
 [DEBUG] Average of last 1000 calls = 4746 microseconds
 [DEBUG] Average of last 1000 calls = 4938 microseconds
 [DEBUG] Average of last 1000 calls = 5528 microseconds
 [DEBUG] Average of last 1000 calls = 6339 microseconds
 [DEBUG] Average of last 1000 calls = 6446 microseconds
 [DEBUG] Average of last 1000 calls = 7013 microseconds
 [DEBUG] Average of last 1000 calls = 7489 microseconds
 [DEBUG] Average of last 1000 calls = 8246 microseconds
 [DEBUG] Average of last 1000 calls = 9053 microseconds
 [DEBUG] Average of last 1000 calls = 9830 microseconds
 [DEBUG] Average of last 1000 calls = 10464 microseconds
 [DEBUG] Average of last 1000 calls = 11233 microseconds
 [DEBUG] Average of last 1000 calls = 11800 microseconds
 [DEBUG] Average of last 1000 calls = 12671 microseconds
===END===
[23 Nov 2020 12:47] Venkatesh Prasad Venugopal
Error logs with debug prints of previous_gtids_logged in update_prev_gtids()

Attachment: Bug101717_slave_error_logs.txt (text/plain), 528.83 KiB.

[24 Nov 2020 6:50] MySQL Verification Team
Hello Venu,

Thank you for the report and feedback.
Could you please share exact cmake options used for the build(or I can use the same build from Bug #100118?), any patch etc required for step (5) in your how to repeat ? Thank you.

regards,
Umesh
[25 Nov 2020 7:00] Venkatesh Prasad Venugopal
Testcase to repeat the bug

Attachment: 0001-Bug-101717-Testcase-to-repeat-the-bug.patch (text/x-patch), 6.39 KiB.

[25 Nov 2020 7:23] Venkatesh Prasad Venugopal
Hi Umesh,

Yes, you can use the same build from Bug #100118.

Steps:
1. Apply the atached patch (0001-Bug-101717-Testcase-to-repeat-the-bug.patch) on the 8.0.22 code.
2. Build the server.

   I used cmake -DBUILD_CONFIG=mysql_release -DWITH_BOOST=$HOME/utilities/boost/ -DCMAKE_BUILD_TYPE=debug -DWITH_UNIT_TESTS=OFF -DWITH_ROUTER=OFF ../

3. Run the `rpl_gtid_bug_101717.test` MTR testcase.
4. Check slave's error logs (var/log/mysqld.2.err) for the debug prints.
5. Execute
       grep "Average of" var/log/mysqld.2.err
   for the details of the average time taken by the `update_prev_gtids()` function when the test progressed.

Regards,
Venkatesh
[25 Nov 2020 7:24] MySQL Verification Team
Thank you Venu for the requested details, I'll get back to you if anything further needed. Thank you!
[25 Nov 2020 9:35] MySQL Verification Team
Thank you Venu, on a medium-large test box I'm seeing this:

-
grep "Average of" var/log/mysqld.2.err
[DEBUG] Average of last 100 calls = 78 microseconds
[DEBUG] Average of last 100 calls = 139 microseconds
[DEBUG] Average of last 100 calls = 188 microseconds
[DEBUG] Average of last 100 calls = 250 microseconds
[DEBUG] Average of last 100 calls = 317 microseconds
[DEBUG] Average of last 100 calls = 374 microseconds
[DEBUG] Average of last 100 calls = 418 microseconds
[DEBUG] Average of last 100 calls = 491 microseconds
[DEBUG] Average of last 100 calls = 530 microseconds
[DEBUG] Average of last 100 calls = 629 microseconds
[DEBUG] Average of last 100 calls = 974 microseconds
[DEBUG] Average of last 100 calls = 736 microseconds
[DEBUG] Average of last 100 calls = 823 microseconds
[DEBUG] Average of last 100 calls = 884 microseconds
[DEBUG] Average of last 100 calls = 876 microseconds
[DEBUG] Average of last 100 calls = 993 microseconds
[DEBUG] Average of last 100 calls = 1076 microseconds
[DEBUG] Average of last 100 calls = 1089 microseconds
[DEBUG] Average of last 100 calls = 1150 microseconds
[DEBUG] Average of last 100 calls = 1218 microseconds
[DEBUG] Average of last 100 calls = 1301 microseconds
[DEBUG] Average of last 100 calls = 1373 microseconds
[DEBUG] Average of last 100 calls = 1422 microseconds
[DEBUG] Average of last 100 calls = 1660 microseconds
[DEBUG] Average of last 100 calls = 1576 microseconds
[DEBUG] Average of last 100 calls = 1628 microseconds
[DEBUG] Average of last 100 calls = 1622 microseconds
[DEBUG] Average of last 100 calls = 1759 microseconds
[DEBUG] Average of last 100 calls = 1840 microseconds
[DEBUG] Average of last 100 calls = 1868 microseconds
[DEBUG] Average of last 100 calls = 2603 microseconds
[DEBUG] Average of last 100 calls = 2004 microseconds
[DEBUG] Average of last 100 calls = 2025 microseconds
[DEBUG] Average of last 100 calls = 2051 microseconds
[DEBUG] Average of last 100 calls = 2088 microseconds
[DEBUG] Average of last 100 calls = 2240 microseconds
[DEBUG] Average of last 100 calls = 2262 microseconds
[DEBUG] Average of last 100 calls = 2364 microseconds
[DEBUG] Average of last 100 calls = 2354 microseconds
[DEBUG] Average of last 100 calls = 2526 microseconds
[DEBUG] Average of last 100 calls = 2475 microseconds
[DEBUG] Average of last 100 calls = 2604 microseconds
[DEBUG] Average of last 100 calls = 2682 microseconds
[DEBUG] Average of last 100 calls = 3238 microseconds
[DEBUG] Average of last 100 calls = 2792 microseconds
[DEBUG] Average of last 100 calls = 2789 microseconds
[DEBUG] Average of last 100 calls = 2848 microseconds
[DEBUG] Average of last 100 calls = 2903 microseconds
[DEBUG] Average of last 100 calls = 2964 microseconds

regards,
Umesh
[8 Dec 2020 10:35] Yakir Gibraltar
I'm able to reproduce the issue also with log_slave_updates and log_bin disabled:
```
abc625:(none)> select @@log_slave_updates, @@log_bin;
+---------------------+-----------+
| @@log_slave_updates | @@log_bin |
+---------------------+-----------+
|                   0 |         0 |
+---------------------+-----------+
1 row in set (0.00 sec)
abc625:(none)>  select format(count(0),0) from mysql.gtid_executed;
+--------------------+
| format(count(0),0) |
+--------------------+
| 31,335,250         |
+--------------------+
1 row in set (5.55 sec)
```
And restart to MySQL failed:
[ERROR] [MY-011975] [InnoDB] Wait for GTID thread to start timed out

Thank you, Yakir.
[10 Dec 2020 18:34] Pedro Figueiredo
Posted by developer:
 
Hi!

This bug is a duplicate of bug #101717 / #31599938

Tried to reproduce the issue against 8.0.23 and the result of the script to reproduce was:

[DEBUG] Average of last 100 calls = 25 microseconds
[DEBUG] Average of last 100 calls = 28 microseconds

Thank you for your time.

Regards,
[11 Dec 2020 5:23] Venkatesh Prasad Venugopal
Hi Pedro,

Do you mean that the issue is fixed in 8.0.23? Was it fixed as part of another bug?

Thank you,
[15 Dec 2020 12:53] Pedro Figueiredo
Posted by developer:
 
Hi!

Mixed up the bug numbers, the above issue was fixed as part of bug #100118

Thank you for your time.

Regards,