Bug #104546 MySQL 8.0.26 single thread replication performance is slower than MySQL 5.7.35.
Submitted: 5 Aug 9:50 Modified: 27 Aug 7:25
Reporter: Takanori Sejima Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S5 (Performance)
Version:8.0.26 OS:Ubuntu (18.04.5 LTS)
Assigned to: CPU Architecture:x86 (Skylake-SP)
Tags: innodb, replication

[5 Aug 9:50] Takanori Sejima
Description:
The replication apply performance of MySQL 8.0.26 is slower than MySQL 5.7.35 when using sysbench update_non_index test on master(MySQL5.7).

I built a cluster with source MySQL 5.7 and two replicas(5.7.35 and 8.0.26).

I used the following binaries for replicas. I download from https://dev.mysql.com/downloads/mysql/ .
mysql-community-server_5.7.35-1ubuntu18.04_amd64.deb
mysql-community-server-core_8.0.26-1ubuntu18.04_amd64.deb

server configuration:
OS: Ubuntu 18.04.5 LTS
kernel: 5.4
CPU: 16Cores-HT Xeon Silver 4109T(Skylake-SP)
RAM: 96GB
SSD: Intel SATA SSD DC S4600 960GB
filesystem: ext4

I executed sysbench oltp_update_non_index in the following cases:
5.7.35:
- master_info_repository = TABLE & relay_log_info_repository = TABLE
- master_info_repository = FILE & relay_log_info_repository = FILE
8.0.26:
- master_info_repository = TABLE & relay_log_info_repository = TABLE & innodb_log_writer_threads = ON
- master_info_repository = TABLE & relay_log_info_repository = TABLE & innodb_log_writer_threads = OFF
- master_info_repository = FILE & relay_log_info_repository = FILE & innodb_log_writer_threads = ON
- master_info_repository = FILE & relay_log_info_repository = FILE & innodb_log_writer_threads = OFF

The summary table, the number are UPDATE query/sec (the more the better)
+------------------------------------------------------+--------------+--------------+
| case                                                 | MySQL 5.7.35 | MySQL 8.0.26 |
+------------------------------------------------------+--------------+--------------+
| master_info_repository=TABLE, log_writer_threads=ON  | 14.74 k qps  | 9.82 k       |
+------------------------------------------------------+--------------+--------------+
| master_info_repository=TABLE, log_writer_threads=OFF | NaN          | 9.05 k       |
+------------------------------------------------------+--------------+--------------+
| master_info_repository=TABLE, log_writer_threads=ON  | 12.04 k      | 6.89 k       | |+-----------------------------------------------------+--------------+--------------+
| master_info_repository=TABLE, log_writer_threads=OFF | NaN          | 6.56 k       |
+------------------------------------------------------+--------------+--------------+

Summary:
- In this case, innodb_log_writer_threads = ON had better performance.
- When master_info_repository = TABLE & relay_log_info_repository = TABLE, the performance is about 2k-3kQPS lower than that of FILE.
- MySQL 5.7.35 with master_info_repository = FILE has more than double the performance of single thread replication of MySQL 8.0.26 with master_info_repository = TABLE.

How to repeat:
source settings:
character-set-server     = utf8mb4
skip-character-set-client-handshake
max_binlog_size         = 200M
binlog_cache_size       = 1M
binlog_format           = ROW
binlog_row_image        = MINIMAL
binlog_rows_query_log_events       = ON
sync_binlog             = 0
binlog-row-event-max-size = 1024

replica (5.7.35) settings:

character-set-server     = utf8mb4
skip-character-set-client-handshake
max_binlog_size         = 200M
binlog_format           = ROW
binlog_row_image        = MINIMAL
binlog_rows_query_log_events       = ON
sync_binlog             = 0
sync_master_info        = 0
sync_relay_log          = 0
sync_relay_log_info     = 0
binlog-row-event-max-size = 1024
innodb_thread_concurrency       = 16
innodb_buffer_pool_size         = 46G
innodb_log_file_size            = 2G
innodb_flush_log_at_trx_commit  = 2
innodb_flush_method             = O_DIRECT
innodb_read_io_threads          = 16
innodb_write_io_threads         = 16
innodb_flush_neighbors          = 0
innodb_read_ahead_threshold     = 0
skip-innodb-doublewrite
innodb_io_capacity    = 100
innodb_io_capacity_max =  8000
performance_schema_digests_size = 2000
performance_schema_max_table_instances = 2000
performance_schema_max_file_instances = 4000
log-slave-updates
gtid-mode = OFF_PERMISSIVE

repica(8.0.26) settings:
character-set-server     = utf8mb4
skip-character-set-client-handshake
binlog_expire_logs_seconds = 0
max_binlog_size         = 200M
binlog_format           = row
binlog_row_image        = minimal
binlog_rows_query_log_events       = on
sync_binlog             = 0
sync_master_info        = 0
sync_relay_log          = 0
sync_relay_log_info     = 0
binlog-row-event-max-size = 1024
innodb_thread_concurrency       = 16
innodb_buffer_pool_size         = 46G
innodb_log_file_size            = 2G
innodb_flush_log_at_trx_commit  = 2
innodb_flush_method             = O_DIRECT
innodb_autoextend_increment     = 8
innodb_undo_log_truncate        = off
innodb_autoinc_lock_mode        = 1
innodb_read_io_threads          = 16
innodb_write_io_threads         = 16
innodb_flush_neighbors          = 0
innodb_read_ahead_threshold     = 0
innodb_io_capacity    = 100
innodb_io_capacity_max =  8000
skip-innodb-doublewrite
performance_schema_digests_size = 2000
performance_schema_max_table_instances = 2000
performance_schema_max_file_instances = 4000
log-slave-updates
gtid-mode = OFF_PERMISSIVE

sysbench:
prepare
$ sysbench --table-size=8000000 --tables=10 --threads=100 --mysql-socket=/tmp/mysql.sock --mysql-db='sbtest' --mysql-password='sbtest' src/lua/oltp_common.lua prepare

run (time=600 or 300)
$ sysbench --time=600 --table-size=8000000 --tables=10 --threads=100 --mysql-socket=/tmp/mysql.sock --mysql-db='sbtest' --mysql-password='sbtest' src/lua/oltp_update_non_index.lua run
[5 Aug 11:38] Takanori Sejima
sorry. The summary table was incorrect.

The correct summary table is:

+------------------------------------------------------+--------------+--------------+
| case                                                 | MySQL 5.7.35 | MySQL 8.0.26 |
+------------------------------------------------------+--------------+--------------+
| master_info_repository=FILE,  log_writer_threads=ON  | 14.74 k qps  | 9.82 k       |
+------------------------------------------------------+--------------+--------------+
| master_info_repository=FILE,  log_writer_threads=OFF | NaN          | 9.05 k       |
+------------------------------------------------------+--------------+--------------+
| master_info_repository=TABLE, log_writer_threads=ON  | 12.04 k      | 6.89 k       | 
|+-----------------------------------------------------+--------------+--------------+
| master_info_repository=TABLE, log_writer_threads=OFF | NaN          | 6.56 k       |
+------------------------------------------------------+--------------+--------------+
[7 Aug 12:11] Takanori Sejima
I also investigated the case of multi-threaded replication.
Also, in the case of multi-threaded replication, MySQL 8.0.26 is slower than 5.7.35.

I built a cluster with source MySQL 5.7.15  and  two replicas(5.7.35 and 8.0.26) using slave_parallel_type=LOGICAL_CLOCK and slave_parallel_workers=16;

And I also tuned innodb_io_capacity and innodb_io_capacity_max because multi-threaded replication increases I/O.

I set io_cpacity:
innodb_io_capacity              = 2000
innodb_io_capacity              = 20000

I  executed sysbench oltp_update_non_index on source(MySQL5.7).

The summary table, the number are UPDATE query/sec

+-----------------------+-------------+
| source (MySQL 5.7.15) |    34 k qps |
+-----------------------+-------------+
| replica(MySQL 5.7.35) |    29 k     |
+-----------------------+-------------+
| replica(MySQL 8.0.26) |    22 k     |
+-----------------------+-------------+

Both single-threaded replication and multithreaded replication, MySQL 8.0.26 seems to be more than 25% slower than MySQL 5.7.35.

In addtion,  I compared the performance of MySQL 8.0.26 and MySQL 5.7.35 using oltp_read_only and update_non_index in sysbench without replication.(I executed sysbench directly on the server I was using as replicas.)

oltp_read_only:
$ sysbench --time=300 --table-size=8000000 --tables=10 --threads=100 --mysql-socket=/tmp/mysql.sock --mysql-db='sbtest' --mysql-password='sbtest' src/lua/oltp_read_only.lua run

+ --------- + --------------- +
| version  |  queries/sec | 
+ --------- + --------------- +
|  5.7.35   |     86913.78 |
+ --------- + --------------- +
|  8.0.26   |   114882.11 |
+ --------- + --------------- +

update_non_index(tables=10/threads=100) :
$ sysbench --time=300 --table-size=8000000 --tables=10 --threads=100 --mysql-socket=/tmp/mysql.sock --mysql-db='sbtest' --mysql-password='sbtest' src/lua/oltp_update_non_index.lua run

+ --------- + --------------- +
| version  |  queries/sec | 
+ --------- + --------------- +
|  5.7.35   |     51403.20 |
+ --------- + --------------- +
|  8.0.26   |     33374.33 |
+ --------- + --------------- +

update_non_index(tables=10/threads=1) :
$ sysbench --time=300 --table-size=8000000 --tables=10 --threads=1 --mysql-socket=/tmp/mysql.sock --mysql-db='sbtest' --mysql-password='sbtest' src/lua/oltp_update_non_index.lua run
+ --------- + --------------- +
| version  |  queries/sec | 
+ --------- + --------------- +
|  5.7.35   |    10042.61  |
+ --------- + --------------- +
|  8.0.26   |      8433.33  |
+ --------- + --------------- +

In the case of read only transaction, MySQL 8.0.26 is faster than 5.7.35.
However,  in the case of  UPDATE, MySQL 8.0.26 may be slower than 5.7.35.
[9 Aug 19:09] MySQL Verification Team
Hi,

Can you share config files on both 5.7 and 8.0, there are lot of differences in default values between 5.7 and 8.0, so in order to compare speeds you really need to setup all config values to same values

Thanks
[10 Aug 13:31] Takanori Sejima
compared 5.7.35 and 8.0.26 by sysbench without replication and performance_schema.

Attachment: mysql-data-104546.zip (application/x-zip-compressed, text), 279.69 KiB.

[10 Aug 13:37] Takanori Sejima
I attached mysql-data-104546.zip.

For ease of comparison, I set performance_schema = off, defaulted the other parameters as much as possible, made the difference between 5.7.35 and 8.0.26 as small as possible, and reconfigured my.cnf as small as possible.

5.7.35 $ egrep -v -e'^ $'-e'^ #'/etc/my.cnf | grep -v -e 'log-bin' -e 'log-error' -e 'slow_query_log_file' -e 'relay-log' -e 'report_host' | tee mysql57.cnf

8.0.26 $ egrep -v -e '^ $'-e'^ #'/etc/my.cnf | grep -v -e 'log-bin' -e 'log-error' -e 'slow_query_log_file' -e 'relay-log' -e 'report_host' | tee mysql80.cnf

sorry, log file name and hostname are masked.

Using those my.cnf, I re-benchmarked sysbench oltp_update_non_index with MySQL 5.7.35 and 8.0.26 without replication.

The result of sysbench is sysbench_mysql5_7_35.txt
And sysbench_mysql8_0_26.txt.

While running sysbench, I executed "sudo perf top -p `pidof mysqld`". The result is mysql57_perf_top.txt and mysql80_perf_top.txt.

Also, while running sysbench, I executed "sudo perf record -F 99 -p `pidof mysqld` -g --sleep 600" and left a record with FlameGraph. The result is flamegraph_mysql57.svg and flamegraph_mysql80.svg.

Looking at these results, I noticed that in MySQL 8.0.26, ut_delay has low CPU usage (low PAUSE).

So, after starting MySQL 8.0.26 with mysql80.cnf,

mysql> SET GLOBAL innodb_log_writer_threads=off;

And I executed sysbench.

$ sysbench --time=600 --table-size=8000000 --tables=10 --threads=100 --mysql-socket=/tmp/mysql.sock --mysql-db='sbtest' --mysql-password='sbtest' src/lua/oltp_update_non_index.lua run

The record of executing "sudo perf top -p `pidof mysqld`" at that time is mysql80_writer_off_perf_top.txt. After all, I thought that ut_delay was lower than MySQL 5.7.35.
[11 Aug 10:24] MySQL Verification Team
Hi,

thanks for the detailed report, I'll let our performance team have a go on it, I'm not getting same results as you are but it has to do with hw too (I'm testing on slower hardware so differences are minimal)

thanks
[11 Aug 10:24] MySQL Verification Team
If not a problem, please share on what exact hardware you tested this (cpu number, core number, ram, disk..)
[11 Aug 11:27] Takanori Sejima
server configuration:

OS: Ubuntu 18.04.5 LTS
kernel: 5.4
CPU: 16Cores-HT Xeon Silver 4109T(Skylake-SP) https://ark.intel.com/content/www/us/en/ark/products/123549/intel-xeon-silver-4109t-proces...
RAM: 96GB (DDR4 16GB*6)
SSD: Intel SATA SSD DC S4600 960GB https://www.intel.com/content/www/us/en/products/sku/120518/intel-ssd-dc-s4600-series-960g...
filesystem: ext4

I don't use the discard option when mounting an ext4 partition. I'm using fstrim instead.

I have set intel_idle.max_cstate=1, processor.max_cstate=1, intel_pstate=no_hwp in the kernel boot parameter for disable hardware-managed P-states (HWP) and use scaling_governor=performance.

The details are as follows.

$ lscpu | grep Xeon
Model name:          Intel(R) Xeon(R) Silver 4109T CPU @ 2.00GHz
$ dmesg | grep microcode
[    0.000000] microcode: microcode updated early to revision 0x2006b06, date = 2021-03-08
[    1.859951] microcode: sig=0x50654, pf=0x1, revision=0x2006b06
[    1.860088] microcode: Microcode Update Driver: v2.2.
$ uname -a
Linux 172-18-59-138.gree.internal 5.4.0-80-generic #90~18.04.1-Ubuntu SMP Tue Jul 13 19:40:02 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg --list libc-bin
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                                  Version                         Architecture                    Description
+++-=====================================================-===============================-===============================-===============================================================================================================
ii  libc-bin                                              2.27-3ubuntu1.4                 amd64                           GNU C Library: Binaries
$ df -h /home/mysql/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       844G  147G  655G  19% /home
$ grep sda5 /etc/mtab
/dev/sda5 /home ext4 rw,noatime 0 0
$ grep sda5 /proc/mounts
/dev/sda5 /home ext4 rw,noatime 0 0
$ cat /sys/block/sda/queue/scheduler
[mq-deadline] none
$ sudo dmidecode | grep -C10 DIM | grep -e MB -e Speed -e DDR | grep -v Unknown
        Size: 16384 MB
        Type: DDR4
        Speed: 2666 MT/s
        Configured Clock Speed: 2400 MT/s
        Size: 16384 MB
        Type: DDR4
        Speed: 2666 MT/s
        Configured Clock Speed: 2400 MT/s
        Size: 16384 MB
        Type: DDR4
        Speed: 2666 MT/s
        Configured Clock Speed: 2400 MT/s
        Size: 16384 MB
        Type: DDR4
        Speed: 2666 MT/s
        Configured Clock Speed: 2400 MT/s
        Size: 16384 MB
        Type: DDR4
        Speed: 2666 MT/s
        Configured Clock Speed: 2400 MT/s
        Size: 16384 MB
        Type: DDR4
        Speed: 2666 MT/s
        Configured Clock Speed: 2400 MT/s
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.4.0-80-generic root=UUID=00441827-4161-4d0f-97f3-3eb39eb2078a ro net.ifnames=0 biosdevname=0 vga=normal nomodeset intel_idle.max_cstate=1 processor.max_cstate=1 intel_pstate=no_hwp mce=ignore_ce
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ grep performance /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | wc -l
16
$
[17 Aug 11:37] Takanori Sejima
I executed sysbench oltp_update_non_index on MySQL 8.0.26 on Ubuntu 20.04.2 LTS.

Attachment: mysql-data-104546-focal.zip (application/x-zip-compressed, text), 116.43 KiB.

[17 Aug 11:46] Takanori Sejima
I executed sysbench oltp_update_non_index on MySQL 8.0.26 on Ubuntu 20.04.2 LTS to verify that the performance was not affected by libraries such as glibc.

The result was almost same as 18.04.5 LTS.

mysql-data-104546-focal.zip contains the flamegraph_mysql80_focal.svg generated by FlameGraph and a result of sysbench and the result of "sudo perf top -p`pidof mysqld` "when running sysbench.

It seems that kernel and glibc are not the reason why MySQL 8.0.26 has lower update performance than MySQL 5.7.35.
[27 Aug 7:20] Takanori Sejima
I compared the performance when binlog was enabled or disabled.

Attachment: mysql-data-104546_skip_log_bin.zip (application/x-zip-compressed, text), 406.13 KiB.

[27 Aug 7:25] Takanori Sejima
I have attached mysql-data-104546_skip_log_bin.zip.

I compared the performance of sysbench oltp_update_non_index on MySQL 5.7.35 and 8.0.26 on Ubuntu 18.04.5 LTS (kernel 5.4) without replication. I also compared the performance when binlog was enabled or disabled.

attached my.cnf: 
5.7.35 $ egrep -v -e'^$' -e '^#' /etc/my.cnf | sed -e 's/^\(log.bin\).*/\1 = MASKED/' -e 's/^\(log.error\).*/\1 = MASKED/' -e 's/^\(slow.query.log.file\).*/\1 = MASKED/' -e 's/^\(relay.log\) .*/\1 = MASKED/' -e 's/^\(report.host\).*/\1 = MASKED/' | tee mysql57_log_bin.cnf

5.7.35(skip-log-bin) $ egrep -v -e'^$' -e '^#' /etc/my.cnf | sed -e 's/^\(log.bin\).*/\1 = MASKED/' -e 's/^\(log.error\).*/\1 = MASKED/' -e 's/^\(slow.query.log.file\).*/\1 = MASKED/' -e 's/^\(relay.log\) .*/\1 = MASKED/' -e 's/^\(report.host\).*/\1 = MASKED/' | tee mysql57_skip_log_bin.cnf

8.0.26 $ egrep -v -e'^$' -e '^#' /etc/my.cnf | sed -e 's/^\(log.bin\).*/\1 = MASKED/' -e 's/^\(log.error\).*/\1 = MASKED/' -e 's/^\(slow.query.log.file\).*/\1 = MASKED/' -e 's/^\(relay.log\) .*/\1 = MASKED/' -e 's/^\(report.host\).*/\1 = MASKED/' | tee mysql80_log_bin.cnf

8.0.26(skip-log-bin) $ egrep -v -e'^$' -e '^#' /etc/my.cnf | sed -e 's/^\(log.bin\).*/\1 = MASKED/' -e 's/^\(log.error\).*/\1 = MASKED/' -e 's/^\(slow.query.log.file\).*/\1 = MASKED/' -e 's/^\(relay.log\) .*/\1 = MASKED/' -e 's/^\(report.host\).*/\1 = MASKED/' | tee mysql80_skip_log_bin.cnf

sorry, log file name and hostname are masked.

prepare:
$ sysbench --table-size=8000000 --tables=10 --threads=100 --mysql-socket=/tmp/mysql.sock --mysql-db='sbtest' --mysql-password='sbtest' src/lua/oltp_common.lua prepare

run:
$ $ sysbench --time=300 --table-size=8000000 --tables=10 --threads=1 --mysql-socket=/tmp/mysql.sock --mysql-db='sbtest' --mysql-password='sbtest' src/lua/oltp_update_non_index.lua run

or

$ sysbench --time=300 --table-size=8000000 --tables=10 --threads=100 --mysql-socket=/tmp/mysql.sock --mysql-db='sbtest' --mysql-password='sbtest' src/lua/oltp_update_non_index.lua run

I got the following results:

+------------------------------------+--------------+--------------+
| case                               | MySQL 5.7.35 | MySQL 8.0.26 |
+------------------------------------+--------------+--------------+
| sysbench threads=1,        log-bin | 10495.43 qps |  8454.26     |
+------------------------------------+--------------+--------------+
| sysbench threads=100,      log-bin | 52959.51     | 36801.57     |
+------------------------------------+--------------+--------------+
| sysbench threads=1,   skip-log-bin | 14929.08     | 13741.24     |
+------------------------------------+--------------+--------------+
| sysbench threads=100, skip-log-bin | 69533.67     | 81504.02     |
+------------------------------------+--------------+--------------+

With sysbench --threads=100 and skip-log-bin, MySQL 8.0.26 seems to have better update performance than MySQL 5.7.35.

When I ran "sudo perf top -p `pidof mysqld` " while running sysbench on MySQL 8.0.26 with binlog enabled, the CPU usage of __pthread_mutex_lock and __pthread_mutex_cond_lock seems to be higher than MySQL5.7.35.

While running sysbench with binlog enabled, I ran "sudo perf record -F 99 -p ` pidof mysqld` -g -- sleep 300" and left a record with FlameGraph. The result is flamegraph_57_100threads_log_bin.svg(MySQL 5.7.35) and flamegraph_80_100threads_log_bin.svg(MySQL 8.0.26).

Looking at flamegraph_80_100threads_log_bin.svg, it seems that trans_commit_stmt occupies a higher ratio in mysql_execute_command than MySQL 5.7.35. Also, the ratio of __pthread_mutex_lock in MYSQL_BIN_LOG::change_stage seems to be higher than MySQL 5.7.35.
(You can zoom by opening flamegraph_80_100threads_log_bin.svg in your browser and clicking trans_commit_stmt.)

MySQL 5.7
MYSQL_BIN_LOG::change_stage
https://github.com/mysql/mysql-server/blob/89713bf41c33528c3e7682d219dc838915f5af9a/sql/bi...
Stage_manager::enroll_for
https://github.com/mysql/mysql-server/blob/89713bf41c33528c3e7682d219dc838915f5af9a/sql/bi...

MySQL 8.0
MYSQL_BIN_LOG::change_stage
https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/sql/bi...
Commit_stage_manager::enroll_for
https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/sql/rp...

In a high concurrency workload, Commit_stage_manager::enroll_for()
in MYSQL_BINLOG::change_stage may be a hotspot for mutex contention at MySQL 8.0.26.

It is not directly related to the single thread replication performance of MySQL 8.0.26 being lower than MySQL 5.7.35. However, one of the reasons why the multi thread performance of MySQL 8.0.26 with binlog enabled is lower than MySQL 5.7.35 seems to be the mutex in trans_commit_stmt.