Bug #93734 MySQL 8.0 is 36 times slower than MySQL 5.7
Submitted: 24 Dec 2018 18:44 Modified: 14 Mar 20:18
Reporter: Vadim Tkachenko Email Updates:
Status: In review Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version: OS:Ubuntu (18.04)
Assigned to: Sunny Bains CPU Architecture:x86

[24 Dec 2018 18:44] Vadim Tkachenko
Description:
I am testing MySQL 8.0 on my home Intel NUC with the storage Samsung 970 Pro NVMe. 
This type of the storage is critical because it is fast in random read/writes but slow in fsync().

So I am testing sysbench oltp prepare with the following command:
time sysbench oltp_read_only --tables=8 --table-size=10000000 --rand-type=uniform --mysql-user=root --mysql-socket=/tmp/mysql.sock --threads=8  prepare

the time to accomplish this operation for MySQL 8.0 is
real    175m35.374s
user    7m2.704s
sys     0m7.454s

At the same time for MySQL 5.7 the time is:
real    4m46.539s
user    2m25.878s
sys     0m2.338s

Which means that it takes 36 times longer for MySQL 8.0 to insert data.

The quick pt-pmp shows that MySQL 8.0 spends the most time in

__nanosleep(../sysdeps/unix/sysv/linux/nanosleep.c:28),log_buffer_write_completed(thread:373),mtr_t::Command::execute

which I think corresponds to the slow fsync() operation on the storage I use.

How to repeat:
Storage Samsung 970 Pro NVMe is required.
My config is

[mysqld]
datadir=/mnt/nvme/mysql
socket=/tmp/mysql.sock
ssl=0

skip-log-bin
log-error=error.log

# general
  table_open_cache = 200000
  table_open_cache_instances=64
  back_log=3500
  max_connections=4000

# files
  innodb_file_per_table
  innodb_log_file_size=15G
  innodb_log_files_in_group=2
  innodb_open_files=4000

# buffers
  innodb_buffer_pool_size= 23G
  innodb_buffer_pool_instances=8
  innodb_log_buffer_size=64M

# tune
  innodb_doublewrite= 0
  innodb_thread_concurrency=0 
  innodb_flush_log_at_trx_commit= 1
  innodb_flush_method=O_DIRECT_NO_FSYNC
  innodb_max_dirty_pages_pct=90
  innodb_max_dirty_pages_pct_lwm=10
  innodb_lru_scan_depth=1024
  innodb_page_cleaners=4
  join_buffer_size=256K
  sort_buffer_size=256K
  innodb_use_native_aio=1
  innodb_stats_persistent = 1
  #innodb_spin_wait_delay=96

# perf special
  innodb_adaptive_flushing = 1
  innodb_flush_neighbors = 0
  innodb_read_io_threads = 4
  innodb_write_io_threads = 4
  innodb_io_capacity=2000 
  innodb_io_capacity_max=4000
  innodb_purge_threads=4

  innodb_adaptive_hash_index=1

#innodb_change_buffering=none

max_prepared_stmt_count=1000000

Suggested fix:
Make MySQL 8 Great Again
[24 Dec 2018 18:49] Vadim Tkachenko
I should add that the option
innodb_flush_log_at_trx_commit= 1
is the one that makes the performance impact.

When I use innodb_flush_log_at_trx_commit= 0
then MySQL 8.0 performance is roughly the same as MySQL 5.7
[24 Dec 2018 18:53] Mark Callaghan
8.0.what?

Is this using 36x more CPU?
[24 Dec 2018 18:55] Vadim Tkachenko
MySQL 8.0.13

Vmstat for 8.0.13:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 14814912  28384 10282952    0    0  1217  3199 85381 187052  4  5 90  0  0
 0  0      0 14803844  28384 10289100    0    0  1196  2818 81898 179582  5  7 88  0  0
 3  0      0 14793008  28384 10295088    0    0  1200  3007 84725 185657  5  6 89  0  0

vmstat for MySQl 5.7
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 8  0      0 13856900  12504 13034624    0    0     0 205012 2338 4143 83  1 15  1  0
 8  0      0 12861932  12504 13560592    0    0     0 202415 2303 4144 81  2 17  1  0
 6  0      0 11894124  12504 14073404    0    0     0 194457 2371 4159 83  1 15  1  0
 1  0      0 10964124  12512 14565408    0    0     0 187349 2250 3868 82  1 16  1  0
[25 Jan 19:29] Vadim Tkachenko
For MySQL 8.0.14 the result is
time sysbench oltp_read_only --tables=8 --table-size=10000000 --rand-type=uniform --mysql-user=root --mysql-socket=/tmp/mysql.sock --threads=8  prepare

real    76m26.892s
user    6m24.992s
sys     0m6.751s
[25 Jan 19:43] Shane Bester
>
>  This type of the storage is critical because it is fast in random > 
>  read/writes but slow in fsync().
>

So, we just need to find out where the extra calls to fsync() are coming about and, if they're justified?  I think it's fair to say MySQL 8.0 is not 36x slower,  but reporter's disk's fsync() is the one who's 36x slower than normal ?
[13 Mar 16:38] Sinisa Milivojevic
Hi Vadim,

Thank you for this report.

I have setup the server as you specified. I do not have storage that you describe as "Samsung 970 Pro NVMe". Is that some kind of Flash Drive ??? Is it essential in some ways for the results ??? May be it has very slow fsync() ???

I used my iMac with Fusion drive, which has a 1 Tb flash drive as a primary storage. As my machine is not as fast as yours, I used only 1 million rows for each table.

With 5.7.24, I have got the following results:

real	12m14.404s
user	0m13.832s
sys	0m1.290s

Now, I will be running it on 8.0.15.
[13 Mar 16:42] Vadim Tkachenko
Sinisa,

"Samsung 970 Pro NVMe" is a PCIe SSD.
It indeed seems to have a slow fsync() .
The point here that even with slow fsync(), MySQL 5.7 performs fine, but MySQL 8.0 is much slower in inserting data.
[13 Mar 17:45] Sinisa Milivojevic
Vadim,

My results with 8.0.15 are the following ones:

real	12m7.974s
user	0m15.085s
sys	0m1.479s

Hence, as you can see, I can't repeat your test case. And I do not think that we got that Samsung PCI SSD. But, I will check ...

Can you let us know the family of devices that would repeat this behaviour ????

If we manage to repeat this behaviour, I have an idea how to diagnose it, but that is not a subject at this moment.

Hence, your feedback would be truly welcome ......
[14 Mar 0:15] Sunny Bains
I used the config in the bug report. It's a low end machine, the FS is ext4 and I can't reproduce the problem. 

Are you sure you weren't using a debug binary?

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
Stepping:            9
CPU MHz:             804.715
CPU max MHz:         4200.0000
CPU min MHz:         800.0000
BogoMIPS:            7200.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d

It has an older SSD:

  *-disk
       description: ATA Disk
       product: Samsung SSD 840
       physical id: 0.0.0
       bus info: scsi@2:0.0.0
       logical name: /dev/sdb
       version: 5B0Q
       serial: S1AXNEAD605540R
       size: 476GiB (512GB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=0e627372

First is with mysql 8.0

time sysbench oltp_read_only --db-driver=mysql --tables=8 --table-size=10000000 --rand-type=uniform --mysql-user=root --mysql-socket=/tmp/mysql-sunny.sock --threads=8  prepare
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Initializing worker threads...

Creating table 'sbtest8'...Creating table 'sbtest7'...
Creating table 'sbtest5'...

Creating table 'sbtest1'...
Creating table 'sbtest6'...
Creating table 'sbtest2'...
Creating table 'sbtest3'...
Creating table 'sbtest4'...
Inserting 10000000 records into 'sbtest1'
Inserting 10000000 records into 'sbtest3'
Inserting 10000000 records into 'sbtest4'
Inserting 10000000 records into 'sbtest6'
Inserting 10000000 records into 'sbtest7'
Inserting 10000000 records into 'sbtest5'
Inserting 10000000 records into 'sbtest2'
Inserting 10000000 records into 'sbtest8'
Creating a secondary index on 'sbtest1'...
Creating a secondary index on 'sbtest8'...
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest6'...
Creating a secondary index on 'sbtest5'...
Creating a secondary index on 'sbtest4'...
Creating a secondary index on 'sbtest3'...
Creating a secondary index on 'sbtest7'...

real	16m3.893s
user	4m17.926s
sys	0m6.921s

Second is with mysql 5.7.

time sysbench oltp_read_only --db-driver=mysql --tables=8 --table-size=10000000 --rand-type=uniform --mysql-user=root --mysql-socket=/tmp/mysql-sunny.sock --threads=8  prepare
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Initializing worker threads...

Creating table 'sbtest6'...
Creating table 'sbtest8'...
Creating table 'sbtest2'...
Creating table 'sbtest3'...
Creating table 'sbtest7'...
Creating table 'sbtest4'...
Creating table 'sbtest1'...
Creating table 'sbtest5'...
Inserting 10000000 records into 'sbtest8'
Inserting 10000000 records into 'sbtest2'
Inserting 10000000 records into 'sbtest1'
Inserting 10000000 records into 'sbtest7'
Inserting 10000000 records into 'sbtest5'
Inserting 10000000 records into 'sbtest4'
Inserting 10000000 records into 'sbtest3'
Inserting 10000000 records into 'sbtest6'
Creating a secondary index on 'sbtest5'...
Creating a secondary index on 'sbtest4'...
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest6'...
Creating a secondary index on 'sbtest1'...
Creating a secondary index on 'sbtest8'...
Creating a secondary index on 'sbtest7'...
Creating a secondary index on 'sbtest3'...

real	23m43.940s
user	3m36.421s
sys	0m6.535s
[14 Mar 13:56] Sinisa Milivojevic
Hello Vadim,

Actually, several of us were not able to reproduce this.

We are waiting on some slower disks, so that we may attempt some new benchmarks later.

Hence, no feedback is needed from.

For the moment, this bug will remain in the stand-by status.

Thanks again.
[14 Mar 15:09] Sunny Bains
I can reproduce the problem if I slow down the fsync() call by adding a sleep.
[14 Mar 16:18] Sinisa Milivojevic
Sunny,

Thank you very much.

I will transfer this bug into internal database and will leave some comments there.