Bug #86215 MySQL is much slower in 5.7 vs 5.6
Submitted: 8 May 13:35 Modified: 11 Jul 22:44
Reporter: Mark Callaghan Email Updates:
Status: Analyzing Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.1 OS:Any
Assigned to: Dimitri Kravtchuk

[8 May 13:35] Mark Callaghan
Description:
On my tests InnoDB in MySQL 5.7.17 gets up to 1.5X more QPS than InnoDB on 8.0.1.

I used in-memory & IO-bound sysbench and insert benchmark on a small home server (2 cores, 4 HW threads, 8gb RAM, 110gb Samsung 850 EVO SSD). The server uses Ubuntu 16.04

How to repeat:
I use current sysbench run by my helper scripts at :
https://github.com/mdcallag/mytools/tree/master/bench/sysbench.lua

With MySQL installed at ~b/orig801 and ~b/orig5717, sysbench installed at ~b/sysbench I used these command lines for in-memory sysbench. This creates 4 tables with 1M rows/table.
bash all_small.sh 4 1000000 600 600 300 innodb 1 0 ~/b/orig801/bin/mysql none ~/b/sysbench/share/sysbench
bash all_small.sh 4 1000000 600 600 300 innodb 1 0 ~/b/orig5717/bin/mysql none ~/b/sysbench/share/sysbench

And for IO-bound I use 4 tables with 40M rows/table and these command lines:
bash all_small.sh 4 40000000 600 600 300 innodb 1 0 ~/b/orig801/bin/mysql none ~/b/sysbench/share/sysbench
bash all_small.sh 4 40000000 600 600 300 innodb 1 0 ~/b/orig5717/bin/mysql none ~/b/sysbench/share/sysbench

I used these cmake command lines:

Well, maybe I used this one for 5.7.17
cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1 -DENABLE_DTRACE=0 -DCMAKE_INSTALL_PREFIX=$1

For MySQL 8 and maybe 5.7.17
cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1 -DENABLE_DTRACE=0 -DCMAKE_INSTALL_PREFIX=$1 -DWITH_BOOST=$PWD/boost

my.cnf for 5.7.17
[mysqld]
sql_mode=no_engine_substitution,no_unsigned_subtraction
default_tmp_storage_engine=MyISAM
tmpdir=/data/m/my
query_cache_size=0
query_cache_type=0
long_query_time=5
log_bin=/data/m/my/binlogs/bl
max_binlog_size=128M
sync_binlog=0
server_id=77
binlog_format=ROW
datadir=/data/m/my/data
innodb_max_dirty_pages_pct=75
innodb_adaptive_flushing=1
innodb_file_format=Barracuda
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
innodb_doublewrite=1
innodb_page_size=16K
innodb_buffer_pool_size=6G
innodb_flush_neighbors=0
innodb_io_capacity=10000
innodb_io_capacity_max=20000
innodb_lru_scan_depth=5000
innodb_data_home_dir=/data/m/my/data
innodb_data_file_path=ibdata1:512M:autoextend
innodb_log_group_home_dir=/data/m/my/txlogs
performance_schema=1
[mysqld_safe]
malloc-lib=/usr/lib/x86_64-linux-gnu/libjemalloc.so

my.cnf for 8.0.1
[mysqld]
sql_mode=no_engine_substitution,no_unsigned_subtraction
default_tmp_storage_engine=MyISAM
tmpdir=/data/m/my
query_cache_size=0
query_cache_type=0
long_query_time=5
log_bin=/data/m/my/binlogs/bl
max_binlog_size=128M
sync_binlog=0
server_id=77
binlog_format=ROW
datadir=/data/m/my/data
innodb_max_dirty_pages_pct=75
innodb_adaptive_flushing=1
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
innodb_doublewrite=1
innodb_page_size=16K
innodb_buffer_pool_size=6G
innodb_flush_neighbors=0
innodb_io_capacity=10000
innodb_io_capacity_max=20000
innodb_lru_scan_depth=5000
innodb_data_home_dir=/data/m/my/data
innodb_data_file_path=ibdata1:512M:autoextend
innodb_log_group_home_dir=/data/m/my/txlogs
performance_schema=1
[mysqld_safe]
malloc-lib=/usr/lib/x86_64-linux-gnu/libjemalloc.so
[8 May 13:45] Mark Callaghan
The all_small.sh script I use to run sysbench does the following tests in order:
* update with index maintenance
* update without index maintenance
* delete-only
* write-only
* read-write with X row range scans for X in 100 10,000
* read-write with 10,000 row range scan
* read-only with X row range scan for X in 10 100 1,000 10,000
* point-query
* insert-only

I will share results for most of the tests here for in-memory sysbench (4 tables, 1M rows/table). Tests are run for 1, 2 & 4 clients.

QPS for the tests for InnoDB from MySQL 5.6.35, 5.7.17 and 8.0.1 is at https://gist.github.com/mdcallag/3aa4edb6471f1188c9167207d3013050
I pasted them via a gist to preserve formatting with a fixed width font.

Note that 5.6 and 5.7 get up to 2X more QPS than 8.0.1 on many of the tests.
[8 May 13:50] Mark Callaghan
Next up, IO-bound sysbench -- 4 tables, 40M rows/table. 5.7 is about 1.5X faster than 8.0.1 in the worst case, which isn't as bad as the in-memory workload comparison.
https://gist.github.com/mdcallag/de5d1793e90c59321ab598872c9a4ae1
[8 May 14:15] Mark Callaghan
Next up, in-memory insert benchmark using my scripts at https://github.com/mdcallag/mytools/tree/master/bench/ibench

This runs 3 tests:
1) insert 10M rows using 1 connection
2) insert 1000 rows/sec using 1 connection, range queries with another connection
3) inset 100 rows/sec using 1 connection, range queries with another connection

Summary for test #1:
* 20202, 18727, 16447 inserts/sec for 5.6.35, 5.7.17 and 8.0.1

Summary for test #2:
* 4492, 3924, 3267 queries/sec for 5.6.35, 5.7.17 and 8.0.1

Summary for test #3:
* 4385, 3859, 3219 queries/sec for 5.6.35, 5.7.17 and 8.0.1

Full perf results at https://gist.github.com/mdcallag/43f69854bf7ce0fc9347d7698db2640f

Some differences visible in the full perf results.

From test #1:
* 8.0.1 uses 10.5% more CPU/insert than 5.7 and 9.9% more than 5.6 - see Mcpu/i
* 8.0.1 writes 19.7% more/insert than 5.7 and 39% more than 5.6 - see wKB/i. Part of this might be that 8.0.1 maintains a lower number of dirty pages, but at test end 8.0.1 datadir was ~2.5gb versus ~2.1gb for 5.6 and 5.7. I did not debug the difference.

From test #2:
* 8.0.1 uses 20.5% more CPU/query than 5.7 and 34.1% more than 5.6 - see Mcpu/q
* 8.0.1 writes 12.7% more/insert than 5.7 and 6.2% more/insert than 5.6
* at test end datadir was 3.4gb for 8.0.1 vs 3.1gb for 5.6 & 5.7

From test #3:
* 8.0.1 uses 20% more CPU/query than 5.7 and 33.9% more than 5.6 - see Mcpu/q
* at test end datadir was 3.5gb for 8.0.1, 3.2gb for 5.7 and 3.1gb for 5.6
[8 May 15:24] Mark Callaghan
I didn't share the results, but I also ran tests with performance_schema=0. And PS overhead doesn't explain this bug.

From looking at top-N results from flat Linux perf from point-query workload with 1 connection I see:
* __memcpy_avx_unaligned uses more CPU in 8.0.1 - see https://gist.github.com/mdcallag/410ae78e852ca9243f4b085f0942aacd
* malloc/free use more CPU in 8.0.1 -
** for malloc - https://gist.github.com/mdcallag/1e0d7d75dc8c1eb5addc63e305442be3
** for free - https://gist.github.com/mdcallag/a0f09c0991776e0f8cdebe0ce400b30e

for 5.7.17
     1.41%  mysqld   mysqld               [.] btr_search_guess_on_hash
     1.12%  mysqld   mysqld               [.] row_search_mvcc
     1.10%  mysqld   mysqld               [.] JOIN::optimize
     0.88%  mysqld   mysqld               [.] pfs_start_stage_v1
     0.81%  mysqld   mysqld               [.] dispatch_command
     0.76%  mysqld   mysqld               [.] pfs_end_statement_v1
     0.73%  mysqld   mysqld               [.] THD::enter_stage
     0.67%  mysqld   libpthread-2.23.so   [.] pthread_mutex_unlock
     0.65%  mysqld   mysqld               [.] PolicyMutex<TTASEventMutex<GenericPolicy> >::enter
     0.64%  mysqld   [kernel.kallsyms]    [k] __fget
     0.63%  mysqld   mysqld               [.] pfs_get_thread_statement_locker_v1
     0.59%  mysqld   [kernel.kallsyms]    [k] do_sys_poll
     0.59%  mysqld   mysqld               [.] JOIN::make_join_plan
     0.57%  mysqld   libpthread-2.23.so   [.] pthread_getspecific
     0.57%  mysqld   mysqld               [.] alloc_root
     0.57%  mysqld   mysqld               [.] mysql_execute_command
     0.55%  mysqld   [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
     0.54%  mysqld   libjemalloc.so.1     [.] malloc
     0.53%  mysqld   [kernel.kallsyms]    [k] effective_load.isra.39
     0.52%  mysqld   [kernel.kallsyms]    [k] tcp_recvmsg
     0.51%  mysqld   mysqld               [.] THD::send_result_metadata
     0.50%  mysqld   libpthread-2.23.so   [.] pthread_mutex_lock
     0.50%  mysqld   mysqld               [.] TrxInInnoDB::enter
     0.50%  mysqld   [kernel.kallsyms]    [k] tcp_transmit_skb
     0.48%  mysqld   mysqld               [.] open_tables
     0.47%  mysqld   [kernel.kallsyms]    [k] dequeue_entity
     0.47%  mysqld   libjemalloc.so.1     [.] free
     0.44%  mysqld   mysqld               [.] create_ref_for_key
     0.43%  mysqld   [kernel.kallsyms]    [k] enqueue_entity
     0.43%  mysqld   [kernel.kallsyms]    [k] __netif_receive_skb_core
     0.43%  mysqld   mysqld               [.] insert_events_statements_history
     0.42%  mysqld   [kernel.kallsyms]    [k] __local_bh_enable_ip
     0.42%  mysqld   mysqld               [.] pfs_start_mutex_wait_v1
     0.41%  mysqld   mysqld               [.] THD::decide_logging_format
     0.41%  mysqld   libc-2.23.so         [.] __memcpy_avx_unaligned
     0.41%  mysqld   mysqld               [.] Item_field::set_field
     0.40%  mysqld   mysqld               [.] st_select_lex::optimize
     0.40%  mysqld   [kernel.kallsyms]    [k] __do_softirq
     0.39%  mysqld   mysqld               [.] mysql_lock_tables

for 8.0.1
     1.05%  mysqld   mysqld               [.] btr_search_guess_on_hash
     1.03%  mysqld   mysqld               [.] row_search_mvcc
     0.83%  mysqld   mysqld               [.] JOIN::optimize
     0.71%  mysqld   mysqld               [.] pfs_start_stage_v1
     0.66%  mysqld   libc-2.23.so         [.] __memcpy_avx_unaligned
     0.65%  mysqld   mysqld               [.] dispatch_command
     0.63%  mysqld   libpthread-2.23.so   [.] pthread_mutex_unlock
     0.62%  mysqld   libjemalloc.so.1     [.] malloc
     0.62%  mysqld   mysqld               [.] pfs_end_statement_v1
     0.61%  mysqld   mysqld               [.] THD::enter_stage
     0.57%  mysqld   mysqld               [.] JOIN::make_join_plan
     0.55%  mysqld   [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
     0.54%  mysqld   libjemalloc.so.1     [.] free
     0.51%  mysqld   libpthread-2.23.so   [.] pthread_getspecific
     0.46%  mysqld   mysqld               [.] alloc_root
     0.46%  mysqld   [kernel.kallsyms]    [k] __fget
     0.46%  mysqld   [kernel.kallsyms]    [k] tcp_transmit_skb
     0.46%  mysqld   libpthread-2.23.so   [.] pthread_mutex_lock
     0.43%  mysqld   libcrypto.so.1.0.0   [.] SHA1_Final
     0.42%  mysqld   [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
     0.41%  mysqld   libcrypto.so.1.0.0   [.] SHA1_Update
     0.40%  mysqld   mysqld               [.] btr_cur_search_to_nth_level
     0.40%  mysqld   [kernel.kallsyms]    [k] effective_load.isra.39
     0.40%  mysqld   mysqld               [.] open_table
     0.39%  mysqld   mysqld               [.] TrxInInnoDB::enter
     0.38%  mysqld   [kernel.kallsyms]    [k] tcp_recvmsg
     0.38%  mysqld   [kernel.kallsyms]    [k] tcp_sendmsg
     0.37%  mysqld   mysqld               [.] openssl_lock_function
     0.37%  mysqld   libc-2.23.so         [.] __memset_avx2
     0.36%  mysqld   mysqld               [.] create_ref_for_key
     0.36%  mysqld   mysqld               [.] ha_innobase::build_template
     0.36%  mysqld   libcrypto.so.1.0.0   [.] 0x000000000009f8c8
     0.36%  mysqld   [kernel.kallsyms]    [k] enqueue_entity
     0.35%  mysqld   mysqld               [.] THD::decide_logging_format
     0.34%  mysqld   mysqld               [.] ha_innobase::external_lock
     0.34%  mysqld   [kernel.kallsyms]    [k] do_sys_poll
     0.32%  mysqld   mysqld               [.] open_tables
     0.32%  mysqld   mysqld               [.] ha_innobase::info_low
     0.32%  mysqld   mysqld               [.] ha_commit_trans
[8 May 23:51] Mark Callaghan
An expert who works at MySQL suggested I use latin1 as utf8 is the default for MySQL 8. I will repeat tests with that.
[9 May 14:14] Mark Callaghan
Repeated in-memory insert benchmark with this added to my.cnf for 8.0.1:
character_set_server=latin1
collation_server=latin1_bin

utf8 vs latin1 doesn't explain the perf overhead for 8.0.1 in this test.

Summary for test #1:
* 20202, 18727, 16447, 16667 inserts/sec for 5.6.35, 5.7.17, 8.0.1 utf, 8.0.1 latin

Summary for test #2:
* 4492, 3924, 3267, 3277 queries/sec for 5.6.35, 5.7.17, 8.0.1 utf, 8.0.1 latin

Summary for test #3:
* 4385, 3859, 3219, 3221 queries/sec for 5.6.35, 5.7.17, 8.0.1 utf, 8.0.1 latin

full perf results at https://gist.github.com/mdcallag/43f69854bf7ce0fc9347d7698db2640f
[9 May 20:31] Mark Callaghan
Repeated 4 table, 1M rows/table sysbench with latin1/latin_bin

Updated results at https://gist.github.com/mdcallag/3aa4edb6471f1188c9167207d3013050

It helps a lot with tests that do long range scans read-only.range1000, read-only.range10000, read-write.range10000. But even there the perf for 8.0.1 is much worse than 5.7.17

So charset/collation is not the root cause.
[9 May 20:46] Mark Callaghan
My test server is an Intel NUC 5th generation core i3. It will take me a few weeks to get results from a faster CPU.

http://www.intel.com/content/www/us/en/products/boards-kits/nuc/kits/nuc5i3ryh.html
[10 May 14:23] Mark Callaghan
Problem repeats with 4 table, 1M rows/table sysbench using latin1 charset and latin1_swedish_ci collation.  Updates results at https://gist.github.com/mdcallag/3aa4edb6471f1188c9167207d3013050
[10 May 23:22] Mark Callaghan
Charsets are not the problem. Repeated for 4 tables & 40M rows/table with latin1 charset and both latin1_bin and latin1_swedish_ci collation. Updated results at https://gist.github.com/mdcallag/de5d1793e90c59321ab598872c9a4ae1
[25 May 15:16] Sinisa Milivojevic
Hi Mark,

I do not have a single doubt in my mind that your results are correct. However, we still need to be able to repeat them ourselves. That is not needed just in order to verify this bug, but also in order that developer can profile the execution in the MySQL server and find the new bottlenecks, while also performing other necessary analyses. 

Hence, we need your entire scripts, whether you used mysqlslap or sysbench. We need entire dump of the tables and scripts with other info, like the number of threads etc. You already provided the configuration, so that is not required.

Thank you in advance.
[27 May 16:38] Mark Callaghan
I already shared how I run sysbench along with the command line. My helper scripts are at:
https://github.com/mdcallag/mytools/blob/master/bench/sysbench.lua

And with those scripts I use this to run the tests:
https://github.com/mdcallag/mytools/blob/master/bench/sysbench.lua/all_small.sh

Command lines were provided earlier in this thread.

It runs a sequence of sysbench tests for 1, 2 and 4 clients -- edit this line to change that https://github.com/mdcallag/mytools/blob/master/bench/sysbench.lua/all_small.sh#L13

I am not going to paste a dump of the sysbench tables. You get that data via the "prepare" step from sysbench. 

I used modern sysbench from https://github.com/akopytov/sysbench
[27 May 16:51] Mark Callaghan
Results to this point were from an old Intel NUC (5th generation core i3, 8gb RAM, slow SSD). I repeated tests with a new Intel NUC that is 2X faster (7th generation core i5, 16gb RAM, fast SSD). Both use Ubuntu 16.04 LTS. Summary of the results is:
* there is a big regression from 5.6 to 5.7
* there is a small regression from 5.7 to 8
* problem might be worse on faster NUC (core i5) than on slower NUC (core i3).

I have many blog posts and a few open bugs for the perf regression from 5.6 to 5.7.

Update results for sysbench using older/slower NUC and new/faster NUC
* in-memory sysbench - https://gist.github.com/mdcallag/fc2042fa91f074d44d861d66477a3659
* IO-bound sysbench - https://gist.github.com/mdcallag/eede54ce8133ae9574a5e8c920cf5577

For in-memory MySQL 5.7 and 8 get between 60% to 70% of the QPS versus 5.6 at one thread.

For IO-bound MySQL 5.7 and 8 get between 60% and 80% of the QPS versus 5.6 at one thread.
[27 May 16:55] Mark Callaghan
Can you update the title? This is more of a regression from 5.6 to 5.7 to 8 than from 5.7 to 8.
[29 May 16:50] Sinisa Milivojevic
Done.
[30 May 14:53] Mark Callaghan
Summary of my work on low-concurrency performance regression at http://smalldatum.blogspot.com/2017/05/the-history-of-low-concurrency.html
[30 May 15:22] Mark Callaghan
This has a lot more information on how I run sysbench
http://smalldatum.blogspot.com/2017/02/using-modern-sysbench-to-compare.html
[30 May 17:47] Mark Callaghan
Charts for the sysbench in-memory tests are at http://smalldatum.blogspot.com/2017/05/sysbench-in-memory-intel-nuc.html
[31 May 13:50] Mark Callaghan
Charts for sysbench with an IO-bound workload
http://smalldatum.blogspot.com/2017/05/sysbench-io-bound-intel-nuc-for-mysql.html
[2 Jun 19:22] Mark Callaghan
Charts for the insert benchmark with an in-memory workload
http://smalldatum.blogspot.com/2017/06/insert-benchmark-in-memory-intel-nuc.html
[14 Jun 21:22] Dimitri Kravtchuk
Mark, thanks for reporting this !
we're in progress to reproduce and investigate this issue..

Rgds,
-Dimitri
[17 Jun 14:35] Mark Callaghan
Summary of my recent tests is here:
http://smalldatum.blogspot.com/2017/06/summary-of-recent-performance-tests-for.html

In-memory sysbench for MySQL 5.0 to 8 shows that for most tests the biggest drop in QPS is from 5.6 to 5.7
http://smalldatum.blogspot.com/2017/06/sysbench-for-mysql-50-51-55-56-57-and-8.html
[19 Jun 13:09] Sinisa Milivojevic
Very interesting table of data and graphs.

5.6 -> 5.7 drop is profound. Good news is that 8.0 is holding up, plus it is still in development.
[28 Jun 17:59] Mark Callaghan
An interesting result at June 2017 - http://smalldatum.blogspot.com/2017/06/myisam-small-servers-and-sysbench-at.html
* QPS loss for MyISAM is similar to InnoDB
* innodb_purge_threads=4, the default, hurts throughput on a server with 4 HW threads
[29 Jun 12:42] Sinisa Milivojevic
Yes, I have read it yesterday when it was published on FB.

Should be interesting to see where the bottleneck is, when Dimitri does profiling. It is evidently not in storage engines.
[10 Jul 17:42] Mark Callaghan
tl;dr - it was me & SSL, not MySQL 5.7.
I think this can be closed as not a bug.

http://smalldatum.blogspot.com/2017/07/i-overstated-cpu-regression-in-mysql-57.html
[10 Jul 18:21] Dimitri Kravtchuk
Mark, I'd rather propose to not close this bug report, but rather change the title as "MySQL is slower on low load workloads" and continue to follow it..

Rgds,
-Dimitri
[11 Jul 13:34] Sinisa Milivojevic
I agree with Dimitri on his suggestion.
[11 Jul 22:44] Mark Callaghan
That is good for me. I plan to start by looking at the regressions from 5.5 to 5.6