Bug #68825 | performance regressions for single-threaded workloads | ||
---|---|---|---|
Submitted: | 1 Apr 2013 16:28 | Modified: | 1 Sep 2015 14:38 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.6.10 | OS: | Any |
Assigned to: | Roy Lyseng | CPU Architecture: | Any |
[1 Apr 2013 16:28]
Mark Callaghan
[1 Apr 2013 21:59]
Mark Callaghan
I use my own branch of sysbench -- https://code.launchpad.net/~mdcallag/sysbench/0.4-dev For the update-only test: ./sysbench --batch --batch-delay=10 --test=oltp --oltp-table-size=128000000 --max-time=1800 --max-requests=0 --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes --oltp-table-name=sbtest1 --oltp-skip-trx --oltp-test-mode=incupdate2 --oltp-dist-type=uniform --oltp-range-size=100 --oltp-connect-delay=0 --percentile=99 --num-threads=1 --seed-rng=1 [ prepare | run ] For the read-only test: ./sysbench --batch --batch-delay=10 --test=oltp --oltp-table-size=128000000 --max-time=1800 --max-requests=0 --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes --oltp-table-name=sbtest1 --oltp-read-only --oltp-skip-trx --oltp-test-mode=simple --oltp-point-select-all-cols --oltp-dist-type=uniform --oltp-range-size=100 --oltp-connect-delay=0 --percentile=99 --num-threads=1 --seed-rng=1 [ prepare | run ]
[1 Apr 2013 22:02]
Mark Callaghan
Typical my.cnf for 5.6.10 during my tests: sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES innodb_log_file_size=1900M innodb_flush_method=O_DIRECT innodb_thread_concurrency=0 innodb_max_dirty_pages_pct=80 innodb_file_format=barracuda innodb_file_per_table table-definition-cache=1000 table-open-cache=2000 max_connections=2000 key_buffer_size=200M innodb_flush_log_at_trx_commit=2 innodb_doublewrite=0 query_cache_size=0 query_cache_type=0 innodb_thread_concurrency=0 metadata_locks_hash_instances=256 innodb_checksum_algorithm=CRC32 innodb_buffer_pool_size=64g innodb_io_capacity=8192 innodb_buffer_pool_instances=8 table_open_cache_instances=1 innodb_lru_scan_depth=8192
[2 Apr 2013 1:37]
Mark Callaghan
Results for an update-only workload. The regression is still there and the perf schema makes it worse -- http://mysqlha.blogspot.com/2013/04/mysql-56-single-thread-update-only.html
[2 Apr 2013 17:21]
Mark Callaghan
Repeated tests using HANDLER rather than SELECT to avoid optimizer overhead. InnoDB AHI was also disabled. 5.1.63 does about 3.6% better so the optimizer might account for a bit more than half of the regression. 15968 QPS - 5.1.63 with FB patch 15400 QPS - 5.6.10 without perf schema I also ran "perf record" for 120 seconds during each test and make_join_statistics + JOIN::optimize was about 1.5X more expensive for 5.6.10.
[2 Apr 2013 18:36]
Mark Callaghan
Compiled 5.6 with -DOPTIMIZER_TRACE=0. That didn't make much of a difference. But it looks like there is much more overhead in the optimizer. I used "perf record -p $( pidof mysqld ) -- sleep 120" to collect CPU profiling over 120 seconds when running the test and the output below is from "perf report | grep JOIN". This is for MySQL 5.1.63 with the Facebook patch: 1.08% mysqld mysqld.fs1 [.] JOIN::optimize() 0.96% mysqld mysqld.fs1 [.] _ZL20make_join_statisticsP4JOINP10TABLE_LISTP4ItemP16st_dynamic_array 0.49% mysqld mysqld.fs1 [.] JOIN::prepare(Item***, TABLE_LIST*, unsigned int, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, st_select_lex*, st_select_lex_unit*) 0.38% mysqld mysqld.fs1 [.] JOIN::exec() 0.35% mysqld mysqld.fs1 [.] _ZL18create_ref_for_keyP4JOINP13st_join_tableP8keyuse_ty 0.25% mysqld mysqld.fs1 [.] _ZL14simplify_joinsP4JOINP4ListI10TABLE_LISTEP4Itemb 0.21% mysqld mysqld.fs1 [.] _ZL9do_selectP4JOINP4ListI4ItemEP8st_tableP9Procedure 0.17% mysqld mysqld.fs1 [.] JOIN::cleanup(bool) 0.17% mysqld mysqld.fs1 [.] _ZL14add_key_fieldsP4JOINPP11key_field_tPjP4ItemyPP17st_sargable_param 0.11% mysqld mysqld.fs1 [.] _ZL8end_sendP4JOINP13st_join_tableb 0.10% mysqld mysqld.fs1 [.] JOIN::destroy() 0.09% mysqld mysqld.fs1 [.] JOIN::join_free() 0.08% mysqld mysqld.fs1 [.] setup_end_select_func(JOIN*) 0.08% mysqld mysqld.fs1 [.] _ZL18make_join_readinfoP4JOINy 0.07% mysqld mysqld.fs1 [.] _ZL12remove_constP4JOINP8st_orderP4ItembPb 0.07% mysqld mysqld.fs1 [.] _ZL17calc_group_bufferP4JOINP8st_order 0.06% mysqld mysqld.fs1 [.] JOIN::alloc_func_list() 0.06% mysqld mysqld.fs1 [.] _ZL13optimize_condP4JOINP4ItemP4ListI10TABLE_LISTEPNS1_11cond_resultE 0.05% mysqld mysqld.fs1 [.] JOIN::cleanup_item_list(List<Item>&) const 0.03% mysqld mysqld.fs1 [.] _ZL12set_positionP4JOINjP13st_join_tableP8keyuse_t 0.03% mysqld mysqld.fs1 [.] select_result::initialize_tables(JOIN*)
[2 Apr 2013 18:38]
Mark Callaghan
This is part 1 for MySQL 5.6.10 with -DOPTIMIZER_TRACE=0 1.45% mysqld mysqld.noopttrc [.] _ZL20make_join_statisticsP4JOINP10TABLE_LISTP4ItemP14Mem_root_arrayI7Key_useLb1EEb 1.09% mysqld mysqld.noopttrc [.] JOIN::optimize() 0.54% mysqld mysqld.noopttrc [.] JOIN::prepare(TABLE_LIST*, unsigned int, Item*, unsigned int, st_order*, st_order*, Item*, st_select_lex*, st_select_lex_unit*) 0.52% mysqld mysqld.noopttrc [.] JOIN::get_best_combination() 0.36% mysqld mysqld.noopttrc [.] create_ref_for_key(JOIN*, st_join_table*, Key_use*, unsigned long long) 0.25% mysqld mysqld.noopttrc [.] _ZL14simplify_joinsP4JOINP4ListI10TABLE_LISTEP4ItembbPS6_Pj 0.19% mysqld mysqld.noopttrc [.] _ZL14add_key_fieldsP4JOINPP9Key_fieldPjP4ItemyPP17st_sargable_param 0.17% mysqld mysqld.noopttrc [.] JOIN::make_tmp_tables_info() 0.16% mysqld mysqld.noopttrc [.] JOIN::exec() 0.13% mysqld mysqld.noopttrc [.] JOIN::make_sum_func_list(List<Item>&, List<Item>&, bool, bool) 0.12% mysqld mysqld.noopttrc [.] _ZL8end_sendP4JOINP13st_join_tableb 0.10% mysqld mysqld.noopttrc [.] is_indexed_agg_distinct(JOIN*, List<Item_field>*) 0.10% mysqld mysqld.noopttrc [.] calc_group_buffer(JOIN*, st_order*) 0.09% mysqld mysqld.noopttrc [.] JOIN::cleanup(bool) 0.09% mysqld mysqld.noopttrc [.] _ZL12remove_constP4JOINP8st_orderP4ItembPbPKc.isra.189 0.09% mysqld mysqld.noopttrc [.] make_join_readinfo(JOIN*, unsigned long long, unsigned int) 0.08% mysqld mysqld.noopttrc [.] JOIN::flatten_subqueries() 0.08% mysqld mysqld.noopttrc [.] JOIN::generate_derived_keys()
[2 Apr 2013 18:39]
Mark Callaghan
This is part 2 from 5.6.10 with -DOPTIMIZER_TRACE=0 0.07% mysqld mysqld.noopttrc [.] JOIN::join_free() 0.07% mysqld mysqld.noopttrc [.] _ZL31setup_semijoin_dups_eliminationP4JOINyj.isra.171 0.07% mysqld mysqld.noopttrc [.] _ZL43optimize_semijoin_nests_for_materializationP4JOIN 0.07% mysqld mysqld.noopttrc [.] JOIN::destroy() 0.07% mysqld mysqld.noopttrc [.] JOIN::update_equalities_for_sjm() 0.06% mysqld mysqld.noopttrc [.] setup_end_select_func(JOIN*, st_join_table*) 0.06% mysqld mysqld.noopttrc [.] JOIN::decide_subquery_strategy() 0.06% mysqld mysqld.noopttrc [.] JOIN::alloc_func_list() 0.06% mysqld mysqld.noopttrc [.] JOIN::cleanup_item_list(List<Item>&) const 0.05% mysqld mysqld.noopttrc [.] JOIN::prepare_result(List<Item>**) 0.05% mysqld mysqld.noopttrc [.] select_result::initialize_tables(JOIN*) 0.05% mysqld mysqld.noopttrc [.] JOIN::optimize_fts_limit_query() 0.04% mysqld mysqld.noopttrc [.] JOIN::cache_const_exprs() 0.04% mysqld mysqld.noopttrc [.] JOIN::drop_unused_derived_keys() 0.04% mysqld mysqld.noopttrc [.] JOIN::set_semijoin_info() 0.04% mysqld mysqld.noopttrc [.] JOIN::set_prefix_tables() 0.03% mysqld mysqld.noopttrc [.] JOIN::optimize_fts_query() 0.03% mysqld mysqld.noopttrc [.] _ZL12set_positionP4JOINjP13st_join_tableP7Key_use.isra.8 0.03% mysqld mysqld.noopttrc [.] update_depend_map(JOIN*) 0.03% mysqld mysqld.noopttrc [.] JOIN::set_access_methods() 0.03% mysqld mysqld.noopttrc [.] JOIN::refine_best_rowcount()
[2 Apr 2013 20:22]
Mark Callaghan
filed http://bugs.mysql.com/bug.php?id=68840 -- this doesn't explain the regression, but it is new code in the optimizer that doesn't do anything for InnoDB
[4 Apr 2013 6:10]
Roy Lyseng
Thank you for the report. I summed the overhead to about 1%, but then grepping for JOIN is not an exact tool for identifying the optimizer functions. We will need to perform more exact measurements in-house to see if we can pinpoint possible performance enhancements in the optimizer.
[22 Apr 2013 6:24]
Roy Lyseng
Doing detailed performance analysis of optimizer benchmarks, and checks impact of some possible changes.
[14 May 2013 19:56]
Mark Callaghan
More results on regressions. For these 4.0.30 is almost 1.5X faster than 5.6.11 on single-threaded inserts and single threaded point queries -- http://mysqlha.blogspot.com/2013/05/mysql-56-single-threaded-performance.html For the inserts the "mysql" client uses almost 3X more CPU in 5.6.11. 5 second flat CPU profiles from linux perf utility are listed below first for 4.0.30 64.70% mysql mysql [.] _ZL10read_linesb 9.66% mysql mysql [.] batch_readline(st_line_buffer*) 7.16% mysql libc-2.13.so [.] __memcpy_ssse3 6.97% mysql libc-2.13.so [.] __strstr_sse42 5.38% mysql [kernel.kallsyms] [k] copy_user_generic_string 1.68% mysql libc-2.13.so [.] __strchr_sse42 1.42% mysql libc-2.13.so [.] __strlen_sse42 0.93% mysql [kernel.kallsyms] [k] intel_pmu_disable_all 0.42% mysql [kernel.kallsyms] [k] put_page 0.28% mysql [kernel.kallsyms] [k] sock_def_readable 0.28% mysql [kernel.kallsyms] [k] default_wake_function 0.28% mysql mysql [.] _ZL12find_commandPcc 0.28% mysql [kernel.kallsyms] [k] find_get_page 0.28% mysql [kernel.kallsyms] [k] _raw_spin_lock 0.28% mysql [kernel.kallsyms] [k] get_page_from_freelist 0.02% mysql [kernel.kallsyms] [k] _raw_spin_lock_irqsave 0.00% mysql [kernel.kallsyms] [k] intel_pmu_enable_all And then for 5.6.11: 57.76% mysql mysql [.] _ZL16read_and_executeb 10.66% mysql libc-2.13.so [.] __strstr_sse42 9.72% mysql mysql [.] my_utf8_uni 9.07% mysql mysql [.] my_ismbchar_utf8 3.22% mysql mysql [.] batch_readline(st_line_buffer*, bool) 2.13% mysql [kernel.kallsyms] [k] copy_user_generic_string 1.77% mysql mysql [.] is_prefix 1.48% mysql [kernel.kallsyms] [k] intel_pmu_disable_all 1.37% mysql libc-2.13.so [.] __memcpy_ssse3 0.73% mysql [kernel.kallsyms] [k] _raw_spin_lock 0.30% mysql libc-2.13.so [.] __strlen_sse42 0.29% mysql [kernel.kallsyms] [k] perf_event_task_tick 0.16% mysql mysql [.] _ZL12find_commandPc 0.15% mysql [kernel.kallsyms] [k] idle_cpu 0.15% mysql libc-2.13.so [.] _IO_str_init_static_internal 0.15% mysql mysql [.] strcont 0.15% mysql [kernel.kallsyms] [k] activate_page 0.15% mysql [kernel.kallsyms] [k] update_wall_time 0.15% mysql [kernel.kallsyms] [k] pick_next_task_rt 0.15% mysql [kernel.kallsyms] [k] apic_timer_interrupt 0.15% mysql [kernel.kallsyms] [k] account_user_time 0.15% mysql [kernel.kallsyms] [k] _raw_spin_lock_irq 0.01% mysql [kernel.kallsyms] [k] schedule 0.00% mysql [kernel.kallsyms] [k] intel_pmu_enable_all
[14 May 2013 20:42]
Roy Lyseng
Mark, can you please file a separate bug in category command-line utilities for the MySQL client issue?
[14 May 2013 22:15]
Mark Callaghan
created http://bugs.mysql.com/bug.php?id=69236 for non-optimizer regressions
[29 Jul 2013 11:37]
Roy Lyseng
We have first targeted simple queries uses const tables. Here is a benchmark used for such queries: (Tables are in MyISAM for smaller performance impact across versions). Benchmark BM-1: create table p1(a integer primary key, b integer); insert into p1 values(1,10),(2,20),(3,30); create table p2(a integer primary key, c integer); insert into p2 values(10,100),(20,200),(30,300); select c from p1 join p2 on p1.b=p2.a where p1.a=1; delimiter // create function f1(arg integer) returns integer reads sql data begin declare val integer; select b from p1 where p1.a=arg into @val; return @val; end// create function f2(arg integer) returns integer reads sql data begin declare val integer; select c from p1 join p2 on p1.b=p2.a where p1.a=arg into @val; return @val; end// delimiter ; select benchmark(1000000,f1(1)); select benchmark(1000000,f2(1));
[22 Sep 2013 17:52]
Roy Lyseng
A partial fix has been pushed to release 5.6.14: Please check the patch for this bug: Bug#17234723: Shortcut execution path in JOIN::optimize()
[25 Sep 2013 2:58]
Mark Callaghan
More results for cached & single-threaded sysbench http://mysqlha.blogspot.com/2013/09/mysql-572-single-threaded-performance.html
[3 Oct 2014 6:55]
Roy Lyseng
An updated report that includes 5.7: http://smalldatum.blogspot.co.uk/2014/10/single-thread-performance-in-mysql-575.html
[3 Oct 2014 6:56]
Roy Lyseng
Another update that includes 5.7, the heap engine and more single-row operations: http://smalldatum.blogspot.co.uk/2014/10/low-concurrency-performance-for-point.html
[4 Oct 2014 18:26]
Roy Lyseng
Even more 5.7 benchmarks: PK operations, non-covering and covering secondary index retrievals, for 1 up to 32 threads: http://smalldatum.blogspot.no/2014/10/low-concurrency-performance-for-range.html
[1 Sep 2015 14:38]
Mark Callaghan
regressions continue with mysql 5.7.8 http://smalldatum.blogspot.com/2015/08/single-threaded-linkbench-performance.html
[11 Jul 2018 15:09]
MySQL Verification Team
A duplicate of this bug is also the following one: https://bugs.mysql.com/bug.php?id=91598