Bug #93635 MySQL high load with high system cpu used
Submitted: 17 Dec 2018 8:11 Modified: 19 Apr 14:03
Reporter: victor zheng Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:mysql Ver 14.14 Distrib 5.7.20, for Lin OS:Debian (20)
Assigned to: CPU Architecture:Any
Tags: system cpu high

[17 Dec 2018 8:11] victor zheng
Description:
i run my test with 100 threads and 200 threads seperately

And I foud that  mysql use much system cpu and cause server load to be high.

I seed that many connections to mysql but just very few of them is running

I use the poormanprofiler to see what wrong is with mysql.

the results are showed in below

How to repeat:
1. install the mysql 5.7.20 

2.run my test
* the test is specific serveice

3. and the result turn up again
[17 Dec 2018 8:11] victor zheng
mysql config:

[client]
port        = 3306
socket      = /tmp/mysql.sock

[mysql]
default-character-set = latin1
no-auto-rehash

[mysqld]
server_id=5934

max_connections=3000

datadir=/home/database/data
tmpdir=/home/database/tmp

default_storage_engine=MyISAM
default_tmp_storage_engine=MyISAM

core_file

general_log=OFF

slow_query_log=ON
long_query_time=2

slave-skip-errors = 1062

###replication###
log_bin=/home/database/binlog/localhost-bin
relay-log=/home/database/binlog/localhost-relay-bin

#change sync_binlog 0->1
sync_binlog=1

#change log_slave_updates OFF->ON
log_slave_updates=ON

slave_net_timeout=3600

#change binlog_format STATEMENT->ROW
binlog_format=ROW

#mysql5.7 add
relay_log_recovery=1
master_info_repository=TABLE
relay_log_info_repository=TABLE

###multi_threads_slave###
#mysql5.7 add
#slave_parallel_type=LOGICAL_CLOCK
#slave_parallel_workers=8

###semi_sync###
#rpl_semi_sync_master_enabled=1
#rpl_semi_sync_master_timeout=1000
#rpl_semi_sync_slave_enabled=1

query_cache_size=0
key_buffer_size=128M
myisam_mmap_size=12M

tmp_table_size=2M
sort_buffer_size=2M
max_heap_table_size=2M
join_buffer_size=2M

bulk_insert_buffer_size=2M

explicit_defaults_for_timestamp=ON

###innodb###

innodb_io_capacity=2000
innodb_io_capacity_max=20000
innodb_write_io_threads=16

innodb_data_home_dir=/home/database/innodb
innodb_file_format=Barracuda
innodb_file_format_max=Barracuda
innodb_file_per_table=ON

innodb_buffer_pool_size=128G
innodb_sort_buffer_size=32M

innodb_log_group_home_dir=/home/database/innodb
innodb_log_files_in_group=3
innodb_log_file_size=1024M
innodb_log_buffer_size=128M
innodb_flush_log_at_trx_commit=1
innodb_flush_method=O_DIRECT

innodb_print_all_deadlocks=ON

wait_timeout=604800
interactive_timeout=604800

###mysql5.7###
sql_mode=""
log_timestamps=system
secure_file_priv=''

# group replication
#gtid_mode=ON
#enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE

transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="13855fca-d2ab-11e6-8f37-005056b8286c"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "xxx:13306"
loose-group_replication_group_seeds= "xxxx"
loose-group_replication_bootstrap_group= off

loose-group_replication_single_primary_mode=off
loose-group_replication_enforce_update_everywhere_checks=on

group_replication_ip_whitelist="...";

slave_preserve_commit_order=ON

thread_cached_size = 1024
[17 Dec 2018 8:12] victor zheng
the result 
1. perf top -z
  77.83%  [kernel]                     [k] _raw_spin_lock
   1.06%  [kernel]                     [k] try_to_wake_up
   0.70%  mysqld                       [.] MYSQLparse(THD*)
   0.40%  [kernel]                     [k] futex_wait_setup
   0.29%  mysqld                       [.] my_uca_scanner_next_any
   0.27%  libc-2.19.so                 [.] malloc
   0.21%  [kernel]                     [k] ixgbe_poll

2. poormanprofiler  samples

           23 ??,??,??,free_root,Prepared_statement::~Prepared_statement,Prepared_statement::~Prepared_statement,my_hash_delete,Prepared_statement_map::erase,Prepared_statement::deallocate,mysqld_stmt_close,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
     20 ??,??,malloc,operator,mysqld_stmt_prepare,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
     16 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,<Per_thread_connection_handler::LOCK_thread_cache>,,<Per_thread_connection_handler::LOCK_thread_cache>,,at,handle_connection,pfs_spawn_thread,start_thread,clone
      5 ??,??,??,mem_free,__in_chrg=<optimized,stmt_id=<optimized,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone

           16 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,<Per_thread_connection_handler::LOCK_thread_cache>,,<Per_thread_connection_handler::LOCK_thread_cache>,,at,handle_connection,pfs_spawn_thread,start_thread,clone
      7 ??,??,??,free_root,Prepared_statement::~Prepared_statement,Prepared_statement::~Prepared_statement,my_hash_delete,Prepared_statement_map::erase,Prepared_statement::deallocate,mysqld_stmt_close,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      5 ??,??,malloc,operator,mysqld_stmt_prepare,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      5 ??,??,malloc,my_raw_malloc,size=size@entry=8160,,alloc_root,Item::operator,MYSQLparse,parse_sql,Prepared_statement::prepare,mysqld_stmt_prepare,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      4 ??,??,??,my_realloc,String::mem_realloc,String::replace,String::replace,Prepared_statement::insert_params,Prepared_statement::set_parameters,Prepared_statement::execute_loop,mysqld_stmt_execute,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      4 ??,??,malloc,my_raw_malloc,size=size@entry=216,,my_realloc,String::mem_realloc,String::replace,String::replace,Prepared_statement::insert_params,Prepared_statement::set_parameters,Prepared_statement::execute_loop,mysqld_stmt_execute,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      3 pthread_cond_wait@@GLIBC_2.3.2,wait,out>,,reset_sig_count=reset_sig_count@entry=0),srv_worker_thread,start_thread,clone

****** Another test with change the malloc with tcmalloc*****

1. the system cpu used is almost zero and qps is up

2. poormanprofiler samples:

     38 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,<Per_thread_connection_handler::LOCK_thread_cache>,,<Per_thread_connection_handler::LOCK_thread_cache>,,at,handle_connection,pfs_spawn_thread,start_thread,clone
      3 pthread_cond_wait@@GLIBC_2.3.2,wait,out>,,reset_sig_count=reset_sig_count@entry=0),srv_worker_thread,start_thread,clone
      3 pthread_cond_wait@@GLIBC_2.3.2,wait,out>,,reset_sig_count=reset_sig_count@entry=0),buf_flush_page_cleaner_worker,start_thread,clone
      1 ??,sigwaitinfo,timer_notify_thread_func,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,wait,out>,,reset_sig_count=reset_sig_count@entry=6282),srv_purge_coordinator_suspend,out>),start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,wait,out>,,reset_sig_count=reset_sig_count@entry=0),buf_resize_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,wait,out>,,reset_sig_count=reset_sig_count@entry=0),buf_dump_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,out>,,out>,,at,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,cond=0x1a2f6ade0),"/home/zhangjy/package/mysql-5.7.20/sql/binlog.cc",,thd=thd@entry=0x1a2fa6000,,next_event,thd=0x1a2fa6000),at,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,srv_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,srv_error_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,pc_sleep_if_needed,out>),start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,lock_wait_timeout_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,ib_wqueue_timedwait,fts_optimize_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,dict_stats_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,native_cond_timedwait,mp=0x1e258c0,"/home/zhangjy/package/mysql-5.7.20/sql/binlog.cc",,<mysql_bin_log>,,wait_with_heartbeat,this=0x7eeff06d4b50),this=0x7eeff06d4b50),log_cache=log_cache@entry=0x7eeff06d45f0,,Binlog_sender::run,mysql_binlog_send,com_binlog_dump,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      1 poll,vio_io_wait,vio_socket_io_wait,vio_read,vio_read_buff,net_read_raw_loop,net_read_packet_header,com
[17 Dec 2018 8:15] victor zheng
two more question to ask

1. thread_cached_size is the key config to  affect the system cpu.

the system cpu is always high during the test.  But the thread_created is not increased

2. question about general log

* why open general log decrease the qps . greatly? because the io is not high and cpu is not high where it is opend.

but i see many connection is connected and most is running (they record the gen log?)
[24 Dec 2018 2:01] zhai weixiang
Looks like the problem is related to memory allocator.(we used to see similar problem on production). I'd suggest taking a try of jemalloc. :)
[24 Dec 2018 9:18] victor zheng
@zhaiweixiang

hi, thanks for your reply. 

yes, maybe it is about memory aollocation.

we alsoe solve it by using tcmalloc instead.

but I still have questions  wandering in my mind.

1. the concurrency is just 100-200. it is not too high.
why this heat the threshold of memory malloc?

2.which part of mysql  memory malloc cause this problem?

prepare statement struct?   or what part in MYSQLparse(THD*) function ?
[19 Mar 14:03] Sinisa Milivojevic
Hi,

Thank you for your report.

I must say that this does not look like a bug report, but more like a request for the help with configuration of our server. We do not provide those services here.

Only a couple of notes.

Turn query cache, totally by adding:

query_cache_type = 0
query_cache_limit = 1M

Turn thread cache totally off. Increase tmp_table_size as much as you can, but keep in mind that it will be used concurrently by all threads that need intrinsic temporary tables in order to resolve queries.

You have set buffer pool to 128 Gb. How much memory do you have ????

Last, but not least, you have turned all ACID-related variables ON, so what your profiler shows is expected behaviour. Your disk fsyncing will cause high CPU utilisation by kernel.
[20 Apr 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".