Bug #71197 | mysql slave sql thread memory leak | ||
---|---|---|---|
Submitted: | 21 Dec 2013 8:48 | Modified: | 19 Feb 2014 10:38 |
Reporter: | qinglin zhang (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.6.15, 5.6.14, 5.6.13, 5.6.16, 5.7.3 | OS: | Linux |
Assigned to: | CPU Architecture: | Any | |
Tags: | Contribution, mysql memory leak |
[21 Dec 2013 8:48]
qinglin zhang
[23 Dec 2013 5:54]
qinglin zhang
parameters go like bellow: +------------------------------+-----------------------+ | Variable_name | Value | +------------------------------+-----------------------+ | init_slave | | | log_slave_updates | ON | | log_slow_slave_statements | OFF | | pseudo_slave_mode | OFF | | rpl_stop_slave_timeout | 31536000 | | slave_allow_batching | OFF | | slave_checkpoint_group | 512 | | slave_checkpoint_period | 300 | | slave_compressed_protocol | OFF | | slave_exec_mode | STRICT | | slave_load_tmpdir | /u01/5615/tmp | | slave_max_allowed_packet | 1073741824 | | slave_net_timeout | 3600 | | slave_parallel_workers | 8 | | slave_pending_jobs_size_max | 1073741824 | | slave_rows_search_algorithms | TABLE_SCAN,INDEX_SCAN | | slave_skip_errors | OFF | | slave_sql_verify_checksum | ON | | slave_transaction_retries | 10 | | slave_type_conversions | | | sql_slave_skip_counter | 0 | +------------------------------+-----------------------+ when I execute stop slave sql_thread, the memory become normal; Has anyone pay attention to this bug ?
[24 Dec 2013 8:32]
qinglin zhang
my configuration is : [client] socket = /u01/5615/run/mysql.sock [innotop] socket = /u01/5615/run/mysql.sock [mysql] auto-rehash [mysqld] sort_buffer_size=1k core-file #### Baes dir #### basedir = /u01/5615 datadir = /u01/5615/data tmpdir = /u01/5615/tmp socket = /u01/5615/run/mysql.sock #### Base configure info #### port = 3406#5615 skip-name-resolve old_passwords = 0 lower_case_table_names = 1 open_files_limit = 65535 read_rnd_buffer_size = 5M max_allowed_packet = 24M max_connect_errors = 50000 max_connections = 5000 max_user_connections = 2000 thread_cache_size=64 #rpl_semi_sync_slave_enabled=1 relay_log_recovery=1 table_open_cache_instances=5 metadata_locks_hash_instances = 32 metadata_locks_cache_size=2048 slave_pending_jobs_size_max = 1073741824 innodb_use_native_aio = 1 innodb_file_format=Barracuda bind-address = 0.0.0.0 innodb_buffer_pool_dump_at_shutdown=1 innodb_buffer_pool_load_at_startup = 1 innodb_print_all_deadlocks = 1 innodb_buffer_pool_instances=4 innodb_online_alter_log_max_size=2G binlog_format=row #### Log info #### log-error = /u01/5615/log/alert.log log-queries-not-using-indexes long_query_time = 1 #### Binary log && Relay log #### log-bin = /u01/5615/log/mysql-bin log-slave-updates = 1 relay-log = /u01/5615/log/mysqld-relay-bin relay-log-index = /u01/5615/log/mysqld-relay-bin.index master-info-file = /u01/5615/log/master.info relay-log-info-file = /u01/5615/log/relay-log.info #expire_logs_days=1 max_binlog_size = 500M max_binlog_cache_size = 2G relay_log_purge=1 ### Server id #### server-id=3406 #innodb default-storage-engine = INNODB innodb_file_per_table = 1 innodb_flush_log_at_trx_commit = 2 innodb_lock_wait_timeout = 100 innodb_buffer_pool_size = 4G innodb_log_buffer_size= 400M innodb_log_file_size = 400M innodb_log_files_in_group = 4 innodb_max_dirty_pages_pct = 50 innodb_thread_concurrency=16 innodb_purge_threads = 4 innodb_io_capacity=1000 innodb_lru_scan_depth=8192 #innodb_flush_method = O_DIRECT_NO_FSYNC innodb_flush_method = O_DIRECT innodb_sort_buffer_size=2M innodb_online_alter_log_max_size=2G performance_schema_instrument = '%=on' performance_schema=ON slave_parallel_workers= 8 relay_log_info_repository='TABLE' master_info_repository='TABLE' enforce_gtid_consistency = OFF gtid_mode = OFF explicit_defaults_for_timestamp=ON #memcache #innodb_api_enable_binlog = 1 #daemon_memcached_option="-p11222" #daemon_memcached_r_batch_size=1 #daemon_memcached_w_batch_size=1 thread_stack = 1524288 [mysqld_safe] pid-file = /u01/5615/run/mysql.pid to repeat it : use 5615 mysql as slave and run sysbench to update,and watch the memory the slave use
[24 Dec 2013 22:03]
Sveta Smirnova
Thank you for the report. Verified as described. To repeat: start MTR as ./mtr --suite=rpl rpl_alter &, create database sbtest and run standard oltp test of sysbench 0.4.12 connected to master, than watch memory usage of slave. Memory usage did not decrease neither after all updates from master were applied, nor after I issued FLUSH TABLES.
[25 Dec 2013 8:48]
qinglin zhang
Thank your very much for your response, I wonder when this can be fixed ?
[30 Dec 2013 12:16]
qinglin zhang
suggested fix: Index: sql/sql_class.h =================================================================== --- sql/sql_class.h (revision 4987) +++ sql/sql_class.h (working copy) @@ -2073,6 +2073,7 @@ public: MDL_context mdl_context; + MEM_ROOT main_mem_root; /* Used to execute base64 coded binlog events in MySQL server */ Relay_log_info* rli_fake; /* Slave applier execution context */ @@ -4074,7 +4075,6 @@ - for prepared queries, only to allocate runtime data. The parsed tree itself is reused between executions and thus is stored elsewhere. */ - MEM_ROOT main_mem_root; Diagnostics_area main_da; Diagnostics_area *m_stmt_da; Index: sql/rpl_slave.cc =================================================================== --- sql/rpl_slave.cc (revision 4987) +++ sql/rpl_slave.cc (working copy) @@ -3729,6 +3729,11 @@ } } + if(ev->ends_group()) + { + free_root(&rli->info_thd->main_mem_root, MYF(MY_KEEP_PREALLOC)); + } + if (error) { /*
[2 Jan 2014 10:39]
qinglin zhang
suggest fix : Index: sql/sql_base.cc =================================================================== --- sql/sql_base.cc (revision 4989) +++ sql/sql_base.cc (working copy) @@ -20,6 +20,7 @@ #include "sql_priv.h" #include "unireg.h" #include "debug_sync.h" +#include "rpl_rli.h" #include "lock.h" // mysql_lock_remove, // mysql_unlock_tables, // mysql_lock_have_duplicate @@ -5790,8 +5791,16 @@ DBUG_ASSERT(thd->lock == 0); // You must lock everything at once TABLE **start,**ptr; - if (!(ptr=start=(TABLE**) thd->alloc(sizeof(TABLE*)*count))) - DBUG_RETURN(TRUE); + if (thd->rli_thd_memory) + { + if (!(ptr=start=(TABLE**) alloc_root(&thd->rli_slave->mts_coor_mem_root, (sizeof(TABLE*)*count)))) + DBUG_RETURN(TRUE); + } + else + { + if (!(ptr=start=(TABLE**) thd->alloc(sizeof(TABLE*)*count))) + DBUG_RETURN(TRUE); + } for (table= tables; table; table= table->next_global) { if (!table->placeholder()) Index: sql/sql_class.h =================================================================== --- sql/sql_class.h (revision 4989) +++ sql/sql_class.h (working copy) @@ -2078,6 +2078,8 @@ /* Slave applier execution context */ Relay_log_info* rli_slave; + /*a sign for rli thread*/ + bool rli_thd_memory; void reset_for_next_command(); /* Constant for THD::where initialization in the beginning of every query. Index: sql/sql_class.cc =================================================================== --- sql/sql_class.cc (revision 4989) +++ sql/sql_class.cc (working copy) @@ -901,6 +901,7 @@ m_statement_psi(NULL), m_idle_psi(NULL), m_server_idle(false), + rli_thd_memory(false), next_to_commit(NULL), is_fatal_error(0), transaction_rollback_request(0), Index: sql/sql_lex.h =================================================================== --- sql/sql_lex.h (revision 4989) +++ sql/sql_lex.h (working copy) @@ -2623,6 +2623,10 @@ { return context_stack.push_front(context); } + bool push_context(Name_resolution_context *context, MEM_ROOT* mem_root) + { + return context_stack.push_front(context, mem_root); + } void pop_context() { Index: sql/sql_lex.cc =================================================================== --- sql/sql_lex.cc (revision 4989) +++ sql/sql_lex.cc (working copy) @@ -27,6 +27,7 @@ #include <m_ctype.h> #include <hash.h> #include "sp.h" +#include "rpl_rli.h" #include "sp_head.h" #include "sql_table.h" // primary_key_name #include "sql_show.h" // append_identifier @@ -1784,7 +1785,12 @@ thus push_context should be moved to a place where query initialization is checked for failure. */ - parent_lex->push_context(&context); + if (parent_lex->thd->rli_thd_memory) + parent_lex->push_context(&context, + &parent_lex->thd->rli_slave->mts_coor_mem_root); + else + parent_lex->push_context(&context); + cond_count= between_count= with_wild= 0; max_equal_elems= 0; ref_pointer_array.reset(); Index: sql/sql_list.h =================================================================== --- sql/sql_list.h (revision 4989) +++ sql/sql_list.h (working copy) @@ -215,6 +215,20 @@ } return 1; } + inline bool push_front(void *info, MEM_ROOT *mem_root) + { + list_node *node=new(mem_root) list_node(info,first); + if (node) + { + if (last == &first) + last= &node->next; + first=node; + elements++; + return 0; + } + return 1; + } + void remove(list_node **prev) { list_node *node=(*prev)->next; @@ -482,6 +496,8 @@ inline bool push_back(T *a, MEM_ROOT *mem_root) { return base_list::push_back((void *) a, mem_root); } inline bool push_front(T *a) { return base_list::push_front((void *) a); } + inline bool push_front(T *a, MEM_ROOT *mem_root) + { return base_list::push_front((void *) a, mem_root); } inline T* head() {return (T*) base_list::head(); } inline T** head_ref() {return (T**) base_list::head_ref(); } inline T* pop() {return (T*) base_list::pop(); } Index: sql/log_event.h =================================================================== --- sql/log_event.h (revision 4989) +++ sql/log_event.h (working copy) @@ -1449,8 +1449,8 @@ */ virtual List<char>* get_mts_dbs(MEM_ROOT *mem_root) { - List<char> *res= new List<char>; - res->push_back(strdup_root(mem_root, get_db())); + List<char> *res= new(mem_root) List<char>; + res->push_back(strdup_root(mem_root, get_db()), mem_root); return res; } @@ -2113,12 +2113,12 @@ */ virtual List<char>* get_mts_dbs(MEM_ROOT *mem_root) { - List<char> *res= new (mem_root) List<char>; + List<char> *res= new(mem_root) List<char>; if (mts_accessed_dbs == OVER_MAX_DBS_IN_EVENT_MTS) { // the empty string db name is special to indicate sequential applying mts_accessed_db_names[0][0]= 0; - res->push_back((char*) mts_accessed_db_names[0]); + res->push_back((char*) mts_accessed_db_names[0], mem_root); } else { @@ -3336,8 +3336,8 @@ virtual uint8 mts_number_dbs() { return OVER_MAX_DBS_IN_EVENT_MTS; } virtual List<char>* get_mts_dbs(MEM_ROOT *mem_root) { - List<char> *res= new List<char>; - res->push_back(strdup_root(mem_root, "")); + List<char> *res= new(mem_root) List<char>; + res->push_back(strdup_root(mem_root, ""), mem_root); return res; } #endif @@ -3870,7 +3870,7 @@ } virtual List<char>* get_mts_dbs(MEM_ROOT *mem_root) { - List<char> *res= new List<char>; + List<char> *res= new(mem_root) List<char>; const char *db_name= get_db(); if (!rpl_filter->is_rewrite_empty() && !get_flags(TM_REFERRED_FK_DB_F)) @@ -3883,7 +3883,7 @@ } res->push_back(strdup_root(mem_root, - get_flags(TM_REFERRED_FK_DB_F) ? "" : db_name)); + get_flags(TM_REFERRED_FK_DB_F) ? "" : db_name), mem_root); return res; } Index: sql/rpl_slave.cc =================================================================== --- sql/rpl_slave.cc (revision 4989) +++ sql/rpl_slave.cc (working copy) @@ -5504,7 +5504,7 @@ thd = new THD; // note that contructor of THD uses DBUG_ ! thd->thread_stack = (char*)&thd; // remember where our stack is rli->info_thd= thd; - + rli->info_thd->rli_thd_memory= true; /* Inform waiting threads that slave has started */ rli->slave_run_id++; rli->slave_running = 1;
[9 Jan 2014 10:08]
qinglin zhang
Has anyone pay attention to this problem ?
[19 Jan 2014 10:27]
Dean Herbert
Can replicate this on three production hosts running 5.6.14 and 5.6.15. Occurs when slave-parallel-workers is set to 1 or higher. A consistent 1.2gb leak per day for me. Pretty serious stuff.
[20 Jan 2014 1:58]
qinglin zhang
There will be memory leak when relay_log_info_repository=Table and master_info_repository = Table, there are some reasons for this: 1. Relay thread call get_slave_worker to distribute log_event; 2. Memory used in init_query as well as open_table don't freed when after relay and master info is updated;
[20 Jan 2014 2:12]
zhai weixiang
The memory allocated from the mem_root of distribute thread (rli->info_thd->mem_root) was not freed until executing "STOP SLAVE". Is this the root reason of this problem ?
[21 Jan 2014 13:19]
Aaron Johnson
I am also experiencing this problem. Have tested several slave setting combinations with 5.6.13 to 5.6.15. Stopping and restarting the slave does not free the memory. Tested stopping the slave, changing the number of slave workers to zero, restarting the slave and then stopping and returning to the 12 workers we normally use. This did not free any memory. - Aaron
[22 Jan 2014 1:45]
qinglin zhang
Memory did come back when you stop slave in my case , I tried two machines, why not try the following patch? Index: sql/log_event.cc =================================================================== --- sql/log_event.cc (revision 5149) +++ sql/log_event.cc (working copy) @@ -2692,7 +2692,8 @@ { ulong gaq_idx; rli->mts_groups_assigned++; - + // free memory used by relay thread + free_root(rli->info_thd->mem_root, MYF(MY_KEEP_PREALLOC)); rli->curr_group_isolated= FALSE; group.reset(log_pos, rli->mts_groups_assigned); // the last occupied GAQ's array index
[22 Jan 2014 15:53]
cindy .
We have this issue and are having to do a full mysql restart at least once a week. stop slave;start slave will work for a few times then that will stop freeing any memory and a full restart of mysql is required. Affects all 10 production slaves. This is quite serious as not doing a restart will kick swap in and cause huge performance issues. Setting relay_log_info_repository and master_info_repository to file instead of table does not fix the leak.
[23 Jan 2014 5:35]
qinglin zhang
set the following three parameters may stop the leak: master_info_repository=file relay_log_info_repository=file slave_parallel_workers=0
[23 Jan 2014 18:53]
Sveta Smirnova
Bug #71425 was marked as duplicate of this one.
[7 Feb 2014 11:50]
Luis Soares
See also: BUG#69075.
[19 Feb 2014 10:38]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. Fixed in MySQL 5.6+. Documented in the MySQL 5.6.17 and MySQL 5.7.4 changelogs as follows: When relay_log_info_repository and master_info_repository were both set to TABLE, setting --slave-parallel-workers to 1 or greater and starting the slave caused the slave SQL thread to use but not to release memory until the server was restarted. If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html Closed.
[19 Feb 2014 10:41]
Jon Stephens
Bug #69075 was closed as duplicate of this bug.
[20 Feb 2014 2:24]
Kshira Nadarajan
Setting relay_log_info_repository = FILE master_info_repository = FILE with slave_parallel_workers > 0 also didn't work.
[29 Mar 2014 8:14]
Laurynas Biveinis
5.6$ bzr log -r 5824 -n0 ------------------------------------------------------------ revno: 5824 [merge] committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: 5.6-fixes4 timestamp: Tue 2014-02-18 19:52:59 +0200 message: merging 5.6 repo with local Bug18001777 fixing branch. ------------------------------------------------------------ revno: 5822.1.1 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: 5.6-fixes4 timestamp: Tue 2014-02-18 14:34:43 +0200 message: Bug #18001777 MYSQL SLAVE SQL THREAD MEMORY hogging Technically it's not a leak which the reporter admits himself, see the bug report. Source code examination showed that the problem was in usage of an incorrect mem-root. The main THD root of Coordinator thread was never reset. After discussion with reviewer a solution to dismantle the specific root is decided. Instead array of pointers to database names of a maximum size is allocated in the stack which is correct 'cos database names do not get out scope of gets destructed while their pointers are processed. The fixes are verified on memory consumption to prove correctness of the found suspect and the fixing method.
[29 Mar 2014 8:20]
Laurynas Biveinis
5.6$ bzr log -r 5832 ------------------------------------------------------------ revno: 5832 committer: Andrei Elkin <andrei.elkin@oracle.com> branch nick: 5.6-fixes4 timestamp: Fri 2014-02-21 13:37:35 +0200 message: Bug18001777 post-push fixing.
[1 May 2014 16:01]
Dov Endress
I am seeing a similar memory leak in | version | 5.6.16-64.2-56-log | +---------------+--------------------+ repositories set to FILE, 0 parallel workers stopping slave releases the memory
[13 Oct 2014 20:33]
Yang Liu
I am seeing the same issue, we are running V5.6.19 , still see this issue, stop slave did not release the memory, this requires a full mysqld restart. This is causing serious issues for our production servers. Just wonder which version has the fix? 5.6.19 certainly does not have the fix
[13 Oct 2014 21:22]
Yang Liu
Our my.cnf have the following settings: slave_parallel_workers = 0 relay-log-info-repository = TABLE master-info-repository = TABLE relay_log_recovery = ON sync-master-info = 1 This is mysql v5.6.19 , still seeing the issue, stop slave does not release the memory, please reopen the bug, the bug has not been fixed yet
[14 Oct 2014 9:47]
Andrei Elkin
Thank you guys for following up. This bug is really about Multi-Threaded Slave as the report indicates with set global slave_parallel_workers=8; Since your setup specifies the workers=0 you must be hitting some other issue. Fill free to open a new bug and specify steps that reproduce your observations. Thanks.
[5 Jan 2015 2:17]
AmirBehzad Eslami
We have the same issue under MySQL 5.6.20. Memory leaks when parallel workers are enabled.