Bug #69075 Memory leak slave_parallel_workers
Submitted: 26 Apr 2013 3:07 Modified: 19 Feb 2014 10:39
Reporter: raza lei
Status: Duplicate
Category: MySQL Server: Memory storage engine
Severity: S3 (Non-critical)
Version: 5.6.11, 5.6.15
OS: Any
Assigned to:
CPU Architecture: Any
Tags: memory leak

[26 Apr 2013 3:07] raza lei
Description:
We have 7 masters running MySQL 5.6.11 under a very busy update workload (update: 2000+/s, insert: 600+/s, delete: 200/s, select: 200/s).

The slave is a Dell R520 with 6 SAS 15k HDDs in RAID 10 and an Intel SSD combined via flashcache, but replication delay kept increasing with a single slave thread (slave_parallel_workers = 0). So we switched to MTS (slave_parallel_workers = 8); the delay dropped to 0, but we got a memory leak on both 5.6.10 and 5.6.11.
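For reference, the switch between the two modes was made on the slave with a statement sequence roughly like this (a sketch; the exact statements we ran may have differed):

STOP SLAVE SQL_THREAD;
-- 0 = single applier thread, 8 = multi-threaded slave with 8 workers
SET GLOBAL slave_parallel_workers = 8;
START SLAVE SQL_THREAD;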

The following is with a single slave thread (slave_parallel_workers = 0); there is no memory leak and memory stays at 4.7G:

PID  USER   PR  NI  VIRT   RES   SHR S %CPU %MEM    TIME+   COMMAND
8877 mysql  15  0   5432m  4.7g 7160 S 77.5 20.2  207:58.75  mysqld

But after switching to MTS (slave_parallel_workers = 8), we got a memory leak:
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                       
2548 mysql     15   0 11.7g  10g 7168 S 83.7 34.9   1104:27 mysqld 

The following is the output of valgrind --tool=massif:

# grep 0x8C16D9 mysql_leak.log
| ->13.53% (35,246,416B)  0x8C16D9: alloc_root (my_alloc.c:224)
| ->19.93% (60,907,528B)  0x8C16D9: alloc_root (my_alloc.c:224)
| ->26.13% (95,265,384B)  0x8C16D9: alloc_root (my_alloc.c:224)
| ->28.77% (113,263,584B) 0x8C16D9: alloc_root (my_alloc.c:224)
| ->31.00% (129,233,992B) 0x8C16D9: alloc_root (my_alloc.c:224)
| ->31.52% (132,238,648B) 0x8C16D9: alloc_root (my_alloc.c:224)
| ->31.43% (132,372,272B) 0x8C16D9: alloc_root (my_alloc.c:224)
| ->31.97% (135,540,640B) 0x8C16D9: alloc_root (my_alloc.c:224)
| ->31.85% (135,598,824B) 0x8C16D9: alloc_root (my_alloc.c:224)
| ->32.30% (138,154,112B) 0x8C16D9: alloc_root (my_alloc.c:224)
| ->32.21% (138,557,944B) 0x8C16D9: alloc_root (my_alloc.c:224)
| ->32.64% (140,876,144B) 0x8C16D9: alloc_root (my_alloc.c:224)
| ->32.61% (141,829,944B) 0x8C16D9: alloc_root (my_alloc.c:224)

->46.67% (202,953,289B) 0x8C50F0: my_malloc (my_malloc.c:38)
| ->32.61% (141,829,944B) 0x8C16D9: alloc_root (my_alloc.c:224)
| | ->17.73% (77,093,640B) 0x91D0C7: innobase_create_handler(handlerton*, TABLE_SHARE*, st_mem_root*) (sql_alloc.h:40)
| | | ->17.73% (77,093,640B) 0x58B2E2: get_new_handler(TABLE_SHARE*, st_mem_root*, handlerton*) (handler.cc:442)
| | |   ->08.87% (38,557,224B) 0x7576D1: open_table_from_share(THD*, TABLE_SHARE*, char const*, unsigned int, unsigned int, unsigned int, TABLE*, bool) (table.cc:2102)
| | |   | ->08.87% (38,557,224B) 0x6853A1: open_table(THD*, TABLE_LIST*, Open_table_context*) (sql_base.cc:3047)
| | |   |   ->08.87% (38,557,224B) 0x6884FC: open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) (sql_base.cc:4516)
| | |   |     ->08.87% (38,557,224B) 0x68876F: open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) (sql_base.cc:5586)
| | |   |       ->08.86% (38,515,608B) 0x86556B: Rows_log_event::do_apply_event(Relay_log_info const*) (sql_base.h:472)
| | |   |       | ->08.86% (38,515,608B) 0x8AF111: slave_worker_exec_job(Slave_worker*, Relay_log_info*) (rpl_rli_pdb.cc:1880)
| | |   |       |   ->08.86% (38,515,608B) 0x895039: handle_slave_worker (rpl_slave.cc:4468)
| | |   |       |     ->08.86% (38,515,608B) 0x32A1A0677B: start_thread (in /lib64/libpthread-2.5.so)
| | |   |       |       ->08.86% (38,515,608B) 0x32A12D49AB: clone (in /lib64/libc-2.5.so)

How to repeat:
Run a slave with slave_parallel_workers > 0 under a heavy replication write load (as described above) and watch mysqld's resident memory keep growing.
[7 May 2013 18:12] Sveta Smirnova
Bug #69066 was marked as duplicate of this one.
[19 Jun 2013 14:37] MySQL Verification Team
200M+ for open table handlers may be perfectly normal depending on configuration.
Can we see the entire massif output, as well as:

SHOW GLOBAL STATUS;
SHOW GLOBAL VARIABLES;
SHOW OPEN TABLES;
SHOW ENGINE PERFORMANCE_SCHEMA STATUS;
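For what it's worth, the settings that usually determine how much memory open table handlers consume, and the counters for how many are around, can be checked with something like this (just the usual suspects, not a confirmed cause):

SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
SHOW GLOBAL VARIABLES LIKE 'table_definition_cache';
SHOW GLOBAL VARIABLES LIKE 'slave_parallel_workers';
SHOW GLOBAL STATUS LIKE 'Open%';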
[20 Jul 2013 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[6 Feb 2014 8:09] Simon Mudd
I have a server which crashes due to this.

Seen in MySQL-server-5.6.15-1.el6.x86_64.  So please update the version affected as I can't do this.

The CentOS 6 kernel on a 96 GB dedicated server says:
Feb  6 00:53:38 myserver kernel: Out of memory: Kill process 20381 (mysqld) score 982 or sacrifice child
Feb  6 00:53:38 myserver kernel: Killed process 20381, UID 158, (mysqld) total-vm:102792932kB, anon-rss:97162236kB, file-rss:976kB

So the memory leak takes down a MEM slave using slave_parallel_workers = 10 in a few days. For now I am working around it by disabling the setting.

If you need further feedback from me please let me know.
[6 Feb 2014 17:07] Sveta Smirnova
Simon,

please send the information requested by Shane:

 [19 Jun 2013 14:37] Shane Bester

200M+ for open table handlers may be perfectly normal depending on configuration.
Can we see the entire massif output, as well as:

SHOW GLOBAL STATUS;
SHOW GLOBAL VARIABLES;
SHOW OPEN TABLES;
SHOW ENGINE PERFORMANCE_SCHEMA STATUS;
[7 Feb 2014 11:38] Simon Mudd
ok.

Also note that my configuration uses the following settings, which I believe may be relevant:

master_info_repository    = TABLE
relay_log_info_repository = TABLE
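With TABLE repositories the per-worker replication state is kept in the mysql schema, so for what it's worth the worker rows on this slave can be inspected with something like (a sketch, nothing special about the queries):

SELECT * FROM mysql.slave_worker_info;
SELECT * FROM mysql.slave_relay_log_info;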

Changing slave_parallel_workers to 0 after a STOP SLAVE, and then doing START SLAVE, does not seem to resolve the leak from what I can tell.
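Concretely, the sequence I tried was roughly the following (a sketch from memory, not a verified transcript):

STOP SLAVE;
SET GLOBAL slave_parallel_workers = 0;
START SLAVE;
-- the memory already allocated by the worker threads is not released afterwards;
-- as far as I can tell only a full mysqld restart gives it back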
[7 Feb 2014 11:49] Luis Soares
See also: BUG#71197.
[7 Feb 2014 11:55] Simon Mudd
Output from a slave that has slave_parallel_workers set.

Attachment: log_output (application/octet-stream, text), 243.55 KiB.

[13 Feb 2014 16:27] Aaron Johnson
My testing shows that changing slave_parallel_workers from a value greater than 0 back to 0 and restarting the slave does *not* reclaim the memory, as pointed out in another comment.

I think the severity of this issue should be higher.
[19 Feb 2014 10:39] Jon Stephens
This appears to be a duplicate of Bug #71197, which is fixed in MySQL 5.6.17 and 5.7.4.

Closed.