MySQL Bugs: #91690: MySQL Slave server consume all memory and swap compare to other slave and master

Bug #91690	MySQL Slave server consume all memory and swap compare to other slave and master
Submitted:	18 Jul 2018 3:18	Modified:	21 Jul 2018 1:52
Reporter:	Joey Aldrin Cruz	Email Updates:
Status:	Unsupported	Impact on me:	None
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	5.6.11	OS:	CentOS (6.9 Final)
Assigned to:	MySQL Verification Team	CPU Architecture:	x86 (any)
Tags:	Memory, MySQL, slave, swap

Description:
Hello,

Our set-up is 1 master and a 4 slave running on MHA. So, we have added this slave and apparently, this slave server is the only server on that eats up too much memory and swapping as well, all of the four servers having the same configuration and the same RAM and disk size.

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/LVMGROUP-lv_root
                       30G  3.7G   25G  14% /
tmpfs                 127G     0  127G   0% /dev/shm
/dev/sda1            1008M   27M  931M   3% /boot
/dev/mapper/LVMGROUP-lv_home
                       20G  625M   18G   4% /home
/dev/mapper/LVMGROUP-lv_data
                       99G   60M   94G   1% /mnt/data
/dev/mapper/3624a9370d02f6cbe216f44900001102e
                      1.1T  820G  209G  80% /mnt/mysql
//10.1.88.128/dbbackup01
                       15T   14T  1.9T  88% /mnt/storage

            total       used       free     shared    buffers     cached
Mem:           252        251          0          0          0          2
-/+ buffers/cache:        249          2
Swap:          127        117         10

[mysql]

# CLIENT #
port                           = 3306
socket                         = /mnt/mysql/data/mysql.sock

[mysqld]

# GENERAL #
user                           = mysql
default-storage-engine         = InnoDB
socket                         = /mnt/mysql/data/mysql.sock
pid-file                       = /mnt/mysql/data/mysql.pid
#event_scheduler = ON
skip_name_resolve              = ON
lower_case_table_names         = 1
federated
slave_parallel_workers         = 2

# MyISAM #
key-buffer-size                = 32M
myisam-recover-options         = FORCE,BACKUP

# SAFETY #
max-allowed-packet             = 16M
max-connect-errors             = 1000000

# DATA STORAGE #
datadir                        = /mnt/mysql/data/

# BINARY LOGGING #
log-bin                        =/mnt/mysql/logbin/mysql-bin
binlog-ignore-db               =mysql
binlog-ignore-db               =test
binlog-ignore-db               =information_schema
binlog_format                  =mixed
expire-logs-days               = 4
sync-binlog                    = 1

# REPLICATION #
relay-log                      = /mnt/mysql/logbin/relay-bin
slave-net-timeout              = 60
sync-master-info               = 10000
sync-relay-log                 = 10000
sync-relay-log-info            = 10000
server-id                      = 152

# CACHES AND LIMITS #
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0
query-cache-size               = 0
max-connections                = 900
thread-cache-size              = 50
open-files-limit               = 65535
table-definition-cache         = 4096
table-open-cache               = 4096
group_concat_max_len           = 12000
read_rnd_buffer_size           = 16M
sort_buffer_size               = 16M
join_buffer_size               = 32M

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 2G
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 160G
innodb_io_capacity             = 2000
innodb_read_io_threads         = 64
innodb_write_io_threads        = 32
innodb_print_all_deadlocks     = 1
innodb-status-file             = 1
default-time-zone              = '+8:00'

# LOGGING #
log-error                      = /mnt/mysql/mysqllog/mysql-error.log
log-queries-not-using-indexes  = 0
slow-query-log                 = 1
long_query_time                = 2
slow-query-log-file            = /mnt/mysql/mysqllog/mysql-slow.log

# CHARACTER SET #
collation-server               = utf8_unicode_ci
init-connect                   = 'SET NAMES utf8'
character-set-server           = utf8
transaction-isolation = READ-COMMITTED

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

 ps -A --sort -rss -o comm,pmem | head -n 11
COMMAND         %MEM
mysqld          97.7
zabbix_agentd    0.0
rsyslogd         0.0
multipathd       0.0
sshd             0.0
sshd             0.0
sshd             0.0
sshd             0.0
pickup           0.0
sudo             0.0

By the way, I have read a bug that it is about or related with the master_info_repository and relay_log_repository that is set to TABLE. But, this server is already set to FILE.

mysql> show variables like '%repository%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| master_info_repository    | FILE  |
| relay_log_info_repository | FILE  |
+---------------------------+-------+
2 rows in set (0.00 sec)

Let me know what other details you need in order for us to fix this.

How to repeat:
Start server, observe virtual memory increase beyond physical memory in server; OS starts swapping

Suggested fix:
There should be a way to fix this, as I mentioned, Master and other slaves having the same set-up but only this slave server experiencing this, which is the big question for us.

htop_result

Attachment: htop_result.PNG (image/png, text), 109.92 KiB.

global_status_result

Attachment: global status.txt (text/plain), 40.58 KiB.

global_variables_result

Attachment: global_variables.txt (text/plain), 52.23 KiB.

Hi,

please upgrade to a latest 5.6 at least as since 5.6.11 really a lot of bugs are fixed, some of them related to memory usage; especially if you are using stored procedures.

best regards
Bogdan

Hello Bogdan,

I really appreciate your feedback. But, as I've mentioned. All of the four servers are using the same exact version, configuration, disk size and RAM size, but why only this server experiencing this? 
I will post current SWAP usage from top command.

top_result_current

Attachment: top_current_swap.PNG (image/png, text), 15.43 KiB.

current processlist while on 120GB of swap and 97.7% of memory usage

Attachment: current_processlist.txt (text/plain), 1.54 KiB.

Hi,

> All of the four servers are using the same exact version, 
> configuration, disk size and RAM size, but why only this
> server experiencing this? 

So yes, it's a bug. Since 5.6.11 there is 29 version releases each with a number of bug fixes, number of them with memory related bugs.

Just the next release, 5.6.12, has 3 memory leak bugs fixed (Bug #16754776, #16737332, 14685362). Guessing which one of the bugs you might be hitting is pretty futile waste of time as in any case the only solution is to upgrade in order to solve a problem, so when you are at it, upgrade to latest 5.6; or consider 5.7 

kind regards
Bogdan

Hello,

I've conducted more research and found out that even on much higher version there's a memory leak issues. And please enlighten us, because why we don't experiencing this on other 12 servers that have the same exact configuration, MySQL version and RAM,DISK size. To be honest, we really want to find the root cause. Is there any connection with waiting for event coordinator state? And if yes; why on the other servers we have the same, but don't have this kind of issues. Our memory is really big enough if you will try to noticed, but still consumed and the swap 120GB, (which is too high to consume) and with only 8 proces base on the processlist. I hope I can hear more strong and convincing reason on this.

Cheers,

Joey Cruz

Hello,

I'm suspecting that it is because of the slave_parallel_workers which is set to > 1. We will change the value and restart the MySQL and if ever that everything goes well, I'll update this bug report, so that whoever who can read this, they can have an idea on how can they fix it.

Cheers,