MySQL Bugs: #69848: mysql 5.6 slave out of memory error ?

Bug #69848	mysql 5.6 slave out of memory error ?
Submitted:	26 Jul 2013 13:51	Modified:	28 Apr 2015 9:11
Reporter:	jianjun yang	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Row Based Replication ( RBR )	Severity:	S2 (Serious)
Version:	mysql 5.6.12	OS:	Linux (CentOS 6)
Assigned to:	Venkatesh Duggirala	CPU Architecture:	Any
Tags:	GTID, slave, swap

Description:
os verdion: CentOS 6
db version: mysql5.6.12
memory:16G
swap: 16G

master-slave GTID : master normal. in slave, swap partition used up over 99% , last time used up 100%, free 0M. slave restarted by OS  every 3 days average.

error info in file /var/log/message:
mysqld out of memory , stop by oom_killer.

mysql errorlog info:
 mysqld_safe Number of processes running now: 0

mysql5.5 master-slave has used by over 2 years , not occer error above.

How to repeat:
mysql 5.6.12 source code version, build master/slave GTID.
enable huge page, Linux AIO.

huge page set (/etc/sysctl.conf):
vm.swappiness=5
vm.nr_hugepages=7168   # memory:16G  huge page allocate 14G: 14*1024/2=7168 pages
vm.hugetlb_shm_group=500  # mysql user group

Suggested fix:
free swap partition space quickly.

Thank you for the report.

Please send us your slave configuration file and full error log file.

We too are encountering this problem too:

OS: Ubuntu 14.04.1 LTS x64
MySQL Server: 5.6.20 x64 (Dedicated)
Replication: Looped (Circular) with four masters, 4 slaves.
Server RAM: 16GB
innodb-buffer-pool-size = 4G

--- Begin Cut and Paste from HTOP ---

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
10839 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.00 /usr/sbin/mysqld
10840 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:12.25 /usr/sbin/mysqld
10841 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.57 /usr/sbin/mysqld
10842 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.55 /usr/sbin/mysqld
10843 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.88 /usr/sbin/mysqld
10844 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.80 /usr/sbin/mysqld
10845 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.31 /usr/sbin/mysqld
10846 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.61 /usr/sbin/mysqld
10847 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.60 /usr/sbin/mysqld
10848 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.62 /usr/sbin/mysqld
10849 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.56 /usr/sbin/mysqld
10850 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.52 /usr/sbin/mysqld
10851 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.45 /usr/sbin/mysqld
10852 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.65 /usr/sbin/mysqld
10853 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.78 /usr/sbin/mysqld
10854 mysql      20   0 18.6G 14.9G  1640 S  0.0 95.2  0:00.74 /usr/sbin/mysqld

--- End Cut and Paste from HTOP ---

There is a memory leak. We did discover one area that has reduced how fast the memory leak occurs. When setting max-allowed-packet, the slave-max-allowed-packet must exceed max-allowed-packet or a memory serious memory leak occurs.

We had this setup and it caused mysql to consume large amounts of RAM/SWAP overtime:
max-allowed-packet = 192M
slave-max-allowed-packet = (default)

We changed to this:
max-allowed-packet = 192M
slave-max-allowed-packet = 194M

and it seem to slow the amount of RAM/SWAP consumed over a week. However, we are finding that MySQL still has a memory leak somewhere as the amount of RAM it consumes far exceeds the settings. See our attached configuration file.

Read only slave configuration file consuming RAM

Attachment: atl-mysql01.my.cnf (application/octet-stream, text), 8.11 KiB.

We also noticed that when a MySQL server has consumed a lot of RAM (especially into swap) it takes a very long time for the STOP SLAVE command to finish. Sometimes as long as 15 minutes. This is a problem when a server is trying to reboot. In Ubuntu, if the process doesn't shutdown in five minutes then it's forcibly terminated.

While STOP SLAVE is running and you have (H)TOP running, you can see the SWAP memory usage going down.

Thank you for the feedback.

Verified as described with release server. Debug version crashes with:

2014-12-01 22:02:26 2aaabd6c8940  InnoDB: Assertion failure in thread 46912810813760 in file log0log.cc line 1462
InnoDB: Failing assertion: loop_count < 5

To repeat.

1. Create dummy test case for MTR:

$cat mysql-test/t/bug69848.test 
--source include/master-slave.inc

create table t1(f1 int) engine=innodb;
insert into t1 values(1);
select * from t1;

2. Create bug69848-master.opt and bug69848-slave.opt files with same content:

--skip-external-locking --skip-name-resolve --interactive-timeout=28800 --join-buffer-size=2M --lower-case-table-name=1 --max-allowed-packet=192M --max-connections=200 --max-heap-table-size=64M --max-sp-recursion-depth=25 --open-files-limit=64K --performance-schema=off --query-cache-limit=2M --query-cache-size=64M --query-cache-type=1 --query-prealloc-size=1M --read-buffer-size=1M --sort-buffer-size=2M --table-open-cache=4K --table-open-cache-instances=1 --thread-cache-size=10 --thread-stack=192K --wait-timeout=28800 --myisam-recover=BACKUP --gtid-mode=on --enforce-gtid-consistency=true --binlog-cache-size=256K --binlog-checksum=CRC32 --binlog-format=row --binlog-row-image=minimal --binlog-rows-query-log-events=1 --expire-logs-days=07 --log-bin-trust-function-creators=1 --master-info-repository=TABLE --master-verify-checksum=1 --max-binlog-size=1G --sync-master-info=1 --sync-binlog=0 --binlog-ignore-db=eci_elite_import --binlog-ignore-db=information_schema --binlog-ignore-db=oti_elite_import --binlog-ignore-db=performance_schema --binlog-ignore-db=test --binlog-ignore-db=wbsys --log-slave-updates=1 --read-only=1 --replicate-same-server-id=0 --skip-slave-start=0 --slave-net-timeout=15 --slave-parallel-workers=0  --slave-pending-jobs-size-max=32M  --slave-skip-errors=1053  --slave-sql-verify-checksum=1 --slave-max-allowed-packet=194M --innodb-buffer-pool-instances=2  --innodb-buffer-pool-size=2G --innodb-use-sys-malloc=1 --innodb-log-file-size=1G --innodb-log-buffer-size=8M --innodb-file-per-table=1 --innodb-flush-method=O_DIRECT --innodb-flush-log-at-trx-commit=2 --innodb-lock-wait-timeout=50 --innodb-open-files=4K --innodb-print-all-deadlocks=true --innodb-read-io-threads=64 --innodb-support-xa=0 --innodb-thread-concurrency=0 --innodb-use-sys-malloc=1 --innodb-write-io-threads=64 --innodb-use-native-aio=0 --key-buffer-size=128M

3. Start MTR: ./mtr bug69848 --start &

4. Connect to master (mysql -uroot -h127.0.0.1 -P13000) and run:

grant replication slave on *.* to 'repl'@'%' identified by 'replrepl';

5. Connect to slave and run:

change master to master_user='repl',master_password='replrepl', master_host='127.0.0.1', master_port=13000;
start slave;

6. On master: CREATE DATABASE sbtest;

7. In parallel window prepare table with sysbench 0.5:

~/blade12/bin/sysbench --test=$HOME/blade12/src/sysbench/sysbench/tests/db/oltp.lua --oltp-table-size=1000000 --oltp-tables-count=1 --mysql-db=sbtest --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=13000 prepare

8. On master: RENAME TABLE sbtest1 TO sbtest;

9. Run test:

for i in `seq 1 1 300`; do ~/blade12/bin/sysbench --test=$HOME/blade12/src/sysbench/sysbench/tests/db/oltp_complex_rw.lua --oltp-table-size=100000 --oltp-tables-count=20 --mysql-db=sbtest --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=13000 --num-threads=16 run; done

10. In third window watch permanent memory increase. Or watch assert if you use debug server.

Posted by developer:
 
Debug version of 5.7.6 does not crash, but still leaks memory.

Posted by developer:
 
P_S says this is memory/sql/thd::main_mem_root, allocated by IO thread, which holds memory:

mysql> select THREAD_ID,EVENT_NAME,COUNT_ALLOC,SUM_NUMBER_OF_BYTES_ALLOC,SUM_NUMBER_OF_BYTES_FREE as bf,CURRENT_NUMBER_OF_BYTES_USED as cbu,HIGH_NUMBER_OF_BYTES_USED as hbu from performance_schema.memory_summary_by_thread_by_event_name where SUM_NUMBER_OF_BYTES_ALLOC > 0 and THREAD_ID in (145) and CURRENT_NUMBER_OF_BYTES_USED>0 order by CURRENT_NUMBER_OF_BYTES_USED DESC;
+-----------+-------------------------------------------------+-------------+---------------------------+-------------+------------+------------+
| THREAD_ID | EVENT_NAME                                      | COUNT_ALLOC | SUM_NUMBER_OF_BYTES_ALLOC | bf          | cbu        | hbu        |
+-----------+-------------------------------------------------+-------------+---------------------------+-------------+------------+------------+
|       145 | memory/sql/thd::main_mem_root                   |        1039 |                1104280592 |           0 | 1104280592 | 1104280592 |
|       145 | memory/innodb/std                               |    19378997 |                2848628832 |  2838767096 |    9861736 |    9862208 |
|       145 | memory/sql/NET::buff                            |           3 |                    557077 |       16391 |     540686 |     557077 |
|       145 | memory/innodb/trx0undo                          |       11424 |                   3929856 |     3720360 |     209496 |     209496 |
|       145 | memory/mysys/array_buffer                       |          11 |                     25952 |        8320 |      17632 |      25504 |
|       145 | memory/vio/read_buffer                          |           1 |                     16384 |           0 |      16384 |      16384 |
|       145 | memory/sql/THD::transactions::mem_root          |           1 |                      4112 |           0 |       4112 |       4112 |
|       145 | memory/client/MYSQL                             |           4 |                      1366 |           0 |       1366 |       1366 |
|       145 | memory/client/mysql_options                     |           9 |                       545 |           0 |        545 |        545 |
|       145 | memory/vio/vio                                  |           1 |                       496 |           0 |        496 |        496 |
|       145 | memory/sql/Log_event                            |          11 |                      1510 |        1122 |        388 |        772 |
|       145 | memory/innodb/mem0mem                           |     2405744 |               11220964072 | 11220963816 |        256 |      21928 |
|       145 | memory/sql/String::value                        |           4 |                       232 |          16 |        216 |        216 |
|       145 | memory/sql/THD::Session_tracker                 |           6 |                       209 |           0 |        209 |        209 |
|       145 | memory/sql/Gtid_set::Interval_chunk             |           1 |                       200 |           0 |        200 |        200 |
|       145 | memory/sql/Mutex_cond_array::Mutex_cond         |           1 |                       168 |           0 |        168 |        168 |
|       145 | memory/mysys/lf_slist                           |           5 |                       160 |           0 |        160 |        160 |
|       145 | memory/sql/THD::debug_sync_control              |           1 |                       120 |           0 |        120 |        120 |
|       145 | memory/sql/Owned_gtids::sidno_to_hash           |           1 |                        96 |           0 |         96 |         96 |
|       145 | memory/mysys/lf_node                            |           2 |                        80 |           0 |         80 |         80 |
|       145 | memory/innodb/ha_innodb                         |           1 |                        72 |           0 |         72 |         72 |
|       145 | memory/sql/Sid_map::Node                        |           1 |                        20 |           0 |         20 |         20 |
|       145 | memory/sql/plugin_ref                           |           2 |                        16 |           0 |         16 |         16 |
|       145 | memory/sql/THD::Session_sysvar_resource_manager |           1 |                         8 |           0 |          8 |          8 |
+-----------+-------------------------------------------------+-------------+---------------------------+-------------+------------+------------+
24 rows in set (0.17 sec)

mysql> select * from memory_by_thread_by_current_bytes where thread_id in (145, 146);                                                                                           +-----------+---------------+--------------------+-------------------+-------------------+-------------------+-----------------+
| thread_id | user          | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
+-----------+---------------+--------------------+-------------------+-------------------+-------------------+-----------------+
|       145 | sql/slave_io  |             412396 | 1.04 GiB          | 2.64 KiB          | 1.03 GiB          | 17.10 GiB       |
|       146 | sql/slave_sql |              26083 | 1.79 MiB          | 72 bytes          | 1.00 MiB          | 12.03 GiB       |
+-----------+---------------+--------------------+-------------------+-------------------+-------------------+-----------------+
2 rows in set (2.79 sec)

5.6.23 heap profile saying this grows forever: parent_lex->push_context(&context);

Attachment: mybin.hprof.3825.heap.pdf (application/pdf, text), 9.28 KiB.

corresponding pprof --text output of the above profile

Attachment: mybin.hprof.3825.heap.txt (text/plain), 107.93 KiB.

Just a comment about the procedure to verify the bug.  I had used slightly different sysbench parametesr than sveta listed.

sysbench --test=./tests/db/oltp.lua --oltp-table-size=100000 --oltp-tables-count=10 --mysql-db=sbtest --mysql-user=root --mysql-host=192.168.1.2 --mysql-port=3306 prepare

for i in `seq 1 1 300`; do sysbench --test=./tests/db/oltp.lua --oltp-table-size=100000 --oltp-tables-count=10 --mysql-db=sbtest --mysql-user=root --mysql-host=192.168.1.2 --mysql-port=3306 --num-threads=16 run; done

In my case:

sync_master_info=0 <--- no leak
sync_master_info=1 <--- leaks

Looks like related or duplicate of http://bugs.mysql.com/bug.php?id=72885 ?

My version is 5.6.21, We are seeing the leak when 

relay-log-info-repository        = TABLE 
master-info-repository           = TABLE
sync-master-info                 = 1

We changed to 

relay-log-info-repository        = FILE
master-info-repository           = FILE

The situation is improved.

seems this is releated to 

relay-log-info-repository        = TABLE
master-info-repository           = TABLE

no matter RBR OR SBR

Bug#72885 is marked as duplicate of this bug.

Thanks for your feedback, this has been fixed in upcoming versions and the following was added to the changelog:
When master_info_repository=TABLE the receiver thread stores received event information in a table. The memory used in the process of updating the table was not being freed correctly and this could lead to an out of memory error. The fix ensures that after an event is flushed to the relay log file by a receiver thread, the memory used is freed.

This is fixed in 5.6.25 where the release note at http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-25.html describes the fix as:

"Replication: When master_info_repository=TABLE the receiver thread stores received event information in a table. The memory used in the process of updating the table was not being freed correctly and this could lead to an out of memory error. The fix ensures that after an event is flushed to the relay log file by a receiver thread, the memory used is freed. (Bug #72885, Bug #19390463, Bug #69848, Bug #20124342)"

James Day, MySQL Senior Principal Support Engineer, Oracle

commit 2401e62aed36c129b5cefe80143cff8fa5729d82
Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com>
Date:   Wed Apr 1 15:19:33 2015 +0530

    Bug#20124342 MYSQL 5.6 SLAVE OUT OF MEMORY ERROR ?
    Problem: I/O thread on Slave with master_info_repository=TABLE settings,
    is leaking memory that leads to Out of memory (OOM) after some time.
    
    Analysis: SQL thread does free_root(thd->mem_root,...) after every
    Query_log_event and Row_log_event execution. Unlike SQL thread,
    I/O thread was not doing free_root of thd->mem_root in it's life time.
    In case if any of the I/O thread operations are using memory from this
    mem_root and if the same operation keeps repeating frequently, that
    will increase memory allocation from this mem_root rapidly.
    This situation will lead to OOM error in some time.
    
    For eg: In this bug scenario, when master_info_repository is set to TABLE,
    IO thread calls lex_start() in Rpl_info_table::do_flush_info
    operation which uses thread's memory root. If sync_master_info=1,
    the frequency of this operation is more and will lead to OOM error
    in very short time.
    
    The memory growth will be gone if the IO thread is stopped
    because we have free_root(thd->mem_root) in THD::~THD. But in the
    real production environment, no one will be interested in
    restarting the IO thread. And this is the reason the similar issue
    will not appear in a normal client thread because eventually
    a client thread will be exited
    
    
    Fix: After event is flushed to relay log file by IO thread, memory used
    by this mem_root is not required. Hence adding free_root(thd->mem_root,...)
    after every event is flushed.