Bug #69848 | mysql 5.6 slave out of memory error ? | ||
---|---|---|---|
Submitted: | 26 Jul 2013 13:51 | Modified: | 28 Apr 2015 9:11 |
Reporter: | jianjun yang | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Row Based Replication ( RBR ) | Severity: | S2 (Serious) |
Version: | mysql 5.6.12 | OS: | Linux (CentOS 6) |
Assigned to: | Venkatesh Duggirala | CPU Architecture: | Any |
Tags: | GTID, slave, swap |
[26 Jul 2013 13:51]
jianjun yang
[21 Jul 2014 17:44]
Sveta Smirnova
Thank you for the report. Please send us your slave configuration file and full error log file.
[20 Aug 2014 12:17]
Van Stokes
We too are encountering this problem too: OS: Ubuntu 14.04.1 LTS x64 MySQL Server: 5.6.20 x64 (Dedicated) Replication: Looped (Circular) with four masters, 4 slaves. Server RAM: 16GB innodb-buffer-pool-size = 4G --- Begin Cut and Paste from HTOP --- PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command 10839 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.00 /usr/sbin/mysqld 10840 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:12.25 /usr/sbin/mysqld 10841 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.57 /usr/sbin/mysqld 10842 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.55 /usr/sbin/mysqld 10843 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.88 /usr/sbin/mysqld 10844 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.80 /usr/sbin/mysqld 10845 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.31 /usr/sbin/mysqld 10846 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.61 /usr/sbin/mysqld 10847 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.60 /usr/sbin/mysqld 10848 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.62 /usr/sbin/mysqld 10849 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.56 /usr/sbin/mysqld 10850 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.52 /usr/sbin/mysqld 10851 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.45 /usr/sbin/mysqld 10852 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.65 /usr/sbin/mysqld 10853 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.78 /usr/sbin/mysqld 10854 mysql 20 0 18.6G 14.9G 1640 S 0.0 95.2 0:00.74 /usr/sbin/mysqld --- End Cut and Paste from HTOP --- There is a memory leak. We did discover one area that has reduced how fast the memory leak occurs. When setting max-allowed-packet, the slave-max-allowed-packet must exceed max-allowed-packet or a memory serious memory leak occurs. We had this setup and it caused mysql to consume large amounts of RAM/SWAP overtime: max-allowed-packet = 192M slave-max-allowed-packet = (default) We changed to this: max-allowed-packet = 192M slave-max-allowed-packet = 194M and it seem to slow the amount of RAM/SWAP consumed over a week. However, we are finding that MySQL still has a memory leak somewhere as the amount of RAM it consumes far exceeds the settings. See our attached configuration file.
[20 Aug 2014 12:19]
Van Stokes
Read only slave configuration file consuming RAM
Attachment: atl-mysql01.my.cnf (application/octet-stream, text), 8.11 KiB.
[20 Aug 2014 19:33]
Van Stokes
We also noticed that when a MySQL server has consumed a lot of RAM (especially into swap) it takes a very long time for the STOP SLAVE command to finish. Sometimes as long as 15 minutes. This is a problem when a server is trying to reboot. In Ubuntu, if the process doesn't shutdown in five minutes then it's forcibly terminated. While STOP SLAVE is running and you have (H)TOP running, you can see the SWAP memory usage going down.
[1 Dec 2014 23:48]
Sveta Smirnova
Thank you for the feedback. Verified as described with release server. Debug version crashes with: 2014-12-01 22:02:26 2aaabd6c8940 InnoDB: Assertion failure in thread 46912810813760 in file log0log.cc line 1462 InnoDB: Failing assertion: loop_count < 5 To repeat. 1. Create dummy test case for MTR: $cat mysql-test/t/bug69848.test --source include/master-slave.inc create table t1(f1 int) engine=innodb; insert into t1 values(1); select * from t1; 2. Create bug69848-master.opt and bug69848-slave.opt files with same content: --skip-external-locking --skip-name-resolve --interactive-timeout=28800 --join-buffer-size=2M --lower-case-table-name=1 --max-allowed-packet=192M --max-connections=200 --max-heap-table-size=64M --max-sp-recursion-depth=25 --open-files-limit=64K --performance-schema=off --query-cache-limit=2M --query-cache-size=64M --query-cache-type=1 --query-prealloc-size=1M --read-buffer-size=1M --sort-buffer-size=2M --table-open-cache=4K --table-open-cache-instances=1 --thread-cache-size=10 --thread-stack=192K --wait-timeout=28800 --myisam-recover=BACKUP --gtid-mode=on --enforce-gtid-consistency=true --binlog-cache-size=256K --binlog-checksum=CRC32 --binlog-format=row --binlog-row-image=minimal --binlog-rows-query-log-events=1 --expire-logs-days=07 --log-bin-trust-function-creators=1 --master-info-repository=TABLE --master-verify-checksum=1 --max-binlog-size=1G --sync-master-info=1 --sync-binlog=0 --binlog-ignore-db=eci_elite_import --binlog-ignore-db=information_schema --binlog-ignore-db=oti_elite_import --binlog-ignore-db=performance_schema --binlog-ignore-db=test --binlog-ignore-db=wbsys --log-slave-updates=1 --read-only=1 --replicate-same-server-id=0 --skip-slave-start=0 --slave-net-timeout=15 --slave-parallel-workers=0 --slave-pending-jobs-size-max=32M --slave-skip-errors=1053 --slave-sql-verify-checksum=1 --slave-max-allowed-packet=194M --innodb-buffer-pool-instances=2 --innodb-buffer-pool-size=2G --innodb-use-sys-malloc=1 --innodb-log-file-size=1G --innodb-log-buffer-size=8M --innodb-file-per-table=1 --innodb-flush-method=O_DIRECT --innodb-flush-log-at-trx-commit=2 --innodb-lock-wait-timeout=50 --innodb-open-files=4K --innodb-print-all-deadlocks=true --innodb-read-io-threads=64 --innodb-support-xa=0 --innodb-thread-concurrency=0 --innodb-use-sys-malloc=1 --innodb-write-io-threads=64 --innodb-use-native-aio=0 --key-buffer-size=128M 3. Start MTR: ./mtr bug69848 --start & 4. Connect to master (mysql -uroot -h127.0.0.1 -P13000) and run: grant replication slave on *.* to 'repl'@'%' identified by 'replrepl'; 5. Connect to slave and run: change master to master_user='repl',master_password='replrepl', master_host='127.0.0.1', master_port=13000; start slave; 6. On master: CREATE DATABASE sbtest; 7. In parallel window prepare table with sysbench 0.5: ~/blade12/bin/sysbench --test=$HOME/blade12/src/sysbench/sysbench/tests/db/oltp.lua --oltp-table-size=1000000 --oltp-tables-count=1 --mysql-db=sbtest --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=13000 prepare 8. On master: RENAME TABLE sbtest1 TO sbtest; 9. Run test: for i in `seq 1 1 300`; do ~/blade12/bin/sysbench --test=$HOME/blade12/src/sysbench/sysbench/tests/db/oltp_complex_rw.lua --oltp-table-size=100000 --oltp-tables-count=20 --mysql-db=sbtest --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=13000 --num-threads=16 run; done 10. In third window watch permanent memory increase. Or watch assert if you use debug server.
[2 Dec 2014 12:04]
Sveta Smirnova
Posted by developer: Debug version of 5.7.6 does not crash, but still leaks memory.
[2 Dec 2014 17:59]
Sveta Smirnova
Posted by developer: P_S says this is memory/sql/thd::main_mem_root, allocated by IO thread, which holds memory: mysql> select THREAD_ID,EVENT_NAME,COUNT_ALLOC,SUM_NUMBER_OF_BYTES_ALLOC,SUM_NUMBER_OF_BYTES_FREE as bf,CURRENT_NUMBER_OF_BYTES_USED as cbu,HIGH_NUMBER_OF_BYTES_USED as hbu from performance_schema.memory_summary_by_thread_by_event_name where SUM_NUMBER_OF_BYTES_ALLOC > 0 and THREAD_ID in (145) and CURRENT_NUMBER_OF_BYTES_USED>0 order by CURRENT_NUMBER_OF_BYTES_USED DESC; +-----------+-------------------------------------------------+-------------+---------------------------+-------------+------------+------------+ | THREAD_ID | EVENT_NAME | COUNT_ALLOC | SUM_NUMBER_OF_BYTES_ALLOC | bf | cbu | hbu | +-----------+-------------------------------------------------+-------------+---------------------------+-------------+------------+------------+ | 145 | memory/sql/thd::main_mem_root | 1039 | 1104280592 | 0 | 1104280592 | 1104280592 | | 145 | memory/innodb/std | 19378997 | 2848628832 | 2838767096 | 9861736 | 9862208 | | 145 | memory/sql/NET::buff | 3 | 557077 | 16391 | 540686 | 557077 | | 145 | memory/innodb/trx0undo | 11424 | 3929856 | 3720360 | 209496 | 209496 | | 145 | memory/mysys/array_buffer | 11 | 25952 | 8320 | 17632 | 25504 | | 145 | memory/vio/read_buffer | 1 | 16384 | 0 | 16384 | 16384 | | 145 | memory/sql/THD::transactions::mem_root | 1 | 4112 | 0 | 4112 | 4112 | | 145 | memory/client/MYSQL | 4 | 1366 | 0 | 1366 | 1366 | | 145 | memory/client/mysql_options | 9 | 545 | 0 | 545 | 545 | | 145 | memory/vio/vio | 1 | 496 | 0 | 496 | 496 | | 145 | memory/sql/Log_event | 11 | 1510 | 1122 | 388 | 772 | | 145 | memory/innodb/mem0mem | 2405744 | 11220964072 | 11220963816 | 256 | 21928 | | 145 | memory/sql/String::value | 4 | 232 | 16 | 216 | 216 | | 145 | memory/sql/THD::Session_tracker | 6 | 209 | 0 | 209 | 209 | | 145 | memory/sql/Gtid_set::Interval_chunk | 1 | 200 | 0 | 200 | 200 | | 145 | memory/sql/Mutex_cond_array::Mutex_cond | 1 | 168 | 0 | 168 | 168 | | 145 | memory/mysys/lf_slist | 5 | 160 | 0 | 160 | 160 | | 145 | memory/sql/THD::debug_sync_control | 1 | 120 | 0 | 120 | 120 | | 145 | memory/sql/Owned_gtids::sidno_to_hash | 1 | 96 | 0 | 96 | 96 | | 145 | memory/mysys/lf_node | 2 | 80 | 0 | 80 | 80 | | 145 | memory/innodb/ha_innodb | 1 | 72 | 0 | 72 | 72 | | 145 | memory/sql/Sid_map::Node | 1 | 20 | 0 | 20 | 20 | | 145 | memory/sql/plugin_ref | 2 | 16 | 0 | 16 | 16 | | 145 | memory/sql/THD::Session_sysvar_resource_manager | 1 | 8 | 0 | 8 | 8 | +-----------+-------------------------------------------------+-------------+---------------------------+-------------+------------+------------+ 24 rows in set (0.17 sec) mysql> select * from memory_by_thread_by_current_bytes where thread_id in (145, 146); +-----------+---------------+--------------------+-------------------+-------------------+-------------------+-----------------+ | thread_id | user | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated | +-----------+---------------+--------------------+-------------------+-------------------+-------------------+-----------------+ | 145 | sql/slave_io | 412396 | 1.04 GiB | 2.64 KiB | 1.03 GiB | 17.10 GiB | | 146 | sql/slave_sql | 26083 | 1.79 MiB | 72 bytes | 1.00 MiB | 12.03 GiB | +-----------+---------------+--------------------+-------------------+-------------------+-------------------+-----------------+ 2 rows in set (2.79 sec)
[12 Dec 2014 23:45]
MySQL Verification Team
5.6.23 heap profile saying this grows forever: parent_lex->push_context(&context);
Attachment: mybin.hprof.3825.heap.pdf (application/pdf, text), 9.28 KiB.
[12 Dec 2014 23:49]
MySQL Verification Team
corresponding pprof --text output of the above profile
Attachment: mybin.hprof.3825.heap.txt (text/plain), 107.93 KiB.
[12 Dec 2014 23:50]
MySQL Verification Team
Just a comment about the procedure to verify the bug. I had used slightly different sysbench parametesr than sveta listed. sysbench --test=./tests/db/oltp.lua --oltp-table-size=100000 --oltp-tables-count=10 --mysql-db=sbtest --mysql-user=root --mysql-host=192.168.1.2 --mysql-port=3306 prepare for i in `seq 1 1 300`; do sysbench --test=./tests/db/oltp.lua --oltp-table-size=100000 --oltp-tables-count=10 --mysql-db=sbtest --mysql-user=root --mysql-host=192.168.1.2 --mysql-port=3306 --num-threads=16 run; done
[13 Dec 2014 7:16]
MySQL Verification Team
In my case: sync_master_info=0 <--- no leak sync_master_info=1 <--- leaks
[13 Dec 2014 7:18]
MySQL Verification Team
Looks like related or duplicate of http://bugs.mysql.com/bug.php?id=72885 ?
[22 Dec 2014 18:33]
Yang Liu
My version is 5.6.21, We are seeing the leak when relay-log-info-repository = TABLE master-info-repository = TABLE sync-master-info = 1 We changed to relay-log-info-repository = FILE master-info-repository = FILE The situation is improved. seems this is releated to relay-log-info-repository = TABLE master-info-repository = TABLE no matter RBR OR SBR
[1 Apr 2015 10:30]
Venkatesh Duggirala
Bug#72885 is marked as duplicate of this bug.
[28 Apr 2015 9:11]
David Moss
Thanks for your feedback, this has been fixed in upcoming versions and the following was added to the changelog: When master_info_repository=TABLE the receiver thread stores received event information in a table. The memory used in the process of updating the table was not being freed correctly and this could lead to an out of memory error. The fix ensures that after an event is flushed to the relay log file by a receiver thread, the memory used is freed.
[2 Jun 2015 3:53]
James Day
This is fixed in 5.6.25 where the release note at http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-25.html describes the fix as: "Replication: When master_info_repository=TABLE the receiver thread stores received event information in a table. The memory used in the process of updating the table was not being freed correctly and this could lead to an out of memory error. The fix ensures that after an event is flushed to the relay log file by a receiver thread, the memory used is freed. (Bug #72885, Bug #19390463, Bug #69848, Bug #20124342)" James Day, MySQL Senior Principal Support Engineer, Oracle
[24 Jun 2015 3:58]
Laurynas Biveinis
commit 2401e62aed36c129b5cefe80143cff8fa5729d82 Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com> Date: Wed Apr 1 15:19:33 2015 +0530 Bug#20124342 MYSQL 5.6 SLAVE OUT OF MEMORY ERROR ? Problem: I/O thread on Slave with master_info_repository=TABLE settings, is leaking memory that leads to Out of memory (OOM) after some time. Analysis: SQL thread does free_root(thd->mem_root,...) after every Query_log_event and Row_log_event execution. Unlike SQL thread, I/O thread was not doing free_root of thd->mem_root in it's life time. In case if any of the I/O thread operations are using memory from this mem_root and if the same operation keeps repeating frequently, that will increase memory allocation from this mem_root rapidly. This situation will lead to OOM error in some time. For eg: In this bug scenario, when master_info_repository is set to TABLE, IO thread calls lex_start() in Rpl_info_table::do_flush_info operation which uses thread's memory root. If sync_master_info=1, the frequency of this operation is more and will lead to OOM error in very short time. The memory growth will be gone if the IO thread is stopped because we have free_root(thd->mem_root) in THD::~THD. But in the real production environment, no one will be interested in restarting the IO thread. And this is the reason the similar issue will not appear in a normal client thread because eventually a client thread will be exited Fix: After event is flushed to relay log file by IO thread, memory used by this mem_root is not required. Hence adding free_root(thd->mem_root,...) after every event is flushed.