Bug #85399 Mysql5.7 Group replication node recovering OOM kill mysql server
Submitted: 10 Mar 2017 8:57 Modified: 1 Sep 2017 21:22
Reporter: make mark Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:mysql5.7.17 OS:Linux
Assigned to: MySQL Verification Team CPU Architecture:Any
Tags: mysql-server group replication

[10 Mar 2017 8:57] make mark
Description:
when I start_groupreplication, the mysql-server node  recovering from  group,
but always oom kill mysql server,and then the server start auto,

mysql error log:
2017-03-10T16:04:20.198114+08:00 36 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@mysql001:3306',replication started in log 'FIRST' at position 4
2017-03-10T16:33:22.202022+08:00 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-03-10T16:33:22.202230+08:00 0 [Note] --secure-file-priv is set to NULL. Operations related to importing and exporting data are disabled
2017-03-10T16:33:22.202926+08:00 0 [Note] /usr/local/mysql/bin/mysqld (mysqld 5.7.17-log) starting as process 7043 ...

os information:
CentOS release 6.5 (Final)
            total       used       free     shared    buffers     cached
Mem:          7870       6262       1607          0         19        942
-/+ buffers/cache:       5300       2569
Swap:         8191        377       7814

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
vm.swappiness = 0
net.ipv4.ip_forward = 1
net.ipv4.tcp_max_syn_backlog = 819200
net.core.netdev_max_backlog = 32768
net.core.somaxconn = 32768
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.ip_local_port_range = 1024  65535
net.ipv4.tcp_max_tw_buckets = 40000
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_reordering = 5
net.ipv4.tcp_retrans_collapse = 0

mysql.cnf:
[client]
port = 3306
socket = /data/mysql/3306/mysql.sock
default-character-set = utf8
character-set-server = utf8
[mysql]
no-auto-rehash
prompt = "\\u@\\d \\R:\\m> "
[mysqld]
server-id = 13321
port = 3306
user = mysql
basedir = /usr/local/mysql
datadir = /data/mysql/3306/
tmpdir = /tmp
socket = /data/mysql/3306/mysql.sock
pid-file = /data/mysql/3306/mysql.pid
skip-name-resolve
default-storage-engine = InnoDB
character-set-client-handshake = FALSE  
connect_timeout = 20
back_log = 600
log_bin_trust_function_creators = 1
log_timestamps=SYSTEM
log-bin = /data/mysql/3306/mysql-bin
binlog_format = row
max_binlog_size = 512M
binlog_cache_size = 2M
expire-logs-days = 7
relay-log = /data/mysql/3306/relay-bin
slave-net-timeout = 20
skip-slave-start
log_slave_updates = 1
relay_log_recovery = 1
sync_master_info = 1
sync_relay_log_info = 1
master_info_repository = TABLE
relay_log_info_repository = TABLE
gtid_mode=ON
enforce_gtid_consistency=1
binlog_checksum=NONE
transaction_write_set_extraction=XXHASH64
group_replication_group_name="faa4183c-017c-11e7-b1c9-52540082ad2e"
loose-group_replication_start_on_boot = off
loose-group_replication_local_address ="192.168.16.60:33060"
loose-group_replication_group_seeds ="192.168.16.60:33060,192.168.16.61:33061,192.168.16.62:33062"
loose-group_replication_bootstrap_group = off
slow_query_log = 1
slow_query_log_file = /data/mysql/3306/mysql.slow
long_query_time = 1
log-error = /data/mysql/3306/error.log
max_connections = 4000
max_user_connections = 1200
max_connect_errors = 100000
key_buffer_size = 64M
max_allowed_packet = 512M
table_open_cache = 6144
table_definition_cache = 4096
sort_buffer_size = 512K
read_buffer_size = 512K
read_rnd_buffer_size = 512k
join_buffer_size = 512K
tmp_table_size = 64M
max_heap_table_size = 128M
query_cache_type = 0
query_cache_size = 0
bulk_insert_buffer_size = 32M
thread_cache_size = 64
thread_stack = 256K
innodb_data_home_dir = /data/mysql/3306/
innodb_data_file_path = ibdata1:1024M:autoextend
innodb_buffer_pool_size = 4722M
innodb_buffer_pool_instances = 8
innodb_log_file_size = 1024M
innodb_log_buffer_size = 64M
innodb_log_files_in_group = 3
innodb_lock_wait_timeout = 30
innodb_spin_wait_delay = 96
innodb_thread_concurrency = 0
innodb_max_dirty_pages_pct = 75
innodb_support_xa = 0
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_read_io_threads =  4
innodb_write_io_threads = 4
innodb_flush_neighbors = 1
innodb_file_per_table = 1
innodb_file_format = Barracuda
innodb_file_format_max = Barracuda
innodb_strict_mode = 1
innodb_purge_threads = 2
innodb_purge_batch_size = 32
innodb_old_blocks_pct = 37
innodb_old_blocks_time = 1000
innodb_change_buffering = all
innodb_stats_on_metadata = off
transaction_isolation = READ-COMMITTED
innodb_monitor_enable=all
performance_schema=ON
sync_binlog = 0
innodb_buffer_pool_dump_at_shutdown = 1
innodb_buffer_pool_load_at_startup = 1
[mysqldump]
quick
max_allowed_packet = 512M
[myisamchk]
key_buffer_size = 64M
sort_buffer_size = 256k
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
[mysqld_safe]
open-files-limit = 65535

How to repeat:
when I start_groupreplication, the mysql-server node  recovering from  group,
but always oom kill mysql server,and then the server start auto,
Mar 10 14:47:10 localhost kernel: Adding 8388600k swap on /dev/sda2.  Priority:-1 extents:1 across:8388600k 
Mar 10 16:33:19 localhost kernel: mysqld invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Mar 10 16:33:19 localhost kernel: mysqld cpuset=/ mems_allowed=0
Mar 10 16:33:19 localhost kernel: Pid: 6996, comm: mysqld Not tainted 2.6.32-431.el6.x86_64 #1
Mar 10 16:33:19 localhost kernel: Call Trace:
Mar 10 16:33:19 localhost kernel: [<ffffffff810d05b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Mar 10 16:33:19 localhost kernel: [<ffffffff81122960>] ? dump_header+0x90/0x1b0
Mar 10 16:33:19 localhost kernel: [<ffffffff8122798c>] ? security_real_capable_noaudit+0x3c/0x70
Mar 10 16:33:19 localhost kernel: [<ffffffff81122de2>] ? oom_kill_process+0x82/0x2a0
Mar 10 16:33:19 localhost kernel: [<ffffffff81122d21>] ? select_bad_process+0xe1/0x120
Mar 10 16:33:19 localhost kernel: [<ffffffff81123220>] ? out_of_memory+0x220/0x3c0
Mar 10 16:33:19 localhost kernel: [<ffffffff8112fb3c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
Mar 10 16:33:19 localhost kernel: [<ffffffff81167b9a>] ? alloc_pages_vma+0x9a/0x150
Mar 10 16:33:19 localhost kernel: [<ffffffff8114ac9d>] ? handle_pte_fault+0x73d/0xb00
Mar 10 16:33:19 localhost kernel: [<ffffffff81167a9a>] ? alloc_pages_current+0xaa/0x110
Mar 10 16:33:19 localhost kernel: [<ffffffff8104eeb7>] ? pte_alloc_one+0x37/0x50
Mar 10 16:33:19 localhost kernel: [<ffffffff811832e9>] ? do_huge_pmd_anonymous_page+0xb9/0x3b0
Mar 10 16:33:19 localhost kernel: [<ffffffff8114b350>] ? handle_mm_fault+0x2f0/0x300
Mar 10 16:33:19 localhost kernel: [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
Mar 10 16:33:19 localhost kernel: [<ffffffff81016c71>] ? fpu_finit+0x21/0x40
Mar 10 16:33:19 localhost kernel: [<ffffffff8103f9d8>] ? pvclock_clocksource_read+0x58/0xd0
Mar 10 16:33:19 localhost kernel: [<ffffffff8103ea6c>] ? kvm_clock_read+0x1c/0x20
Mar 10 16:33:19 localhost kernel: [<ffffffff8103ea79>] ? kvm_clock_get_cycles+0x9/0x10
Mar 10 16:33:19 localhost kernel: [<ffffffff810a6177>] ? getnstimeofday+0x57/0xe0
Mar 10 16:33:19 localhost kernel: [<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
Mar 10 16:33:19 localhost kernel: [<ffffffff8152a815>] ? page_fault+0x25/0x30
Mar 10 16:33:19 localhost kernel: Mem-Info:
Mar 10 16:33:19 localhost kernel: Node 0 DMA per-cpu:
Mar 10 16:33:19 localhost kernel: CPU    0: hi:    0, btch:   1 usd:   0
Mar 10 16:33:19 localhost kernel: CPU    1: hi:    0, btch:   1 usd:   0
Mar 10 16:33:19 localhost kernel: CPU    2: hi:    0, btch:   1 usd:   0
Mar 10 16:33:19 localhost kernel: CPU    3: hi:    0, btch:   1 usd:   0
Mar 10 16:33:19 localhost kernel: Node 0 DMA32 per-cpu:
Mar 10 16:33:19 localhost kernel: CPU    0: hi:  186, btch:  31 usd:   0
Mar 10 16:33:20 localhost kernel: CPU    1: hi:  186, btch:  31 usd:   0
Mar 10 16:33:20 localhost kernel: CPU    2: hi:  186, btch:  31 usd:   0
Mar 10 16:33:20 localhost kernel: CPU    3: hi:  186, btch:  31 usd:   0
Mar 10 16:33:20 localhost kernel: Node 0 Normal per-cpu:
Mar 10 16:33:20 localhost kernel: CPU    0: hi:  186, btch:  31 usd:   0
Mar 10 16:33:20 localhost kernel: CPU    1: hi:  186, btch:  31 usd:   0
Mar 10 16:33:20 localhost kernel: CPU    2: hi:  186, btch:  31 usd:   0
Mar 10 16:33:20 localhost kernel: CPU    3: hi:  186, btch:  31 usd:   0
Mar 10 16:33:20 localhost kernel: active_anon:1615161 inactive_anon:335997 isolated_anon:0
Mar 10 16:33:20 localhost kernel: active_file:1515 inactive_file:4850 isolated_file:0
Mar 10 16:33:20 localhost kernel: unevictable:0 dirty:5040 writeback:13 unstable:0
Mar 10 16:33:20 localhost kernel: free:25317 slab_reclaimable:3240 slab_unreclaimable:6881
Mar 10 16:33:20 localhost kernel: mapped:930 shmem:73 pagetables:4790 bounce:0
Mar 10 16:33:20 localhost kernel: Node 0 DMA free:15712kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15312kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 10 16:33:20 localhost kernel: lowmem_reserve[]: 0 3511 8056 8056
Mar 10 16:33:20 localhost kernel: Node 0 DMA32 free:47548kB min:29404kB low:36752kB high:44104kB active_anon:2640408kB inactive_anon:631920kB active_file:148kB inactive_file:696kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596256kB mlocked:0kB dirty:784kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:804kB slab_unreclaimable:260kB kernel_stack:216kB pagetables:2628kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:832 all_unreclaimable? no
Mar 10 16:33:20 localhost kernel: lowmem_reserve[]: 0 0 4545 4545
Mar 10 16:33:20 localhost kernel: Node 0 Normal free:38008kB min:38052kB low:47564kB high:57076kB active_anon:3820236kB inactive_anon:712068kB active_file:5912kB inactive_file:18704kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4654080kB mlocked:0kB dirty:19376kB writeback:52kB mapped:3720kB shmem:292kB slab_reclaimable:12156kB slab_unreclaimable:27264kB kernel_stack:1344kB pagetables:16532kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:18368 all_unreclaimable? no
Mar 10 16:33:20 localhost kernel: lowmem_reserve[]: 0 0 0 0
Mar 10 16:33:20 localhost kernel: Node 0 DMA: 2*4kB 1*8kB 1*16kB 2*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15712kB
Mar 10 16:33:20 localhost kernel: Node 0 DMA32: 358*4kB 280*8kB 190*16kB 124*32kB 94*64kB 49*128kB 40*256kB 26*512kB 1*1024kB 0*2048kB 0*4096kB = 47544kB
Mar 10 16:33:20 localhost kernel: Node 0 Normal: 936*4kB 576*8kB 321*16kB 196*32kB 126*64kB 65*128kB 7*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 38448kB
Mar 10 16:33:20 localhost kernel: 22615 total pagecache pages
Mar 10 16:33:20 localhost kernel: 16090 pages in swap cache
Mar 10 16:33:20 localhost kernel: Swap cache stats: add 287677, delete 271587, find 51277/51367
Mar 10 16:33:20 localhost kernel: Free swap  = 8002448kB
Mar 10 16:33:20 localhost kernel: Total swap = 8388600kB
Mar 10 16:33:20 localhost kernel: 2097151 pages RAM
Mar 10 16:33:20 localhost kernel: 82354 pages reserved
Mar 10 16:33:20 localhost kernel: 9256 pages shared
Mar 10 16:33:20 localhost kernel: 1979567 pages non-shared
Mar 10 16:33:20 localhost kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Mar 10 16:33:20 localhost kernel: [  417]     0   417     2728       97   1     -17         -1000 udevd
Mar 10 16:33:20 localhost kernel: [  749]     0   749     2727      104   0     -17         -1000 udevd
Mar 10 16:33:20 localhost kernel: [ 1015]     0  1015     6910      101   1     -17         -1000 auditd
Mar 10 16:33:20 localhost kernel: [ 1031]     0  1031    62271      226   2       0             0 rsyslogd
Mar 10 16:33:20 localhost kernel: [ 1060]     0  1060     2705       77   2       0             0 irqbalance
Mar 10 16:33:20 localhost kernel: [ 1157]    81  1157     5351       74   0       0             0 dbus-daemon
Mar 10 16:33:20 localhost kernel: [ 1186]     0  1186     1020       56   2       0             0 acpid
Mar 10 16:33:20 localhost kernel: [ 1195]    68  1195     9437      188   0       0             0 hald
Mar 10 16:33:20 localhost kernel: [ 1196]     0  1196     5082       52   3       0             0 hald-runner
Mar 10 16:33:20 localhost kernel: [ 1225]     0  1225     5612       43   3       0             0 hald-addon-inpu
Mar 10 16:33:20 localhost kernel: [ 1238]    68  1238     4484       57   3       0             0 hald-addon-acpi
Mar 10 16:33:20 localhost kernel: [ 1259]     0  1259    16651      133   2     -17         -1000 sshd
Mar 10 16:33:20 localhost kernel: [ 1281]     0  1281    27580       59   2       0             0 abrtd
Mar 10 16:33:20 localhost kernel: [ 1289]     0  1289    29325      145   3       0             0 crond
Mar 10 16:33:20 localhost kernel: [ 1302]   201  1302     4444       87   1       0             0 zabbix_agentd
Mar 10 16:33:20 localhost kernel: [ 1308]   201  1308     4444      165   1       0             0 zabbix_agentd
Mar 10 16:33:20 localhost kernel: [ 1309]   201  1309     4444       80   1       0             0 zabbix_agentd
Mar 10 16:33:20 localhost kernel: [ 1310]   201  1310     4444       79   0       0             0 zabbix_agentd
Mar 10 16:33:20 localhost kernel: [ 1311]   201  1311     4444       81   0       0             0 zabbix_agentd
Mar 10 16:33:20 localhost kernel: [ 1312]   201  1312     4447      115   3       0             0 zabbix_agentd
Mar 10 16:33:20 localhost kernel: [ 1318]     0  1318     5385       43   2       0             0 atd
Mar 10 16:33:20 localhost kernel: [ 1338]     0  1338    91683      654   2       0             0 salt-minion
Mar 10 16:33:20 localhost kernel: [ 1352]     0  1352     1016       41   2       0             0 mingetty
Mar 10 16:33:20 localhost kernel: [ 1355]     0  1355     1016       39   0       0             0 mingetty
Mar 10 16:33:20 localhost kernel: [ 1357]     0  1357     1016       40   0       0             0 mingetty
Mar 10 16:33:20 localhost kernel: [ 1360]     0  1360     1016       40   2       0             0 mingetty
Mar 10 16:33:20 localhost kernel: [ 1361]     0  1361     2727       69   0     -17         -1000 udevd
Mar 10 16:33:20 localhost kernel: [ 1363]     0  1363     1016       41   0       0             0 mingetty
Mar 10 16:33:20 localhost kernel: [ 1365]     0  1365     1016       40   1       0             0 mingetty
Mar 10 16:33:20 localhost kernel: [ 2751]     0  2751    26559      129   3       0             0 mysqld_safe
Mar 10 16:33:20 localhost kernel: [ 8244]     0  8244    26558       63   1       0             0 mysqld_safe
Mar 10 16:33:20 localhost kernel: [ 9389]   500  9389  1294542   777457   0       0             0 mysqld
Mar 10 16:33:20 localhost kernel: [27629]   500 27629  2034046  1157176   3       0             0 mysqld
Mar 10 16:33:20 localhost kernel: [ 6671]     0  6671    24585      370   0       0             0 sshd
Mar 10 16:33:20 localhost kernel: [ 6673]     0  6673    27085      168   0       0             0 bash
Mar 10 16:33:20 localhost kernel: [ 7002]     0  7002    25224       69   0       0             0 tailf
Mar 10 16:33:20 localhost kernel: Out of memory: Kill process 27629 (mysqld) score 281 or sacrifice child
Mar 10 16:33:20 localhost kernel: Killed process 27629, UID 500, (mysqld) total-vm:8136184kB, anon-rss:4625544kB, file-rss:3160kB
[1 Sep 2017 21:22] MySQL Verification Team
Hi,

I can't reproduce this with normal setup and tweaking your my.cnf is really outside of bugs scope and something you should discuss with our Support team.

all best
Bogdan

p.s. your config file attm allows mysql to use 22G that's more then you have (lot more) so oom with such config is expected