Bug #104979 disk full for a long time causes mysqld to be oom killed
Submitted: 18 Sep 2021 1:09 Modified: 20 Sep 2021 12:43
Reporter: Ye Jinrong Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:8.0.26 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[18 Sep 2021 1:09] Ye Jinrong
Description:
After the disk is full for too long, the mysqld process will continue to consume memory and eventually be oom killed

How to repeat:
Build a 3 nodes group replication.

Select a secondary node, and fill the disk artificially.

You can see the following error message in MySQL:

```
2021-09-17T22:45:06.233659+08:00 514 [ERROR] [MY-000035] [Server] Disk is full writing '/data/MySQL/binlog.000038' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2021-09-17T22:45:06.236731+08:00 503 [ERROR] [MY-000035] [Server] Disk is full writing './yejr-mgr3-relay-bin-group_replication_applier.000044' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2021-09-17T22:45:07.829733+08:00 0 [Note] [MY-011953] [InnoDB] Page cleaner took 574317ms to flush 3146 and evict 0 pages
2021-09-17T22:55:06.279530+08:00 514 [ERROR] [MY-000035] [Server] Disk is full writing '/data/MySQL/binlog.000038' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2021-09-17T22:55:06.284646+08:00 503 [ERROR] [MY-000035] [Server] Disk is full writing './yejr-mgr3-relay-bin-group_replication_applier.000044' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2021-09-17T23:05:06.325558+08:00 514 [ERROR] [MY-000035] [Server] Disk is full writing '/data/MySQL/binlog.000038' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2021-09-17T23:05:06.331382+08:00 503 [ERROR] [MY-000035] [Server] Disk is full writing './yejr-mgr3-relay-bin-group_replication_applier.000044' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 s
```

After about two and a half hours, mysqld was oom-killed by the OS because it consumed too much memory:
```
Sep 18 01:44:05 mgr3 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Sep 18 01:44:05 mgr3 kernel: mysqld cpuset=/ mems_allowed=0
Sep 18 01:44:05 mgr3 kernel: CPU: 3 PID: 10513 Comm: mysqld Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1127.19.1.el7.x86_64 #1
Sep 18 01:44:05 mgr3 kernel: Hardware name: Smdbmds KVM, BIOS seabios-1.9.1-qemu-project.org 04/01/2014
Sep 18 01:44:05 mgr3 kernel: Call Trace:
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1f7ffa5>] dump_stack+0x19/0x1b
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1f7a8c3>] dump_header+0x90/0x229
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1906d32>] ? ktime_get_ts64+0x52/0xf0
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa19c251e>] oom_kill_process+0x25e/0x3f0
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1933a51>] ? cpuset_mems_allowed_intersects+0x21/0x30
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa19c1f7d>] ? oom_unkillable_task+0xcd/0x120
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa19c2026>] ? find_lock_task_mm+0x56/0xc0
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa19c2d76>] out_of_memory+0x4b6/0x4f0
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1f7b3e0>] __alloc_pages_slowpath+0x5db/0x729
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa19c91f6>] __alloc_pages_nodemask+0x436/0x450
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1a18ea8>] alloc_pages_current+0x98/0x110
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa19be427>] __page_cache_alloc+0x97/0xb0
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa19c0fe0>] filemap_fault+0x270/0x420
Sep 18 01:44:05 mgr3 kernel: [<ffffffffc023b706>] ext4_filemap_fault+0x36/0x50 [ext4]
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa19edf6a>] __do_fault.isra.61+0x8a/0x100
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa19ee51c>] do_read_fault.isra.63+0x4c/0x1b0
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa19f5d80>] handle_mm_fault+0xa20/0xfb0
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1912a70>] ? futex_wake+0x90/0x180
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1f8d653>] __do_page_fault+0x213/0x500
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1f8da26>] trace_do_page_fault+0x56/0x150
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1f8cfa2>] do_async_page_fault+0x22/0xf0
Sep 18 01:44:05 mgr3 kernel: [<ffffffffa1f897a8>] async_page_fault+0x28/0x30
Sep 18 01:44:05 mgr3 kernel: Mem-Info:
Sep 18 01:44:05 mgr3 kernel: active_anon:3904454 inactive_anon:55 isolated_anon:0#012 active_file:6289 inactive_file:10537 isolated_file:0#012 unevictable:1984 dirty:0 writeback:1 unstable:0#012 slab_reclaimable:52114 slab_unreclaimable:9647#012 mapped:2695 shmem:192 pagetables:9727 bounce:0#012 free:33832 free_pcp:1 free_cma:0
Sep 18 01:44:05 mgr3 kernel: Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep 18 01:44:05 mgr3 kernel: lowmem_reserve[]: 0 2830 15866 15866
Sep 18 01:44:05 mgr3 kernel: Node 0 DMA32 free:64036kB min:12044kB low:15052kB high:18064kB active_anon:2658168kB inactive_anon:84kB
active_file:6068kB inactive_file:14416kB unevictable:644kB isolated(anon):0kB isolated(file):0kB present:3129192kB managed:2898760kB
mlocked:644kB dirty:0kB writeback:0kB mapped:3732kB shmem:148kB slab_reclaimable:119100kB slab_unreclaimable:6864kB kernel_stack:896k
B pagetables:6008kB unstable:0kB bounce:0kB free_pcp:216kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:35508 all_unrec
laimable? yes
Sep 18 01:44:05 mgr3 kernel: lowmem_reserve[]: 0 0 13035 13035
Sep 18 01:44:05 mgr3 kernel: Node 0 Normal free:55400kB min:55468kB low:69332kB high:83200kB active_anon:12959648kB inactive_anon:136
kB active_file:19088kB inactive_file:27412kB unevictable:7292kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:1335
1508kB mlocked:7292kB dirty:0kB writeback:4kB mapped:7048kB shmem:620kB slab_reclaimable:89356kB slab_unreclaimable:31708kB kernel_st
ack:5984kB pagetables:32900kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:79103 a
ll_unreclaimable? yes
Sep 18 01:44:05 mgr3 kernel: lowmem_reserve[]: 0 0 0 0
Sep 18 01:44:05 mgr3 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*
2048kB (M) 3*4096kB (M) = 15892kB
Sep 18 01:44:05 mgr3 kernel: Node 0 DMA32: 666*4kB (UEM) 351*8kB (UEM) 871*16kB (UE) 316*32kB (UEM) 189*64kB (UEM) 96*128kB (UEM) 23*
256kB (UE) 9*512kB (UEM) 0*1024kB 0*2048kB 0*4096kB = 64400kB
Sep 18 01:44:05 mgr3 kernel: Node 0 Normal: 4335*4kB (UEM) 4790*8kB (EM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048
kB 0*4096kB = 55660kB
Sep 18 01:44:05 mgr3 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Sep 18 01:44:05 mgr3 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 18 01:44:05 mgr3 kernel: 18478 total pagecache pages
Sep 18 01:44:05 mgr3 kernel: 0 pages in swap cache
Sep 18 01:44:05 mgr3 kernel: Swap cache stats: add 0, delete 0, find 0/0
Sep 18 01:44:05 mgr3 kernel: Free swap  = 0kB
Sep 18 01:44:05 mgr3 kernel: Total swap = 0kB
Sep 18 01:44:05 mgr3 kernel: 4194168 pages RAM
Sep 18 01:44:05 mgr3 kernel: 0 pages HighMem/MovableOnly
Sep 18 01:44:05 mgr3 kernel: 127624 pages reserved
Sep 18 01:44:05 mgr3 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Sep 18 01:44:05 mgr3 kernel: [  398]     0   398    20103      363      44        0             0 systemd-journal
Sep 18 01:44:05 mgr3 kernel: [  551]    81   551    15082      465      35        0          -900 dbus-daemon
Sep 18 01:44:05 mgr3 kernel: [  564]   998   564     2145      165      10        0             0 lsmd
Sep 18 01:44:05 mgr3 kernel: [  567]   999   567   153634     2522      63        0             0 polkitd
Sep 18 01:44:05 mgr3 kernel: [  570]     0   570     6716      445      18        0             0 systemd-logind
Sep 18 01:44:05 mgr3 kernel: [  576]     0   576     1097      130       8        0             0 acpid
Sep 18 01:44:05 mgr3 kernel: [ 1105]     0  1105    14257     1945      34        0           -17 iscsid
Sep 18 01:44:05 mgr3 kernel: [ 1218]     0  1218     6477      202      18        0             0 atd
Sep 18 01:44:05 mgr3 kernel: [ 1683]     0  1683    28231      556      57        0         -1000 sshd
Sep 18 01:44:05 mgr3 kernel: [29211]     0 29211    31605      192      18        0             0 crond
Sep 18 01:44:05 mgr3 kernel: [29367]     0 29367    11416      248      23        0         -1000 systemd-udevd
Sep 18 01:44:05 mgr3 kernel: [ 9660]     0  9660    31210       83      27        0             0 lvmetad
Sep 18 01:44:05 mgr3 kernel: [ 9737]     0  9737   143572     2850      97        0             0 tuned
Sep 18 01:44:05 mgr3 kernel: [ 9970]     0  9970   183311      491     173        0             0 rsyslogd
Sep 18 01:44:05 mgr3 kernel: [14131]     0 14131    25735      518      48        0             0 dhclient
Sep 18 01:44:05 mgr3 kernel: [ 8722]     0  8722     7336     1920      20        0             0 tmux
Sep 18 01:44:05 mgr3 kernel: [ 8723]     0  8723    29401      616      13        0             0 bash
Sep 18 01:44:05 mgr3 kernel: [15371]     0 15371    29479      702      13        0             0 bash
Sep 18 01:44:05 mgr3 kernel: [23213]     0 23213   319408     7271     112        0          -500 dockerd-current
Sep 18 01:44:05 mgr3 kernel: [23221]     0 23221   195421     2664      52        0          -500 docker-containe
Sep 18 01:44:05 mgr3 kernel: [25883]     0 25883    27553       33      10        0             0 agetty
Sep 18 01:44:05 mgr3 kernel: [ 1684]     0  1684    38919     1647      29        0             0 barad_agent
Sep 18 01:44:05 mgr3 kernel: [ 1690]     0  1690    41300     1882      34        0             0 barad_agent
Sep 18 01:44:05 mgr3 kernel: [ 1692]     0  1692   169809     3049      50        0             0 barad_agent
Sep 18 01:44:05 mgr3 kernel: [24500]     0 24500    24351       97      18        0             0 sgagent
Sep 18 01:44:05 mgr3 kernel: [11389]     0 11389   259333    10156      82        0             0 YDService
Sep 18 01:44:05 mgr3 kernel: [11410]     0 11410   282336     2689      42        0             0 YDLive
Sep 18 01:44:05 mgr3 kernel: [24729]    27 24729  4175201  3665256    7389        0             0 mysqld
Sep 18 01:44:05 mgr3 kernel: [10482]    27 10482   706964   195789    1018        0             0 mysqld
Sep 18 01:44:05 mgr3 kernel: [10969]     0 10969    41685      350      33        0             0 mysql
Sep 18 01:44:05 mgr3 kernel: [10747]     0 10747    46147      263      45        0             0 crond
Sep 18 01:44:05 mgr3 kernel: [10748]     0 10748    46147      263      45        0             0 crond
Sep 18 01:44:05 mgr3 kernel: [10749]     0 10749     1067       11       7        0             0 flock
Sep 18 01:44:05 mgr3 kernel: [10750]     0 10750    28321       85      11        0             0 sa1
Sep 18 01:44:05 mgr3 kernel: Out of memory: Kill process 24729 (mysqld) score 903 or sacrifice child
Sep 18 01:44:05 mgr3 kernel: Killed process 24729 (mysqld), UID 27, total-vm:16700804kB, anon-rss:14661024kB, file-rss:0kB, shmem-rss:0kB
```

my.cnf
```
open_files_limit    = 65535
back_log = 1024
max_connections = 512
max_connect_errors = 1000000
table_open_cache = 1024
table_definition_cache = 1024
thread_stack = 512K
sort_buffer_size = 4M
join_buffer_size = 4M
read_buffer_size = 8M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 64M
thread_cache_size = 768
tmp_table_size = 32M
max_heap_table_size = 32M

sync_binlog = 1
binlog_cache_size = 4M
max_binlog_cache_size = 2G
max_binlog_size = 1G

key_buffer_size = 32M
myisam_sort_buffer_size = 128M

innodb_buffer_pool_size = 256M
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 32M
innodb_open_files = 65534
innodb_sort_buffer_size = 67108864
```
[18 Sep 2021 1:17] Ye Jinrong
There is another mysqld process on this server, pid=10482, which is a secondary node of another mgr cluster, but it is not killed because it does not consume too much memory
[20 Sep 2021 10:02] MySQL Verification Team
Hi,

This is not a bug, MySQL Server is storing everything it can in RAM hoping for you to release disk space so it can dump it back to disk and continue normal operation. Eventually it will die (or be killed by kernel) but really, you should have system in place to prevent full disk to ever happen. Changing the existing behavior was rejected by the dev team and explained in details many times.

Thanks
[20 Sep 2021 12:43] Ye Jinrong
I think it is more reasonable to set an max limit of transaction cache size and close or exit mgr after exceeding it
[20 Sep 2021 13:11] MySQL Verification Team
Hi,

Everyone have their own idea on what is most reasonable in this situation. Honestly, the only reasonable thing here is to have proper server monitoring and never be in a situation that you do not have free disk space.