Bug #108012 | Data corruption after graceful restart | ||
---|---|---|---|
Submitted: | 28 Jul 2022 7:10 | Modified: | 29 Jul 2022 14:17 |
Reporter: | Aleksei Faians | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 8.0.30 | OS: | Linux |
Assigned to: | MySQL Verification Team | CPU Architecture: | x86 (x86-64) |
[28 Jul 2022 7:10]
Aleksei Faians
[28 Jul 2022 8:17]
Aleksei Faians
Even after successful --initialize subsequent server startup failed with the same error. I managed to get server started after removing this parameter from configuration: innodb_log_file_size = 1G
[28 Jul 2022 8:28]
Aleksei Faians
After a few restarts a new empty just initialized instance got the same issue: 2022-07-28T08:25:57.973231Z 0 [System] [MY-010116] [Server] /opt/mysql/bin/mysqld (mysqld 8.0.30) starting as process 1952662 2022-07-28T08:25:57.983186Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2022-07-28T08:25:58.229729Z 1 [ERROR] [MY-013857] [InnoDB] The redo log file ./#innodb_redo/#ib_redo7 is smaller than 65536 bytes, which indicates it was not generated by InnoDB or become corrupted. Please restore the correct file. 2022-07-28T08:25:58.229761Z 1 [ERROR] [MY-012930] [InnoDB] Plugin initialization aborted with error Generic error. 2022-07-28T08:25:58.692173Z 1 [ERROR] [MY-010334] [Server] Failed to initialize DD Storage Engine 2022-07-28T08:25:58.692811Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed. 2022-07-28T08:25:58.692864Z 0 [ERROR] [MY-010119] [Server] Aborting 2022-07-28T08:25:58.693731Z 0 [System] [MY-010910] [Server] /opt/mysql/bin/mysqld: Shutdown complete (mysqld 8.0.30) MySQL Community Server - GPL. This looks like a critical bug to me now.
[28 Jul 2022 17:34]
Jakub Lopuszanski
Hello Aleksei! Thanks for report. Let's focus on the last scenario, which you've mentioned in "28 Jul 8:28" comment. I'd like to reproduce it, so I need a few more details. 1. What was the exact config and command line arguments which you've used for `--initialize`? 2. What does `ls -lh ./#innodb_redo` show right after initialize? 3. What config and command line argument did you use to start the server after initialization? 4. How did you stop the server (by `kill`? or SHUTDOWN or what?) 5. Could you share `ls -lh ./#innodb_redo` results before and after the iteration in which the server lost ability to boot?
[28 Jul 2022 18:03]
Jakub Lopuszanski
So far I've tried the most natural interpretation of the scenario you've provided but I've failed to reproduce the situation: [jlopusza@compile-el8-01]/export/home/tmp/jlopusza/mysql-release/runtime_output_directory: rm -rf ../data/ [jlopusza@compile-el8-01]/export/home/tmp/jlopusza/mysql-release/runtime_output_directory: ./mysqld --initialize-insecure --innodb-log-file-size=1G 2022-07-28T18:00:07.813644Z 0 [System] [MY-013169] [Server] /export/home/tmp/jlopusza/mysql-release/runtime_output_directory/mysqld (mysqld 8.0.30) initializing of server in progress as process 2522136 2022-07-28T18:00:07.819359Z 0 [Warning] [MY-013907] [InnoDB] Deprecated configuration parameters innodb_log_file_size and/or innodb_log_files_in_group have been used to compute innodb_redo_log_capacity=2147483648. Please use innodb_redo_log_capacity instead. 2022-07-28T18:00:07.820396Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2022-07-28T18:00:08.134955Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. 2022-07-28T18:00:09.017610Z 6 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option. [jlopusza@compile-el8-01]/export/home/tmp/jlopusza/mysql-release/runtime_output_directory: ls -lh ../data/#innodb_redo/ total 2.0G -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo0' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo10_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo11_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo12_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo13_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo14_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo15_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo16_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo17_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo18_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo19_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo1_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo20_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo21_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo22_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo23_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo24_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo25_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo26_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo27_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo28_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo29_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo2_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo30_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo31_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo3_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo4_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo5_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo6_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo7_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo8_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo9_tmp' [jlopusza@compile-el8-01]/export/home/tmp/jlopusza/mysql-release/runtime_output_directory: ./mysqld --innodb-log-file-size=1G --port=30303 --socket=here.sock --skip-mysqlx 2022-07-28T18:00:23.426625Z 0 [System] [MY-010116] [Server] /export/home/tmp/jlopusza/mysql-release/runtime_output_directory/mysqld (mysqld 8.0.30) starting as process 2522187 2022-07-28T18:00:23.432153Z 0 [Warning] [MY-013907] [InnoDB] Deprecated configuration parameters innodb_log_file_size and/or innodb_log_files_in_group have been used to compute innodb_redo_log_capacity=2147483648. Please use innodb_redo_log_capacity instead. 2022-07-28T18:00:23.433449Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2022-07-28T18:00:23.531051Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. 2022-07-28T18:00:23.707638Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed. 2022-07-28T18:00:23.707680Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel. 2022-07-28T18:00:23.708648Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/export/home/tmp' in the path is accessible to all OS users. Consider choosing a different directory. 2022-07-28T18:00:23.719997Z 0 [System] [MY-010931] [Server] /export/home/tmp/jlopusza/mysql-release/runtime_output_directory/mysqld: ready for connections. Version: '8.0.30' socket: 'here.sock' port: 30303 Source distribution. Killed [jlopusza@compile-el8-01]/export/home/tmp/jlopusza/mysql-release/runtime_output_directory: ls -lh ../data/#innodb_redo/ total 2.0G -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo0' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo10_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo11_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo12_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo13_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo14_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo15_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo16_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo17_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo18_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo19_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo1_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo20_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo21_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo22_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo23_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo24_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo25_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo26_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo27_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo28_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo29_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo2_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo30_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo31_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo3_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo4_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo5_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo6_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo7_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo8_tmp' -rw-r----- 1 jlopusza common 64M Jul 28 20:00 '#ib_redo9_tmp' As you can see the files should have 64M (=1G*2/32), so it's indeed a bit strange that InnoDB finds much smaller files on your disk. Are you perhaps running out of space on this disc? Or restoring the files from some backup?
[28 Jul 2022 18:32]
MySQL Verification Team
Hi Alexei, Upload please your full config too. thanks
[29 Jul 2022 2:40]
Aleksei Faians
The issue does not always reproduce. Last time it happened only after I restarted server several times. Also, I am using this binary package: https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.30-linux-glibc2.17-x86_64-minimal.... Config: [mysqld] user = mysql-portal pid-file = /run/mysqld-portal/mysqld.pid socket = /run/mysqld-portal/mysqld.sock port = 13306 datadir = /compressed/lib/mysql-portal bind-address = 0.0.0.0 mysqlx-bind-address = 0.0.0.0 key_buffer_size = 16M log_error = /var/log/mysql-portal/error.log max_binlog_size = 100M key_buffer_size = 256K innodb_buffer_pool_size = 8G innodb_buffer_pool_instances = 8 join_buffer_size = 64M sort_buffer_size = 8M read_rnd_buffer_size = 1M performance_schema = ON table_open_cache = 10000 skip_name_resolve = OFF max_connections = 100 mysqlx_max_connections = 20 tmpdir = /compressed/tmp secure-file-priv = /compressed/lib/mysql-portal-files mysqlx_port = 13307 mysqlx_socket = /run/mysqld-portal/mysqlx.sock report_host = portal-db.bsrealm.net Command used to initialize server: sudo -u mysql-portal /opt/mysql/bin/mysqld --defaults-file=/etc/mysql-portal/my.cnf --initialize Command used to start/stop/restart server: systemctl start/stop/restart mysql-portal.service Systemd unit file: [Unit] Description=MySQL Community Server After=network.target [Install] WantedBy=multi-user.target [Service] Type=notify User=mysql-portal Group=mysql-portal PIDFile=/run/mysqld-portal/mysqld.pid ExecStart=/opt/mysql/bin/mysqld --defaults-file=/etc/mysql-portal/my.cnf TimeoutSec=600 Restart=on-failure RuntimeDirectory=mysqld-portal RuntimeDirectoryMode=755 LimitNOFILE=infinity Environment=MYSQLD_PARENT_PID=1 Environment=LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 Before: ~# ls -lh /compressed/lib/mysql-portal/#innodb_redo/ total 819K -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo10_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo11_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo12_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo13_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo14_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo15_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo16_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo17_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo18_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo19_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo20_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo21_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo22_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo23_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo24_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo25_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo26_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo27_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo28_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo29_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo30_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo31_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo32_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo33_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo34_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo35_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo36_tmp' -rw-r----- 1 mysql-portal mysql-portal 2.4M Jul 29 05:37 '#ib_redo5' -rw-r----- 1 mysql-portal mysql-portal 408K Jul 29 05:37 '#ib_redo6' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo7_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo8_tmp' -rw-r----- 1 mysql-portal mysql-portal 0 Jul 29 05:37 '#ib_redo9_tmp' After: # ls -lh /compressed/lib/mysql-portal/#innodb_redo/ total 152K -rw-r----- 1 mysql-portal mysql-portal 408K Jul 29 05:37 '#ib_redo6'
[29 Jul 2022 2:41]
Aleksei Faians
> Are you perhaps running out of space on this disc? Or restoring the files from some backup? # df -h /compressed/ Filesystem Size Used Avail Use% Mounted on compressed 900G 284G 616G 32% /compressed
[29 Jul 2022 2:44]
Aleksei Faians
Also please note, that this issue only happens with 8.0.30. I had no single issue with 8.0.29 version for months.
[29 Jul 2022 2:52]
Aleksei Faians
I tried this archive: https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.30-linux-glibc2.12-x86_64.tar.xz The problem doesn't seem to be occurring now. So the question is why the glibc2.17 based package is broken (again)?
[29 Jul 2022 6:45]
MySQL Verification Team
Hi, So if I understood you correctly the mysql-8.0.30-linux-glibc2.17-x86_64 fails for you "every time" or "most of the times" and mysql-8.0.30-linux-glibc2.12-x86_64 does not fail at all? I tried glibc2.17 version 4 times on one server with centos and 2 times on second server with oracle linux and I did not manage to reproduce the corruption. What filesystem are you using, what OS version, what is your glibc version on the server? Thanks
[29 Jul 2022 6:58]
Aleksei Faians
> So if I understood you correctly the mysql-8.0.30-linux-glibc2.17-x86_64 fails for you "every time" or "most of the times" and mysql-8.0.30-linux-glibc2.12-x86_64 does not fail at all? Right. > I tried glibc2.17 version 4 times on one server with centos and 2 times on second server with oracle linux and I did not manage to reproduce the corruption. > What filesystem are you using, what OS version, what is your glibc version on the server? ZFS with compression=on, Ubuntu 20.04. Please also note that I am preloading jemalloc. $ cat /proc/version Linux version 5.15.0-41-generic (buildd@lcy02-amd64-105) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #44~20.04.1-Ubuntu SMP Fri Jun 24 13:27:29 UTC 2022 $ ldd --version ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31 Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Written by Roland McGrath and Ulrich Drepper. $ apt list libjemalloc2 Listing... Done libjemalloc2/focal,now 5.2.1-1ubuntu1 amd64 [installed] $ zpool status pool: compressed state: ONLINE scan: scrub repaired 375K in 0 days 00:16:33 with 0 errors on Sun Jul 10 00:40:35 2022 config: NAME STATE READ WRITE CKSUM compressed ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 wwn-0x5002538f41c27845 ONLINE 0 0 0 wwn-0x5002538f41c2784c ONLINE 0 0 0 errors: No known data errors
[29 Jul 2022 8:52]
Jakub Lopuszanski
Hello Aleksei, thank you for providing a lot of details! AFAIU in your latest config there is no innodb_log_file_size, innodb_log_files_in_group nor innodb_redo_log_capacity, so the expectation should be that the default of innodb_redo_log_capacity=100*1024*1024 bytes, which should be divided into 32 files giving 3276800 bytes per file. Yet, your ls -l listing after initialize shows that some files are completely empty, and some are much shorter. This seems to me to indicate some failure during "resizing them up" - they are indeed created initially empty, and then, as they should essentially be filled with zeros (or actually "whatever") up to the expected length, we do not use write() which would be slow, but instead use fallocate(). In particular, this happens here: log_create_unused_file(const Log_files_context & ctx, unsigned __int64 file_id, unsigned __int64 size_in_bytes)C++ log_files_prepare_unused_file(log_t & log, unsigned __int64 file_id, unsigned __int64 start_lsn, unsigned __int64 checkpoint_lsn, bool create_first_data_block, unsigned __int64 & file_size) log_files_create_file(log_t & log, unsigned __int64 file_id, unsigned __int64 start_lsn, unsigned __int64 checkpoint_lsn, bool create_first_data_block) log_files_create(log_t & log, unsigned __int64 flushed_lsn, unsigned __int64 & checkpoint_lsn) log_sys_init(bool expect_no_files, unsigned __int64 flushed_lsn, unsigned __int64 & new_files_lsn) srv_start(bool create_new_db) innobase_init_files(dict_init_mode_t dict_init_mode, List<Plugin_tablespace const> * tablespaces) which calls os_file_set_size_fast(file_path.c_str(), file, 0, size_in_bytes, true) which you can find here: https://github.com/mysql/mysql-server/blob/mysql-8.0.30/storage/innobase/os/os0file.cc#L55... ``` #if !defined(NO_FALLOCATE) && defined(UNIV_LINUX) && \ defined(HAVE_FALLOC_FL_ZERO_RANGE) ... int ret = fallocate(pfs_file.m_file, FALLOC_FL_ZERO_RANGE, offset, size - offset); ... #endif /* !NO_FALLOCATE && UNIV_LINUX && HAVE_FALLOC_FL_ZERO_RANGE */ return os_file_set_size(name, pfs_file, offset, size, flush); ``` So, if the binary was build on a system which seemed to support `fallocate(...HAVE_FALLOC_FL_ZERO_RANGE..)`, we use it. We should also print an error message (only once for the whole mysqld liftime) if it fails: ``` if (print_message) { ib::info(ER_IB_MSG_1359) << "fallocate() failed with errno " << errno << " - falling back to writing NULLs."; print_message = false; } ``` I don't see it mentioned in the fragments of logs you've shared so far, but this could be because: a) because you're using low verbosity level. Note that by default ib::info's are note printed https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_log_error_verb... so you'd have to change it to level 3 b) the message was printed earlier for some other operation (say, tablespace generation) Anyway, if fallocate seems unsupported or failed with error, we fallback to os_file_set_size, which does a series of writes. I think we have not reached this call at all. I base this suspicion on following clues: - some of your files have 0 length, for example: 0 Jul 29 05:37 '#ib_redo13_tmp' - some have size not divisible by UNIV_PAGE_SIZE (16KB according to your config), for example: 408K Jul 29 05:37 '#ib_redo6' - some have non-zero size (for example #ib_redo5 and #ib_redo6), and these files are exactly those which are not temporary, which means they store actual redo log data, so were written to, which suggests that writes with data "in principle" work on this machine, so if we got to call os_file_set_size then it would work (at least "should") So, what I am now suspecting the most is that fallocate "pretends to work" - does not report an error, but also does not extend the file length. Aleksei, would it be possible for you to run a simple experiment with small C++ program using `fallocate` on this machine? $ cat f.cpp #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif #include <fcntl.h> #include <cstdlib> #include <cstdio> #include <linux/falloc.h> int main(int argc,const char * args[]) { if(argc!=3){ printf("Please run with:\n %s path_to_create len\n", args[0]); return 2; } const auto path=args[1]; const int size=atoi(args[2]); printf("Creating file %s\n", path); const int fd = open(path,O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP); if(fd == -1){ printf("Creation has failed\n"); return 3; } printf("fallocate %d bytes\n", size); const int ret=fallocate(fd, FALLOC_FL_ZERO_RANGE, 0, size); printf("fallocate returned %d\n", ret); return(0); } $ g++ f.cpp -o f $ ./f /compressed/lib/mysql-portal/#innodb_redo/example 3276800 ``` and share the output? In my case it was something like this: ``` [jlopusza@compile-el8-01]/export/home/tmp/jlopusza/mysql-release/runtime_output_directory: ./f ./example 3276800 Creating file ./example fallocate 3276800 bytes fallocate returned 0 [jlopusza@compile-el8-01]/export/home/tmp/jlopusza/mysql-release/runtime_output_directory: ls -lh example -rw-r----- 1 jlopusza common 3.2M Jul 29 10:48 example ``` But I wonder what happens on your machine and the specific drive & file system you have in /compressed/lib/mysql-portal/#innodb_redo
[29 Jul 2022 9:05]
Jakub Lopuszanski
> ZFS with compression=on, Ubuntu 20.04. Please also note that I am preloading jemalloc. Then perhaps this thread is relevant to our situation: https://github.com/openzfs/zfs/issues/326
[29 Jul 2022 9:45]
Aleksei Faians
root@home:/compressed# mkdir testing root@home:/compressed# chown testing:testing testing root@home:/compressed# ls -ld /compressed/testing/ drwxr-xr-x 2 testing testing 2 Jul 29 12:38 /compressed/testing/ root@home:/compressed# cd testing/ root@home:/compressed/testing# cat > f.cpp #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif #include <fcntl.h> #include <cstdlib> #include <cstdio> #include <linux/falloc.h> int main(int argc,const char * args[]) { if(argc!=3){ printf("Please run with:\n %s path_to_create len\n", args[0]); return 2; } const auto path=args[1]; const int size=atoi(args[2]); printf("Creating file %s\n", path); const int fd = open(path,O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP); if(fd == -1){ printf("Creation has failed\n"); return 3; } printf("fallocate %d bytes\n", size); const int ret=fallocate(fd, FALLOC_FL_ZERO_RANGE, 0, size); printf("fallocate returned %d\n", ret); return(0); } root@home:/compressed/testing# sudo -u testing g++ f.cpp -o f root@home:/compressed/testing# sudo -u testing ./f /compressed/testing/example 3276800 Creating file /compressed/testing/example fallocate 3276800 bytes fallocate returned 0 root@home:/compressed/testing# ls -lh total 7.0K -rw-rw---- 1 testing testing 0 Jul 29 12:40 example -rwxrwxr-x 1 testing testing 17K Jul 29 12:39 f -rw-r--r-- 1 root root 693 Jul 29 12:39 f.cpp So, I see that fallocate fails for some reason. But why does this problem only hits me: 1. Only with with glibc 2.17 package 2. Only with 8.0.30 version Because I never had this issue with 8.0.29 (and older) glibc 2.17 package.
[29 Jul 2022 10:01]
Aleksei Faians
I've also managed to nail down problem to FALLOC_FL_ZERO_RANGE flag. Documentation says that not all filesystems support it and error should be returned if it's not supported. For some reason, with ZFS no error is returned. If I replace FALLOC_FL_ZERO_RANGE with 0 in f.cpp, file is created properly: root@home:/compressed/testing# ls -l total 128 -rw-rw---- 1 testing testing 3276800 Jul 29 12:57 example
[29 Jul 2022 10:38]
Jakub Lopuszanski
> But why does this problem only hits me: > 1. Only with with glibc 2.17 package > 2. Only with 8.0.30 version In mysql-8.0.29 the os_file_set_size_fast was used in just one place, in create_log_file() : https://github.com/mysql/mysql-server/blob/mysql-8.0.29/storage/innobase/srv/srv0start.cc#... AFAICT, the mysql-8.0.29, also validated size of files on restart: If you look into open_log_file() function it gets the size of file here: https://github.com/mysql/mysql-server/blob/mysql-8.0.29/storage/innobase/srv/srv0start.cc#... and then the caller (srv_start() in the case of create_new_db=false) does some checking: https://github.com/mysql/mysql-server/blob/mysql-8.0.29/storage/innobase/srv/srv0start.cc#... 1. asserts that the size is not "undefined" ut_a(size != (os_offset_t)-1); 2. that the size is multiple of UNIV_PAGE_SIZE: if (size & ((1 << UNIV_PAGE_SIZE_SHIFT) - 1)) { ib::error(ER_IB_MSG_1137, logfilename, ulonglong{size}); return (srv_init_abort(DB_ERROR)); } 3. finally, it just accepts that the size of first file (ib_redo0) it has found on disc is the one to expect from other files: ``` if (i == 0) { srv_log_file_size = size; ... } else if (!srv_dedicated_server && size != srv_log_file_size) { ... ib::error(ER_IB_MSG_1138, logfilename, ulonglong{size}, srv_log_file_size); return (srv_init_abort(DB_ERROR)); } ``` I'd expect that if you use the same implementation of fallocate then mysql-8.0.29 should also fail to start if you stopped it immediatelly after --initialize. When you say it works fine on mysql-8.0.29, do you mean a) specifically that `--initialize` creates files of correct size or b) that you have not seen problems in production so far? If it is b), then plausible explanation is that in 8.0.29 we reused the same two files over and over again in cyclical fashion, so if you let the server run for long enough then it should have write actual redo log data into files, which would make their size reach the full capacity, so after restart they'd appear full. This theory doesn't fit entirely, as `--initialize` usually doesn't produce enough redo log data to fill two files up to the brim. So perhaps you've used 8.0.29 with a different implementation of fallocate? Or perhaps copied/restored redo log files from a different machine/file system? At any rate, could you repeat your experiment with mysql-8.0.29 on the same filesystem? Also, do you happen what link/exact package name did you use to download 8.0.29? Perhaps it was using different version of glibc?
[29 Jul 2022 10:51]
Aleksei Faians
> I'd expect that if you use the same implementation of fallocate then mysql-8.0.29 should also fail to start if you stopped it immediatelly after --initialize. > When you say it works fine on mysql-8.0.29, do you mean > a) specifically that `--initialize` creates files of correct size > or > b) that you have not seen problems in production so far? The answer is b. > So perhaps you've used 8.0.29 with a different implementation of fallocate? > Or perhaps copied/restored redo log files from a different machine/file system? At time of initialization version was 8.0.26 or something and it was running on ext4 fs. Later if was moved to ZFS. So if before 8.0.30 fallocate() was only used at initialization time, it explains everything. > At any rate, could you repeat your experiment with mysql-8.0.29 on the same filesystem? I don't see a point. We already know that fallocate() fails on ZFS when used with FALLOC_FL_ZERO_RANGE and it doesn't even return the error, so it's a problem in kernel/glibc/driver, not in MySQL. But now since we know it happens, I believe, some kind of check or a workaround should be applied. I would also like to report the problem upstream, but I don't know where exactly this report should go. I'd appreciate it if you could point me. > what link/exact package name did you use to download 8.0.29 https://downloads.mysql.com/archives/get/p/23/file/mysql-8.0.29-linux-glibc2.17-x86_64-min...
[29 Jul 2022 11:14]
Jakub Lopuszanski
I've consulted our Release Engineering team and they say that the glibc2.12 variant of MySQL is build&compiled on EL6, on which FALLOC_FL_ZERO_RANGE is not supported: ``` -bash-4.1$ ack FALLOC_FL_ZERO_RANGE /usr/include/ -bash-4.1$ ack FALLOC /usr/include/ /usr/include/linux/falloc.h 1:#ifndef _FALLOC_H_ 2:#define _FALLOC_H_ 4:#define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */ 5:#define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */ 8:#endif /* _FALLOC_H_ */ /usr/include/linux/fuse.h 262: FUSE_FALLOCATE = 43, -bash-4.1$ ``` so, the generated binary doesn't even try to use fallocate (as it is disabled by #ifdef) and it goes right to the slow path of writing zeros manually. Assuming that on your combination of machine+glibc+ZFS+OS the fallocate implementation is broken in the sense of returning 0 (success) even though it has not changed the (uncompressed) size of the file (which your experiments with f.cpp seem to suggest), then indeed using glibc2.12 variant of MySQL could have saved you from experiencing this bug, while using glibc2.17 variant (which is built on machine which supports FALLOC_FL_ZERO_RANGE) has exposed you to the problem. > I don't see a point. We already know that fallocate() fails on ZFS when used with FALLOC_FL_ZERO_RANGE and it doesn't even return the error, so it's a problem in kernel/glibc/driver, not in MySQL. So it seems, right. > But now since we know it happens, I believe, some kind of check or a workaround should be applied. Hm, I'm not sure what can I offer here. I assume you really want to keep the redo log on ZFS (probably to take snapshots?). Then what you might try is to either use the glibc2.12 version as you've already figured, or try the Ubuntu-specific binary we also offer for download (but this one might also "assume" fallocate works correctly), or recompile the mysqld from source disabling the fragment for HAVE_FALLOC_FL_ZERO_RANGE. I'm not sure we can really "fix" the "generic linux build" variant, as we really want to use fallocate for performance reasons, and if the compile-time check, and runtime checks are both "lying to us", that the function is working fine, then what exactly can we do about it? Perhaps we could check if the size after the call is what we expect it to be, and if it is not, then we could emit a warning and fall back to slow path? I'm not sure of the performance impact of this. What I am sure of, though, is that it adds to the code complexity. Is the "use glibc2.12 variant" a viable option for you? > I would also like to report the problem upstream, but I don't know where exactly this report should go. I'd appreciate it if you could point me. Thank you! As I've mentioned before I see these people work on it: https://github.com/openzfs/zfs/issues/326 but I am not sure if the bug is there, or in glibc.
[29 Jul 2022 11:32]
Jakub Lopuszanski
Hi Aleksei, I've found this report: https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/1969247
[29 Jul 2022 11:41]
Aleksei Faians
> Hm, I'm not sure what can I offer here. I am not trying to solve my own problem right now. I want you to make sure that no one else will get their instances destroyed. I am fine with glibc 2.12 build for now. So, one possible workaround is to implement a simple check at startup to make sure that fallocate() actually creates files of desired size, which in case of failure disables further usage of fallocate(). Another possible workaround is to disable usage of fallocate() completely if ZFS is detected. This won't affect performance much because compression makes zero-filling files an almost instant operation: root@home:~# time dd if=/dev/zero of=/compressed/testing/zeroes.dat bs=1M count=10240 conv=fsync 10240+0 records in 10240+0 records out 10737418240 bytes (11 GB, 10 GiB) copied, 3.00176 s, 3.6 GB/s real 0m3.005s user 0m0.029s sys 0m2.282s And I really doubt that anyone would run MySQL on ZFS without compression. > As I've mentioned before I see these people work on it: https://github.com/openzfs/zfs/issues/326 This one is closed, fallocate() support was added to ZFS and it works if FALLOC_FL_ZERO_RANGE is not used. So it looks to me like the bug is in glibc, it doesn't seem to make sure that FALLOC_FL_ZERO_RANGE is supported by fs when issuing fallocate() system call. I'll report it to Launchpad, they'll probably know what to do next. Thanks for the very professional debugging session! :)
[29 Jul 2022 13:59]
Jakub Lopuszanski
Also, if anyone else is experiencing this particular issue, I advice not to remove redo log files! They may contain valueable data, and more importantly, perhaps can be fixed, by simply appending zeros to them, so that the size is at 64K and divisible by innodb_page_size (which usually is 16K). In theory you should be able to that using `truncate` or `fallocate` command-line utilities, but given that they might use fallocate function and thus fail for same reason as MySQL, you might prefer to use `dd` to literally write zeros. I haven't tested it myself, but I think that something like: ``` for x in *ib_redo*; do s=$(stat --format="%s" $x); # this assumes innodb_page_size = 16K ((B=1<<14)); # round up to multiple of B ((ns=(s+B-1)/B*B )); # number of bytes missing ((delta=ns-s)); if((0<$delta)); then echo "Missing $delta bytes in $x"; # make a backup, in case I've messed up dd call cp $x $x.backup; dd if=/dev/zero bs=1 count=$delta >> $x; fi; done ``` should round up all files to 16K. Similar approach could be used to fix those which are smaller than 65536.
[22 Aug 2022 12:01]
Reio Remma
Just upgraded to 8.0.30 and also happened on this. mysql-community-server.x86_64 8.0.30-1.el8 @mysql80-community Running in AlmaLinux release 8.6 container on Proxmox along with ZFS. Managed to switch over to my single non-ZFS slave. Is my only option right now to take a backup from the non-ZFS server and revert all to 8.0.29?
[22 Aug 2022 17:20]
Reio Remma
I can confirm that the script above worked nicely to fix redo log an a slave.
[23 Aug 2022 8:39]
Jakub Lopuszanski
Thank you Reio Remma for conducting this real world experiment, and sharing results - I hope it might help others in similar situation.