MySQL Bugs: #108952: data inconsistent since replica crash recovery after undo tablespace truncation

Bug #108952	data inconsistent since replica crash recovery after undo tablespace truncation
Submitted:	1 Nov 2022 12:13	Modified:	4 Nov 2022 15:20
Reporter:	John White	Email Updates:
Status:	Unsupported	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	8.0.28	OS:	CentOS (7.2)
Assigned to:		CPU Architecture:	x86
Tags:	crash recovery, undo truncate

Description:
2022-10-01T22:49:56.811853+08:00 0 [ERROR] [MY-013266] [InnoDB] Thread# 0 -  Failed to finish truncating Undo Tablespace 'innodb_undo_001'
2022-10-01T22:49:56.840252+08:00 0 [ERROR] [MY-013266] [InnoDB] Thread# 0 -  Failed to finish truncating Undo Tablespace 'innodb_undo_002'

Once replica truncated undo tablespace while super_read_only = ON, a log file undo_{space_num}_trunc.log left when dd_tablespace_set_id_and_state returned true, which means #define DD_FAILURE true as following stacktrace.

When replica reboot, undo tablespace contains data need to be rollbacked would be truncated while booting tablespaces during InnoDB recovery, since undo_{space_num}_trunc.log still survived.

check_readonly(thd, true)
lock_tables_check(THD * thd, TABLE ** tables, size_t count, uint flags) (/data/compile/mysql-8.0.28/sql/lock.cc:211)
mysql_lock_tables(THD * thd, TABLE ** tables, size_t count, uint flags) (/data/compile/mysql-8.0.28/sql/lock.cc:325)
lock_dictionary_tables(THD * thd, TABLE_LIST * tables, uint count, uint flags) (/data/compile/mysql-8.0.28/sql/sql_base.cc:7050)
dd::Open_dictionary_tables_ctx::open_tables(dd::Open_dictionary_tables_ctx * const this) (/data/compile/mysql-8.0.28/sql/dd/impl/transaction_impl.cc:124)
dd::cache::Storage_adapter::store<dd::Tablespace>(THD * thd, dd::Tablespace * object) (/data/compile/mysql-8.0.28/sql/dd/impl/cache/storage_adapter.cc:334)
dd::cache::Dictionary_client::update<dd::Tablespace>(dd::cache::Dictionary_client * const this, dd::Tablespace * new_object) (/data/compile/mysql-8.0.28/sql/dd/impl/cache/dictionary_client.cc:2653)
dd::commit_or_rollback_tablespace_change(THD * thd, dd::Tablespace * space, bool error, bool release_mdl_on_commit_only) (/data/compile/mysql-8.0.28/sql/dd/impl/dictionary_impl.cc:721)
dd_tablespace_set_id_and_state(const char * space_name, space_id_t space_id, dd_space_states state) (/data/compile/mysql-8.0.28/storage/innobase/dict/dict0dd.cc:6064)
trx_purge_truncate_marked_undo_low(space_id_t space_num, std::string space_name) (/data/compile/mysql-8.0.28/storage/innobase/trx/trx0purge.cc:1493)
trx_purge_truncate_marked_undo() (/data/compile/mysql-8.0.28/storage/innobase/trx/trx0purge.cc:1556)
trx_purge_truncate_undo_spaces() (/data/compile/mysql-8.0.28/storage/innobase/trx/trx0purge.cc:1685)
trx_purge_truncate() (/data/compile/mysql-8.0.28/storage/innobase/trx/trx0purge.cc:2386)
trx_purge(ulint n_purge_threads, ulint batch_size, bool truncate) (/data/compile/mysql-8.0.28/storage/innobase/trx/trx0purge.cc:2492)
srv_do_purge(ulint * n_total_purged) (/data/compile/mysql-8.0.28/storage/innobase/srv/srv0srv.cc:2972)
srv_purge_coordinator_thread() (/data/compile/mysql-8.0.28/storage/innobase/srv/srv0srv.cc:3145)

How to repeat:
1. set variables:
> SET GLOBAL innodb_max_undo_log_size = 16 * 1024 * 1024;
> SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
> SET GLOBAL innodb_undo_log_truncate = ON;

2. write data:
Connection 1: 
> SET SESSION transaction_isolation = 'REPEATABLE-READ';
> START TRANSACTION WITH CONSISTENT SNAPSHOT;

Connection 2:
write a table with 1,000,000 rows, then update it, until undo file size larger than 16MiB;

Connection 1:
> ROLLBACK;

3. soon undo_{space_num}_trunc.log would be created, and undo file recreated;

4. killall -9 mysqld and reboot mysqld, watch mysql.err:
[ERROR] [MY-013266] [InnoDB] Thread# 0 -  Failed to finish truncating Undo Tablespace 'innodb_undo_001'

Suggested fix:
1. maybe we can check if undo is active or not when undo_{space_num}_trunc.log found on booting mysqld;

2. skip check read only when dd_tablespace_set_id_and_state in undo truncation;

3. print more logs when dd_tablespace_set_id_and_state failed;

Connection 1:
> SET GLOBAL super_read_only = ON;
> ROLLBACK;

fix reproduce step, set super_read_only then rollback, wait for undo truncation.

Hi Mr. White,

Thank you for your bug report.

However, we do not consider reports where one of the step is killing MySQL server unconditionally. We can accept that too, providing that all of your ACID settings are set to maximum, where syncing the drives is set at the most severe level  and that all these files are located to non-cached partitions.

Also, your workarounds are quite acceptable for installations without these conditions.

Unsupported.

Maybe above reproducing steps mislead the primal problem.

Our settings of MySQL as following, sync_binlog & innodb_flush_log_at_trx_commit both set to 1;

MySQL was killed by OOM killer at the first time I encountered the case, the key to the question is undo_{space_num}_trunc.log left after undo truncated when set state and id failed in DD.

Maybe this error can be catched and processed, instead of regarding trunc.log as unfinished undo truncation when MySQL shutdown & boot abnormally.

[mysqld]
port = 3306
socket = /data/mysql/data/mysql.sock
back_log = 300
max_connections = 3500
max_connect_errors = 9999
transaction_isolation = READ-COMMITTED
max_allowed_packet = 1024M
open_files_limit = 10000
character_set_server = latin1
performance_schema = ON
performance_schema_instrument = 'memory/%=COUNTED'
performance_schema_digests_size = 40000
performance_schema_max_table_instances = 40000
performance_schema_max_sql_text_length = 4096
performance_schema_max_digest_length = 4096
performance_schema_max_table_handles = 10000
log_bin = binlog
binlog_format = row
log_slave_updates = on
sync_binlog = 1
slow_query_log
long_query_time = 2
slow_query_log_file = mysql-slow.log
log_slow_slave_statements
binlog_cache_size = 2M
sort_buffer_size = 2M
join_buffer_size = 2M
key_buffer_size = 32M
read_buffer_size = 2M
read_rnd_buffer_size = 16M
bulk_insert_buffer_size = 64M
thread_stack = 192K
innodb_buffer_pool_size = 80G
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 16M
innodb_log_file_size = 16M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 60
innodb_stats_on_metadata = 0
innodb_flush_method = O_DIRECT
innodb_flush_neighbors = 0
innodb_lock_wait_timeout = 8
thread_cache_size = 256
innodb_write_io_threads = 16
innodb_read_io_threads = 8
innodb_io_capacity = 200
innodb_thread_concurrency = 8
innodb_purge_threads = 1
innodb_buffer_pool_instances = 8
innodb_autoinc_lock_mode = 2
innodb_page_cleaners = 4
innodb_old_blocks_time = 1000
innodb_buffer_pool_dump_at_shutdown = 0
innodb_buffer_pool_load_at_startup = 0
innodb_print_all_deadlocks = 1
innodb_log_checksums = on
innodb_checksum_algorithm = crc32
innodb_file_per_table = 1
innodb_adaptive_hash_index = 0
datadir = /data/mysql/data
innodb_log_group_home_dir = /data/mysql/data
innodb_data_home_dir = /data/mysql/data
innodb_data_file_path = ibdata1:16M:autoextend
general_log_file = mysql.log
log_error = mysql.err
pid_file = mysql.pid
table_definition_cache = 10000
table_open_cache = 10000
innodb_open_files = 10000
table_open_cache_instances = 64
gtid_mode = ON
enforce_gtid_consistency = 1
binlog_gtid_simple_recovery = 1
binlog_rows_query_log_events = 1
server_id = 1
relay_log_space_limit = 10G
skip_slave_start
relay_log_recovery = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE
slave_parallel_type = logical_clock
slave_parallel_workers = 8
slave_preserve_commit_order = 0
slave_net_timeout = 30
interactive_timeout = 1800
wait_timeout = 1800
log_error_verbosity = 3
log_timestamps = SYSTEM
loose_rpl_semi_sync_master_wait_point = AFTER_SYNC

Hi Mr. White,

Thank you for your feedback.

It is clear for your feedback that you have not set all variables to 100 % ACID compliance, nor have you managed to prevent your disk and its partitions from being cached by Linux.

Thanks for your advice, may I get a guidance how to set all variables to 100% ACID compliance and prevent disk and its partitions from being cached by Linux?

If Linux didn't reboot, cached by Linux and disk still there.

According to https://dev.mysql.com/doc/refman/8.0/en/replication-solutions-unexpected-replica-halt.html

IMHO, sync_binlog & innodb_flush_log_at_trx_commit set to 1 is enough to protect data from crash, 
innodb_flush_method set to O_DIRECT could bypass Linux cache, 
sync_relay_log was the default 10000 but relay_log_recovery = ON, I think it's safe

If any other settings could prevent MySQL, may I get a list and check them?

Hi Mr. White,

Unconditional killing of MySQL has the same consequences as the crash of the OS. Hence, first thing to make sure is that cacheing of disks, partitions and filesystem is off.

Regarding all other ACID settings, read our Reference Manual, very carefully.  Type of partition and filesystem syncing should be set carefully and precisely and should be checked whether is working on your hardware and OS.

Thanks for your advice, I had consulted our OS experts for the settings, and verified the environment of disk & OS works, we deployed thousands of MySQL on the same standard environment for at least 5 years, it have been always safe even encounter electric power failure.

There still one question confuses me, why didn't empty undo_1_trunc.log remove after undo truncation? Would this log lead undo_001 cleared again when MySQL restart?

As I traced MySQL startup, I saw following stacktrace, if magic_no != undo::s_magic, undo_001 will be recreated, since at this time, undo_1_trunc.log is 16KiB empty file, no magic number written due to latest undo truncation failure as above description, all the time OS & disk works fine, I'm sure the reason undo_1_trunc.log still be there is not a OS or disk failure, because I traced that setting super_read_only bring on the problem, thanks for reading my trivial description:

undo::is_active_truncate_log_present(space_num)
srv_undo_tablespace_fixup(const char * space_name, const char * file_name, space_id_t space_id) (/data/compile/mysql-8.0.28/storage/innobase/srv/srv0start.cc:807)
Validate_files::check(Validate_files * const this, const Validate_files::Const_iter & begin, const Validate_files::Const_iter & end, size_t thread_id) (/data/compile/mysql-8.0.28/storage/innobase/handler/ha_innodb.cc:3406)
Validate_files::validate(Validate_files * const this, const Validate_files::DD_tablespaces & tablespaces) (/data/compile/mysql-8.0.28/storage/innobase/handler/ha_innodb.cc:3644)
boot_tablespaces(THD * thd) (/data/compile/mysql-8.0.28/storage/innobase/handler/ha_innodb.cc:3718)
innobase_dict_recover(dict_recovery_mode_t dict_recovery_mode, uint version) (/data/compile/mysql-8.0.28/storage/innobase/handler/ha_innodb.cc:3933)
(anonymous namespace)::DDSE_dict_recover(THD * thd, dict_recovery_mode_t dict_recovery_mode, uint version) (/data/compile/mysql-8.0.28/sql/dd/impl/bootstrap/bootstrapper.cc:87)
dd::bootstrap::restart(THD * thd) (/data/compile/mysql-8.0.28/sql/dd/impl/bootstrap/bootstrapper.cc:933)
dd::upgrade_57::restart_dictionary(THD * thd) (/data/compile/mysql-8.0.28/sql/dd/upgrade_57/upgrade.cc:797)
dd::upgrade_57::do_pre_checks_and_initialize_dd(THD * thd) (/data/compile/mysql-8.0.28/sql/dd/upgrade_57/upgrade.cc:957)
bootstrap::handle_bootstrap(void * arg) (/data/compile/mysql-8.0.28/sql/bootstrap.cc:327)
pfs_spawn_thread(void * arg) (/data/compile/mysql-8.0.28/storage/perfschema/pfs.cc:2947)

Hi, John White 

You are right that undo tablespace contains data need to be rollbacked would be truncated while booting tablespaces during InnoDB recovery, and result in inconsistent or missing data.

I maybe reproduce your problem and hope it works for you:

step1: produce a xa transaction, in session1

mysql>  select * from tmp.test;
+----+-----------+
| id | newColumn |
+----+-----------+
|  1 |       100 |
|  2 |       100 |
+----+-----------+
2 rows in set (0.00 sec)

mysql> xa start 'test';
Query OK, 0 rows affected (0.00 sec)

mysql> update test set newColumn = 200;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> xa end 'test';
Query OK, 0 rows affected (0.00 sec)

mysql> xa prepare 'test';
Query OK, 0 rows affected (0.00 sec)

step2:  flush the dirty pages of table `test`, in session2

mysql> use tmp;
Database changed
mysql> flush table test;
Query OK, 0 rows affected (0.00 sec)

step3: kill the mysqld instance

step4: I create tow empty files directly under the data directory, and they are undo_1_trunc.log and undo_2_trunc.log(It is very difficult to know which tablespace file the xa transaction's undo log is in)

step5: reboot the mysqld instance

step6: verify the status of the xa transaction

mysql> xa recover;
Empty set (0.00 sec)

mysql> select * from tmp.test;
+----+-----------+
| id | newColumn |
+----+-----------+
|  1 |       200 |
|  2 |       200 |
+----+-----------+
2 rows in set (0.03 sec)

I don't think it is a correct result about the status of the xa transaction, and the data of undo log is missed.