| Bug #120648 | Contribution: InnoDB: Crash resuming interrupted ALTER TABLESPACE ENCRYPTIO ... | ||
|---|---|---|---|
| Submitted: | 9 Jun 17:53 | ||
| Reporter: | OCA Admin (OCA) | Email Updates: | |
| Status: | Open | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
| Version: | | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
[9 Jun 17:53]
OCA Admin
Contribution submitted via Github - InnoDB: Crash resuming interrupted ALTER TABLESPACE ENCRYPTION
(*) Contribution by Przemysław Skibiński (Github inikep, https://github.com/mysql/mysql-server/pull/668):

I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it

---

Resuming an interrupted `ALTER TABLESPACE ... ENCRYPTION` after a crash is unsafe in two independent ways. The post-recovery resume thread can crash the server while finishing the operation; either bug alone is sufficient.

### 1. Resume thread has no thread id

`fsp_init_resume_alter_encrypt_tablespace()` runs the resumed ALTER on an internal THD created with `create_internal_thd()` but never assigns it a thread id. When binary logging is enabled the resumed DDL acquires GTID ownership keyed by thread id, and committing it aborts:

```
rpl_gtid_state.cc:855: Gtid_state::update_gtids_impl_own_gtid(): Assertion `owned_gtids.is_owned_by(thd->owned_gtid, thd->thread_id())'
```

Fix: call `thd->set_new_thread_id()` on the resume THD before running the operation.

### 2. Persisted resume progress can run ahead of the on-disk state

A resumed (un)encryption continues from the progress counter persisted on page 0 and skips every page below it, assuming those pages are already on disk in the target encryption state. But `mark_all_page_dirty_in_tablespace()` only dirties the pages in the buffer pool; a page's on-disk encryption state is decided later, at write time, in `fil_io_set_encryption()`. Page 0 (holding the progress) and the data pages it accounts for are flushed independently, so page 0 can reach disk with progress=N while the data pages dirtied up to N have not been flushed.
The redo-logged rewrite of those data pages is content-neutral (re-stamps the same space id), so redo alone does not establish their on-disk encryption state either: during crash recovery a data page can be applied and flushed by the recovery writer in the old encryption state before the operation marker on page 0 is applied to `fil_space_t::encryption_op_in_progress`. After such a crash the resume trusts progress=N, skips those still-old pages, and for a decryption `decrypt_end()` erases the tablespace key, leaving pages that are encrypted on disk but no longer decryptable. The next read aborts:

```
fil0fil.cc: Assertion `req_type.is_dblwr() || err == DB_SUCCESS' (the failing read returns DB_IO_DECRYPT_FAIL)
```

The symmetric encryption case silently leaves pages unencrypted on disk.

Fix:

- `mark_all_page_dirty_in_tablespace()`: flush the just-processed pages with `buf_LRU_flush_or_remove_pages()` (which fsyncs the file via `fil_flush()`) before persisting the progress on page 0, so the persisted progress can never run ahead of the on-disk encryption state.
- `resume_alter_encrypt_tablespace()`: load the tablespace key from the header on resume for both encryption and decryption, not only encryption; pages above the progress watermark can still be encrypted on disk and reading them needs the in-memory key.
- `load_encryption_from_header()`: when page 0 is not flagged as encrypted there is no encryption info to load (an expected state after a crash that interrupted `ALTER ... ENCRYPTION` before the new info was persisted, or after a decryption finished); return early instead of decoding the default info, which avoids a misleading "found unexpected version" error logged from the post-recovery resume thread.

### Test

`mysql_ts_alter_encrypt_resume_crash` repeatedly toggles the mysql tablespace encryption and kills the server mid-operation; on the next startup the resume thread finalizes the interrupted operation.
The crash is timing dependent — run with `--parallel=24 --repeat=40`. With both fixes the server always recovers and the tablespace stays readable. Verified: unfixed binary reproduces the `fil0fil` `DB_IO_DECRYPT_FAIL` assertion; with the fix, 40/40 repeats pass with no assertions.
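
### Illustration of the ordering rule in fix 2

The ordering rule behind the second fix (flush the processed pages before persisting the progress) can be illustrated with a toy model. This is only a sketch and not InnoDB code: `ToyTablespace`, `process_batch`, and every other name below are hypothetical, and the buffer pool and the disk are modelled as plain vectors. It shows that persisting the progress first can leave a state in which a resume would skip pages that are still in the old encryption state on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of these names exist in the MySQL source tree.
enum class PageState { Old, Target };

struct ToyTablespace {
  std::vector<PageState> buffer_pool;  // in-memory state of each page
  std::vector<PageState> disk;         // state a crash would leave behind
  std::size_t persisted_progress = 0;  // stands in for the counter on page 0

  explicit ToyTablespace(std::size_t n_pages)
      : buffer_pool(n_pages, PageState::Old), disk(n_pages, PageState::Old) {}

  // Convert pages [from, to) in memory only (analogous to dirtying them).
  void dirty_range(std::size_t from, std::size_t to) {
    for (std::size_t i = from; i < to; ++i) buffer_pool[i] = PageState::Target;
  }

  // Write pages [from, to) to the modelled disk (analogous to flush + fsync).
  void flush_range(std::size_t from, std::size_t to) {
    for (std::size_t i = from; i < to; ++i) disk[i] = buffer_pool[i];
  }

  // Persist the progress counter (analogous to page 0 reaching disk).
  void persist_progress(std::size_t n) { persisted_progress = n; }

  // Invariant a resume relies on: every page below the persisted
  // progress is already in the target state on disk.
  bool resume_is_safe() const {
    for (std::size_t i = 0; i < persisted_progress; ++i)
      if (disk[i] != PageState::Target) return false;
    return true;
  }
};

// Process one batch of pages and report whether a crash immediately
// afterwards would leave a state that a resume can trust.
// flush_first == true models the fixed ordering, false the unfixed one.
bool process_batch(ToyTablespace &ts, std::size_t from, std::size_t to,
                   bool flush_first) {
  assert(from <= to && to <= ts.disk.size());
  ts.dirty_range(from, to);
  if (flush_first) ts.flush_range(from, to);
  ts.persist_progress(to);
  return ts.resume_is_safe();
}
```

With `flush_first` set, the invariant holds after every batch. Without it, the modelled page 0 records progress for pages whose on-disk state is still old, which is the situation described in section 2.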
Contribution: git_patch_3832547426.txt (text/plain), 14.12 KiB.
