Bug #98564 [InnoDB] Assertion failure: space->encryption_op_in_progress == NONE
Submitted: 12 Feb 2020 10:28 Modified: 17 Feb 2021 11:01
Reporter: Satya Bodapati (OCA)
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 8.0, 8.0.19
OS: Any
Assigned to:
CPU Architecture: Any

[12 Feb 2020 10:28] Satya Bodapati
Description:
Kill the server while ALTER TABLESPACE is running.
On the next startup, just as the resume alter thread is about to start, run some DDLs in parallel from other connections (on unrelated tablespaces).
Kill the server again.

On the following startup, trying to re-encrypt this tablespace leads to an assertion failure. Essentially, the tablespace is left in a 'partial' state, and the recovery roll-forward encrypt thread will never fix it.

How to repeat:
See the attached mtr testcase. Please apply the debug patch first; it ensures the resume_encrypt thread is killed in the right state.

Suggested fix:
It seems the resume encrypt thread breaks the DDL Log design.

1. When DDLs execute, they write entries into the DDL Log table keyed by the connection id (a.k.a. thread_id in the InnoDB source).

2. After a restart, connection ids start again from the beginning.

3. While the resume encrypt thread is about to process the old DDL Log entries, a parallel DDL can create entries with the same connection id.

4. When such a DDL succeeds, it also removes the 'old' entries that carry the same connection id. Thus the resume encrypt thread, and any later startup, will never see these DDL Log entries.

Let me try to explain with an example:

1. ALTER TABLESPACE ENCRYPTION='Y' writes an entry into the DDL_log table with id 7 (the connection id is 7).
2. Assume this operation is killed. The DDL_Log entry with id 7 is then used at startup to resume the operation.
3. On the next startup, a new connection is established and its id is 7 again. Execute a DDL from this connection. When the DDL completes, it removes all entries with id 7.
4. The old entry with id 7 is therefore removed as well.
5. If the server is killed at this stage, it can never resume the encryption operation, because the DDL_Log is empty.
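
The five steps above can be replayed with a small toy model. This is only a sketch to illustrate the id collision, not InnoDB code: the class DdlLog, its methods, and the names ts1 and t2 are hypothetical, and the model assumes cleanup simply deletes every entry matching the connection id, as described in the report.

```python
# Toy model of the DDL log collision. NOT InnoDB code: DdlLog, write(),
# delete_by_thread_id(), ts1 and t2 are hypothetical names for illustration.
from dataclasses import dataclass, field


@dataclass
class DdlLog:
    # Each entry is (thread_id, operation); thread_id is the connection id.
    entries: list = field(default_factory=list)

    def write(self, thread_id, operation):
        self.entries.append((thread_id, operation))

    def delete_by_thread_id(self, thread_id):
        # Post-DDL cleanup: drop every entry carrying this thread id,
        # including entries left over from before the restart.
        self.entries = [e for e in self.entries if e[0] != thread_id]


def simulate():
    log = DdlLog()
    # Run 1: connection 7 starts encryption; the server is killed mid-way,
    # so the entry stays in the log for the resume thread.
    log.write(7, "ALTER TABLESPACE ts1 ENCRYPTION='Y'")
    # Run 2: connection ids restart. A new connection also gets id 7 and
    # runs an unrelated DDL before the resume thread handles the old entry.
    log.write(7, "CREATE TABLE t2")
    # The unrelated DDL succeeds and cleans up "its" entries.
    log.delete_by_thread_id(7)
    return log.entries


remaining = simulate()
print(remaining)  # [] : the encryption entry from run 1 is gone too
```

With an empty log, a second kill at this point leaves nothing for the next startup to resume, which matches step 5.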

My recommendation is to drop this 'background thread' concept and recover encryption operations the same way other DDLs are recovered.
(AFAIK, DDL recovery happens at startup and the only exception is encryption operations.)

If we want to keep the background thread resume, we must somehow ensure the ids don't conflict. This is a bit hard because the server decides those connection ids.
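
One hypothetical way to keep ids from conflicting, sketched purely for discussion: tag each DDL Log entry with a per-startup counter in addition to the connection id, so cleanup only touches entries written during the current run. Everything here (EpochDdlLog, the epoch field, the method names) is an assumption for illustration; it is not how InnoDB stores DDL Log entries and not necessarily what the eventual fix did.

```python
# Hypothetical sketch: namespace DDL log entries by (epoch, thread_id).
# EpochDdlLog and all of its members are invented for illustration only.
from dataclasses import dataclass, field


@dataclass
class EpochDdlLog:
    epoch: int = 0  # assumed to be bumped once per server startup
    entries: list = field(default_factory=list)

    def restart(self):
        self.epoch += 1

    def write(self, thread_id, operation):
        self.entries.append((self.epoch, thread_id, operation))

    def delete_by_thread_id(self, thread_id):
        # Only remove entries written by this connection in the current run.
        key = (self.epoch, thread_id)
        self.entries = [e for e in self.entries if (e[0], e[1]) != key]

    def pending_from_earlier_runs(self):
        # What a resume thread would still find after unrelated DDLs finish.
        return [e for e in self.entries if e[0] < self.epoch]


def demo():
    log = EpochDdlLog()
    log.write(7, "ALTER TABLESPACE ts1 ENCRYPTION='Y'")  # killed mid-operation
    log.restart()                                        # ids start over
    log.write(7, "CREATE TABLE t2")                      # new connection, id 7
    log.delete_by_thread_id(7)                           # unrelated DDL cleans up
    return log.pending_from_earlier_runs()


print(demo())  # [(0, 7, "ALTER TABLESPACE ts1 ENCRYPTION='Y'")]
```

The connection id is still chosen by the server; the sketch sidesteps that by adding a second key component instead of trying to control the id itself.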

If we block startup until resume encrypt is over, then we might as well do it as part of regular startup, with no need for a background thread. But yes, in rare cases, startup will take longer.
[12 Feb 2020 10:29] Satya Bodapati
mtr testcase

Attachment: ddl_log_issue.test (application/octet-stream, text), 3.83 KiB.

[12 Feb 2020 10:32] Satya Bodapati
Code change (adds a sleep)

Attachment: ddl_log_code_change.diff (text/x-patch), 555 bytes.

[12 Feb 2020 11:45] MySQL Verification Team
Hello Satya,

Thank you for the report and feedback!

regards,
Umesh
[12 Feb 2020 13:16] MySQL Verification Team
I think the debug patch is fine, since it only introduces sleep() when the server is compiled with full debug info.
[17 Sep 2020 16:27] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 8.0.23 release, and here's the proposed changelog entry from the documentation team:

An interrupted tablespace encryption operation did not update the
encrypt_type table option information in the data dictionary when the
operation resumed processing after the server was restarted.
[21 Sep 2020 16:35] MySQL Verification Team
Thank you, Daniel.
[15 Feb 2021 8:59] Satya Bodapati
Thank you for fixing this! This should be in closed status though.
[15 Feb 2021 13:44] MySQL Verification Team
Hi,

We agree with Mr. Bodapati that this report should be closed.
[17 Feb 2021 11:01] Erlend Dahl
In response to urgent requests from Mr Bodapati and Mr Milivojevic, I close the bug.