Bug #99326 undo truncation might still not be crash safe
Submitted: 23 Apr 2020 2:06 Modified: 25 May 2020 16:18
Reporter: Zhang JiYang
Status: Closed
Category:MySQL Server Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[23 Apr 2020 2:06] Zhang JiYang
This is a variant of bug https://bugs.mysql.com/bug.php?id=93170.

Now an undo space ID may be reused after 512 truncation iterations. It is possible that the checkpoint is old enough that a space ID has been reused by the undo tablespace, and then its page IDs are unexpectedly used during recovery.
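The scenario above can be modelled in a few lines. This is a toy sketch, not InnoDB code: it assumes each truncation simply takes the next ID from a ring of 512 space IDs, and `BASE_SPACE_ID`, `space_id_for` and `ambiguous_ids` are hypothetical names. It shows that as soon as 512 truncations have redo records newer than the last checkpoint, at least one space ID is referenced by two different incarnations of the tablespace.

```python
# Toy model (not InnoDB code): an undo tablespace takes the next space ID from a
# fixed ring each time it is truncated.
SPACE_ID_RANGE = 512   # number of space IDs available to one undo tablespace
BASE_SPACE_ID = 1000   # hypothetical first ID of the ring; arbitrary value


def space_id_for(incarnation: int) -> int:
    """Space ID used by a given incarnation (0 = the original tablespace)."""
    return BASE_SPACE_ID + incarnation % SPACE_ID_RANGE


def ambiguous_ids(truncations_since_checkpoint: int) -> set:
    """Return the space IDs that redo written after the last checkpoint uses for
    more than one incarnation, i.e. IDs recovery cannot attribute correctly."""
    first_incarnation_for_id = {}
    ambiguous = set()
    # Every incarnation from the one live at the checkpoint up to the current
    # one may have redo records that recovery would replay.
    for incarnation in range(truncations_since_checkpoint + 1):
        sid = space_id_for(incarnation)
        if sid in first_incarnation_for_id:
            ambiguous.add(sid)
        else:
            first_incarnation_for_id[sid] = incarnation
    return ambiguous


print(ambiguous_ids(511))  # set()
print(ambiguous_ids(512))  # {1000}
```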

How to repeat:
[23 Apr 2020 12:37] MySQL Verification Team

Thanks for the report. I understand the logic, but I'm not able to create a test case that reproduces this. Let me get back to it.

All best
[23 Apr 2020 17:23] Kevin Lewis
This whole undo truncation process is done while an undo truncate log file exists in the undo directory (or the datadir if innodb_undo_directory is not defined). This temporary file, named "undo%lu_trunc.log", is created at the start of the undo truncate process and deleted at the end. This is what ensures that the process is crash safe. We have test cases that inject crashes at 9 different points in the process. If that file exists at startup, its associated undo tablespace is deleted and replaced, i.e. fully truncated, during startup.
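The marker-file protocol described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the server's implementation: the function names are hypothetical, the truncation "steps" are arbitrary callables, and only the file name pattern `undo%lu_trunc.log` comes from the comment. Real crash safety would also require the directory entry itself to be made durable, which this sketch omits.

```python
import os

MARKER_PREFIX = "undo"
MARKER_SUFFIX = "_trunc.log"


def marker_path(undo_dir: str, space_num: int) -> str:
    # File name follows the "undo%lu_trunc.log" pattern quoted in the comment.
    return os.path.join(undo_dir, "%s%d%s" % (MARKER_PREFIX, space_num, MARKER_SUFFIX))


def truncate_undo_space(undo_dir: str, space_num: int, steps) -> None:
    """Run every truncation step while the marker file exists (hypothetical API)."""
    path = marker_path(undo_dir, space_num)
    # Create the marker and force it to disk before any destructive step runs.
    with open(path, "w") as f:
        f.flush()
        os.fsync(f.fileno())
    for step in steps:
        step()  # a crash (modelled as an exception) here leaves the marker behind
    # Only a truncation that finished every step removes the marker.
    os.remove(path)


def spaces_needing_rebuild(undo_dir: str) -> list:
    """At startup, each leftover marker means that undo tablespace's truncation
    was interrupted, so it must be dropped and recreated from scratch."""
    found = []
    for name in os.listdir(undo_dir):
        if name.startswith(MARKER_PREFIX) and name.endswith(MARKER_SUFFIX):
            found.append(int(name[len(MARKER_PREFIX):-len(MARKER_SUFFIX)]))
    return sorted(found)
```

A crash is modelled by a step raising an exception: the marker for that tablespace survives, and `spaces_needing_rebuild` reports it on the next "startup".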

It is not possible for an undo tablespace from a previous incarnation (one of the 512 possible space IDs assigned to an undo tablespace) to interfere with another one, since the buffer pool is cleaned of all pages from the old space_id before the tablespace with the new space ID is created during undo truncation. The space IDs are assigned on a round-robin basis each time an undo tablespace is truncated. Undo truncation currently removes all pages of the old undo tablespace when it is deleted, and the new tablespace is flushed to disk before it is put online.
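A toy model of the two properties claimed here: round-robin space ID assignment, and evicting the old incarnation's pages before the new one is used. The `UndoSpace` class, the dictionary standing in for the buffer pool, and the three-page initial size are all hypothetical simplifications, not InnoDB structures.

```python
SPACE_ID_RANGE = 512  # space IDs available to one undo tablespace


class UndoSpace:
    """Toy undo tablespace that moves to the next space ID, round robin, on each
    truncation. Not InnoDB code; names and sizes are hypothetical."""

    def __init__(self, first_id: int):
        self.first_id = first_id  # hypothetical start of this tablespace's ID ring
        self.incarnation = 0

    @property
    def space_id(self) -> int:
        return self.first_id + self.incarnation % SPACE_ID_RANGE

    def truncate(self, buffer_pool: dict) -> int:
        """buffer_pool maps (space_id, page_no) -> page state."""
        old_id = self.space_id
        # 1. Evict every cached page that belongs to the old incarnation.
        for key in [k for k in buffer_pool if k[0] == old_id]:
            del buffer_pool[key]
        # 2. Take the next space ID in round-robin order.
        self.incarnation += 1
        new_id = self.space_id
        # 3. Create the new incarnation's initial pages already "clean", standing
        #    in for flushing them to disk before the tablespace goes online.
        for page_no in range(3):  # arbitrary initial size
            buffer_pool[(new_id, page_no)] = "clean"
        return new_id
```

This only shows why stale pages cannot linger in the buffer pool; it does not cover redo records that recovery may still replay, which is the gap raised in the follow-up comment.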

So the undo truncation process is indeed crash safe.
[23 Apr 2020 19:08] Kevin Lewis
After an internal discussion with Sunny Bains, I think I understand the concern better. Let's assume that a redo log is so large that it contains redo entries for all 512 space IDs of an undo tablespace that is being truncated too often. In other words, even though each truncation removes old pages from the buffer pool and flushes the newly created pages, it does not force a checkpoint for each truncation the way 5.7 did. So the redo log can contain records spanning more than 512 truncations, meaning the same space ID can appear for two different incarnations.

A worklog that fixes this highly unlikely possibility has been tested and pushed to the 8.0.21 release branch.

As part of WL#11819, we keep a count of the number of truncations that have happened between checkpoints. If there are more than (512 / 8) truncations between checkpoints, no more truncations can happen on that undo space until the next checkpoint.
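The throttle described for WL#11819 can be sketched as a per-tablespace counter that resets at each checkpoint. The class and method names are hypothetical, and whether the server stops at exactly 64 truncations or only after exceeding 64 is an assumption here; this sketch allows at most 64. Either way, the count between checkpoints stays far below the 512 IDs in the ring, which is what keeps a space ID from being reused while redo for its previous incarnation could still be replayed.

```python
SPACE_ID_RANGE = 512                  # space IDs available to one undo tablespace
TRUNCATE_LIMIT = SPACE_ID_RANGE // 8  # the "512 / 8" from the comment, i.e. 64


class TruncateThrottle:
    """Hypothetical per-undo-tablespace count of truncations since the last
    checkpoint. This sketch allows at most TRUNCATE_LIMIT per checkpoint."""

    def __init__(self):
        self.count_since_checkpoint = {}  # undo space number -> truncations

    def may_truncate(self, space_num: int) -> bool:
        return self.count_since_checkpoint.get(space_num, 0) < TRUNCATE_LIMIT

    def record_truncation(self, space_num: int) -> None:
        if not self.may_truncate(space_num):
            raise RuntimeError("undo space %d must wait for the next checkpoint" % space_num)
        self.count_since_checkpoint[space_num] = self.count_since_checkpoint.get(space_num, 0) + 1

    def on_checkpoint(self) -> None:
        # Redo older than the checkpoint will never be replayed, so the
        # ID ring can safely advance again.
        self.count_since_checkpoint.clear()
```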
[23 Apr 2020 20:09] MySQL Verification Team
Kevin, thanks for the clarification! This explains why I could not reproduce :)

[24 May 2020 17:11] Valeriy Kravchuk
How come a theoretically confirmed problem, fixed in some internal bug report or as part of some worklog, is marked "Not a Bug" rather than a duplicate? It is a bug in all currently released versions of 8.0, up to 8.0.20. Please set the proper status and document this until the fix is released.
[25 May 2020 16:18] MySQL Verification Team
Hi Val,

Thanks, you are right.

Fixed in 8.0.21