MySQL Bugs: #119643: Undo log purging goes through purge boundary overwrites undo log

Bug #119643	Undo log purging goes through purge boundary overwrites undo log
Submitted:	7 Jan 18:53
Reporter:	Meng-Hsiu Chiang (OCA)	Email Updates:
Status:	Open	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	8.0.44	OS:	Linux
Assigned to:		CPU Architecture:	x86
Description:
When purging of a record in the virtual index (secondary index on a generated column), the process requires a safety verification check. This check involves locating the corresponding record in the primary index and traversing through the older versions (MVCC) until either a match is found or the purge boundary is encountered. If a matching record exists within these bounds, the secondary index record must be preserved and cannot be purged.

Virtual indexes implement different traversal logic. The traversal of older versions continues beyond the purge boundary, terminating only upon finding a match or reaching the undo record currently under purge. This extended traversal introduces a risk of parsing truncated undo records. While parsing truncated undo logs is not inherently dangerous, a critical issue arises when the page containing these undo logs is reallocated to another undo segment. In such cases, the original undo log references become invalid, potentially pointing to arbitrary data with different undo offsets or data belonging to different tables.

Example:
```

                                Purge
                               boundary
                                  |
+-----------+                     |
|           |        +----------+ |  +----------+    +----------+
| Primary   |------->|  TRX C   |--->|  TRX B   |--->|  TRX A   |
| Index     |        +----------+ |  +----------+    +----------+
|           |                     |
+-----------+                     |
                                  |
+-----------+                     |
|   TRX A   |
+-----------+                   
|  Virtual  |
|   Index   |
|           |
|           |
+-----------+

```

1. Initial Purge Processing:

  During the first purge processing of Transaction A, the complete MVCC chain from the primary index to Transaction A remains intact. At this stage, virtual index purge operations reading below the purge boundary proceed safely.

2.  Partial Truncation Scenario:

  A situation can arise where Transactions A and B are purged, but only Transaction B gets truncated. This partial truncation occurs due to either:

    a. Transaction A residing in a cached undo segment (AMS does not truncate cached undo segments)
    b. System crash during truncation, resulting in only Transaction B being truncated
3. Post-Restart Behavior:

  Following system restart in either scenario (2a or 2b), Transaction A undergoes a second purge operation since:

    * It remains present in the system
    * Purge operations always initiate from the oldest available undo records

  During this second purge of Transaction A, the MVCC chain traversal encounters the truncated Transaction B records. These records may be corrupted if their original undo pages have been reallocated to other transactions.

4. Impact:

  The described sequence triggers various assertion failures during Transaction B processing.

In this worklog for virtual index design https://dev.mysql.com/worklog/task/?id=8149, the low level design section makes note of special case handling during purge.

How to repeat:
Prerequisites: disable truncate

```
# create test cases
drop database if exists test;    
create database test;  
use test;
create table t2(x int, y int);
create table t1(x int, y int, z int GENERATED ALWAYS as (x + y), PRIMARY KEY(x));
create index v_idx on t1(z);                     
delimiter $$                                      
create procedure large_update_t2()    
begin                                       
  set @itr = 0;
  while @itr < 10000 do                                                                                                       
    update t2 set y = @itr where x=3;
    set @itr = @itr + 1;
  end while;
end $$             
delimiter ;
         
insert into t1(x,y) values(3, 4);                  
                             
# insert other records into T1 until the last line printed table space assigned is 4294967279
                
delete from t1 where x=3; 
 
# insert other records into T1 until the last line printed for table space assigned is 4294967278 (with 2 undo tablespace)
```

Once the table space assigned is satisfied (4294967278) , create the transaction:

```
start transaction;
# this insert should use tablespace 4294967279
insert into t1(x,y) values(3, 4);        
insert into t2(x,y) values(3, 4);
call large_update_t2();
commit;  
```

Then enable purge and truncation, and force a engine crash. After restarting engine:
```
use test;
CREATE TABLE `t3` (
`a` BLOB,
`b` BLOB,
`c` BLOB GENERATED ALWAYS AS (CONCAT(a,b)) VIRTUAL,
`h` VARCHAR(10) DEFAULT NULL,
`i` int
) ENGINE=InnoDB;
INSERT INTO t3 VALUES (REPEAT('g', 16000), REPEAT('x', 16000), DEFAULT, "kk", 1);
delimiter $$
create procedure large_update_t3()
begin
  set @itr = 2;
  while @itr < 100000 do
    UPDATE t3 SET a = RANDOM_BYTES(1000);
    set @itr = @itr + 1;
  end while;
end $$
delimiter ;
  
call large_update_t3();  
```
by adding addition code to the engine, we can analyze the log and see the undo logs were being overwritten