Bug #119396 Increase in undo log size , slower purging and no truncation even after restart
Submitted: 14 Nov 10:58 Modified: 14 Nov 15:12
Reporter: Azar Md Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:8.0.40 OS:Any
Assigned to: CPU Architecture:Any
Tags: InnDB, undo log purge, undo space

[14 Nov 10:58] Azar Md
Description:
We are encountering an issue in MySQL 8.0.40 where the History List Length has increased significantly, and the undo log files are occupying a large amount of disk space. We marked two undo tablespaces as inactive, and even after more than 30 days, they still have not been emptied. We also restarted MySQL to clear the space, but even after the restart, the space is still not getting freed.

There are no long-running transactions on this server; it is purely a heavy-write database, yet the undo logs continue to grow and do not purge as expected.

--

Purge done for trx's n:o < 2176302582 undo n:o < 0 state: running
History list length 2216807648

mysql>  SELECT SPACE, NAME, STATE, FILE_SIZE, ALLOCATED_SIZE FROM information_schema.innodb_tablespaces WHERE SPACE_TYPE='Undo';
+------------+-----------------+----------+--------------+----------------+
| SPACE      | NAME            | STATE    | FILE_SIZE    | ALLOCATED_SIZE |
+------------+-----------------+----------+--------------+----------------+
| 4294966771 | innodb_undo_001 | active   | 591463972864 |   591464157184 |
| 4294966897 | innodb_undo_002 | inactive | 616831123456 |   616831246336 |
| 4294967277 | temp_undo_003   | active   | 654227537920 |   654227861504 |
| 4294967276 | temp_undo_004   | inactive | 109270007808 |   109270110208 |
+------------+-----------------+----------+--------------+----------------+
4 rows in set (0.12 sec)

mysql> SELECT     t.trx_id,     t.trx_state,     t.trx_started,     TIMESTAMPDIFF(SECOND, t.trx_started, NOW()) AS age_sec,     p.ID,     p.USER,     p.HOST,     p.DB,     p.COMMAND,     p.TIME,
 p.STATE,     p.INFO FROM information_schema.innodb_trx t JOIN information_schema.processlist p     ON t.trx_mysql_thread_id = p.ID ORDER BY age_sec DESC;
Empty set (0.00 sec)

How to repeat:
Set up MySQL 8.0.40 on a server with a heavy write-intensive workload (millions of writes per day).

Ensure there are no long-running transactions, and the workload consists mainly of short autocommit DML operations.

Monitor the InnoDB metrics and observe:

History List Length continuously increases,

Purge lag grows even without any open long transactions.

Mark two undo tablespaces (e.g., undo_003, undo_004) as INACTIVE using:

ALTER UNDO TABLESPACE undo_003 SET INACTIVE;
ALTER UNDO TABLESPACE undo_004 SET INACTIVE;

Let the server run for 30+ days.

Monitor INFORMATION_SCHEMA.INNODB_TABLESPACES — inactive undo tablespaces never become empty.

Restart MySQL to trigger space reclamation.

After the restart, observe that:

Undo tablespaces remain large,

Disk space is never reclaimed,

History List Length still grows unexpectedly.

Suggested fix:
We need a way to control and limit the growth of undo logs in a heavy write–workload environment.

If the History List Length and the undo log space have increased significantly, then after marking some undo tablespaces as INACTIVE, they should eventually return to an empty state once their contents are no longer needed.

During a MySQL restart, the undo log files should be cleared or reduced, at least for the manually created undo tablespaces. These should free up space once they are inactive and no active transactions are referencing them.
[14 Nov 14:44] MySQL Verification Team
Suggest you check for long running XA or transactions.

 XA RECOVER;
 SELECT * FROM information_schema.innodb_trx ORDER BY trx_started;

if an XA trx is there for long, then do a XA ROLLBACK for it.
[14 Nov 14:46] MySQL Verification Team
related:
https://bugs.mysql.com/bug.php?id=54455
(innodb needs a way to limit consistent read snapshot age)
[14 Nov 14:50] MySQL Verification Team
A few things you could tune in purge.
https://dev.mysql.com/doc/refman/8.0/en/innodb-purge-configuration.html
however we cant help with that in context of the bug report.

IF you've got time for downtime and there are no XA trx, then
SET GLOBAL innodb_fast_shutdown=0; and restart. It'll take a while, but purge should catch up.
[14 Nov 15:12] Azar Md
mysql> SELECT * FROM information_schema.innodb_trx ORDER BY trx_started; 
Empty set (0.00 sec) 

mysql> XA RECOVER; 
Empty set (0.00 sec)

The result of SELECT * FROM information_schema.innodb_trx is empty, and XA RECOVER also returns an empty set. We have already tried all the recommended suggestions to speed up the purge process, but there is still no major improvement. A MySQL restart should normally clear the manually created undo tablespaces, but that is not happening. Is this a bug in this version?

Is there any better solution to overcome this issue? We have not yet tried the shutdown query you shared, because we need to know how long the downtime would be, and we cannot afford a long downtime window.