Bug #119578 Repeated MySQL Crashes During Online ALTER on Multi -TB Tables
Submitted: 19 Dec 14:06
Reporter: pon suresh pandian soundara rajan Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:8.0.40 OS:Any
Assigned to: CPU Architecture:x86
Tags: compressed tables, crash alter script

[19 Dec 14:06] pon suresh pandian soundara rajan
Description:
Hi Team,

We are implementing table compression on very large tables in our prod MySQL. One of the affected tables helpdesk_note_bodies, is around 13 TB in size.

ALTER statement used :

ALTER TABLE helpdesk_note_bodies ROW_FORMAT=COMPRESSED, KEY_BLOCK_SIZE=8, ALGORITHM=INPLACE, LOCK=NONE;

To isolate the operation:

We created a copy of prod instance, Executed the ALTER with no application workload

Despite this, the ALTER ran for over 9 days and eventually MySQL crashed.

We have observed the same failure pattern across four different servers when running this ALTER..

Observed Behavior :

All observed crashes point to InnoDB dictionary latch starvation during online DDL on multi-TB tables using ROW_FORMAT=COMPRESSED. 

This issue is reproducible with no writes

-> Independent of query workload

-> Not related to data or index corruption

-> Consistent across multiple servers

This strongly indicates a storage-engine–level limitation or bug in MySQL 8.0.40

Error Log Analysis

The following repeated patterns appear in all error logs:

Mutex DICT_SYS created dict0dict.cc:1019

X-lock on RW-latch created dict0dict.cc:1027

These locks protect the global InnoDB data dictionary. A single thread holds the exclusive lock for an extended period, while other threads wait for 176–251 seconds, leading to semaphore starvation.

Blocking Thread Details :

Last time write locked in file:

storage/innobase/handler/handler0alter.cc line 7647

This identifies the failure point inside the online ALTER TABLE (INPLACE) execution path.

Source Code Correlation :

Reviewing MySQL 8.0.40 source code confirms that line 7647 corresponds to the following logic in ha_innodb::inplace_alter_table():

7643 if (new_auto_inc_val != UT_SINT64_MAX) {
7644
7645   /* Change auto-increment counter in the memory struct. */
7646
7647   dict_table_autoinc_lock(table);
7648
7649   table->set_autoinc(new_auto_inc_val);
7650
7651   dict_table_autoinc_unlock(table);

FYI - https://github.com/enhancedformysql/mysql-8.0.40/blob/main/storage/innobase/handler/handle... 

How to repeat:
Reproduction Summary :

MySQL version: 8.0.40

Table size: 13 TB

DDL operation: Online ALTER with compression

Outcome: MySQL crash after prolonged execution

Evidence :

Error logs referencing:

dict0dict.cc:1019

dict0dict.cc:1027

handler0alter.cc:7647

Semaphore waits exceeding 180 seconds

Multiple servers impacted with identical symptoms

Given the consistent reproduction and clear source-code correlation, 

Confirmation whether this is a known limitation or bug in MySQL 8.0.40 online DDL for large tables.

Here I am attaching the error logs from multiple servers same behaviour is observed in RDS MySQL 8.40 version also.