Description:
Hi Team,
We are implementing table compression on very large tables in our prod MySQL. One of the affected tables helpdesk_note_bodies, is around 13 TB in size.
ALTER statement used :
ALTER TABLE helpdesk_note_bodies ROW_FORMAT=COMPRESSED, KEY_BLOCK_SIZE=8, ALGORITHM=INPLACE, LOCK=NONE;
To isolate the operation:
We created a copy of prod instance, Executed the ALTER with no application workload
Despite this, the ALTER ran for over 9 days and eventually MySQL crashed.
We have observed the same failure pattern across four different servers when running this ALTER..
Observed Behavior :
All observed crashes point to InnoDB dictionary latch starvation during online DDL on multi-TB tables using ROW_FORMAT=COMPRESSED.
This issue is reproducible with no writes
-> Independent of query workload
-> Not related to data or index corruption
-> Consistent across multiple servers
This strongly indicates a storage-engine–level limitation or bug in MySQL 8.0.40
Error Log Analysis
The following repeated patterns appear in all error logs:
Mutex DICT_SYS created dict0dict.cc:1019
X-lock on RW-latch created dict0dict.cc:1027
These locks protect the global InnoDB data dictionary. A single thread holds the exclusive lock for an extended period, while other threads wait for 176–251 seconds, leading to semaphore starvation.
Blocking Thread Details :
Last time write locked in file:
storage/innobase/handler/handler0alter.cc line 7647
This identifies the failure point inside the online ALTER TABLE (INPLACE) execution path.
Source Code Correlation :
Reviewing MySQL 8.0.40 source code confirms that line 7647 corresponds to the following logic in ha_innodb::inplace_alter_table():
7643 if (new_auto_inc_val != UT_SINT64_MAX) {
7644
7645 /* Change auto-increment counter in the memory struct. */
7646
7647 dict_table_autoinc_lock(table);
7648
7649 table->set_autoinc(new_auto_inc_val);
7650
7651 dict_table_autoinc_unlock(table);
FYI - https://github.com/enhancedformysql/mysql-8.0.40/blob/main/storage/innobase/handler/handle...
How to repeat:
Reproduction Summary :
MySQL version: 8.0.40
Table size: 13 TB
DDL operation: Online ALTER with compression
Outcome: MySQL crash after prolonged execution
Evidence :
Error logs referencing:
dict0dict.cc:1019
dict0dict.cc:1027
handler0alter.cc:7647
Semaphore waits exceeding 180 seconds
Multiple servers impacted with identical symptoms
Given the consistent reproduction and clear source-code correlation,
Confirmation whether this is a known limitation or bug in MySQL 8.0.40 online DDL for large tables.
Here I am attaching the error logs from multiple servers same behaviour is observed in RDS MySQL 8.40 version also.
Description: Hi Team, We are implementing table compression on very large tables in our prod MySQL. One of the affected tables helpdesk_note_bodies, is around 13 TB in size. ALTER statement used : ALTER TABLE helpdesk_note_bodies ROW_FORMAT=COMPRESSED, KEY_BLOCK_SIZE=8, ALGORITHM=INPLACE, LOCK=NONE; To isolate the operation: We created a copy of prod instance, Executed the ALTER with no application workload Despite this, the ALTER ran for over 9 days and eventually MySQL crashed. We have observed the same failure pattern across four different servers when running this ALTER.. Observed Behavior : All observed crashes point to InnoDB dictionary latch starvation during online DDL on multi-TB tables using ROW_FORMAT=COMPRESSED. This issue is reproducible with no writes -> Independent of query workload -> Not related to data or index corruption -> Consistent across multiple servers This strongly indicates a storage-engine–level limitation or bug in MySQL 8.0.40 Error Log Analysis The following repeated patterns appear in all error logs: Mutex DICT_SYS created dict0dict.cc:1019 X-lock on RW-latch created dict0dict.cc:1027 These locks protect the global InnoDB data dictionary. A single thread holds the exclusive lock for an extended period, while other threads wait for 176–251 seconds, leading to semaphore starvation. Blocking Thread Details : Last time write locked in file: storage/innobase/handler/handler0alter.cc line 7647 This identifies the failure point inside the online ALTER TABLE (INPLACE) execution path. Source Code Correlation : Reviewing MySQL 8.0.40 source code confirms that line 7647 corresponds to the following logic in ha_innodb::inplace_alter_table(): 7643 if (new_auto_inc_val != UT_SINT64_MAX) { 7644 7645 /* Change auto-increment counter in the memory struct. */ 7646 7647 dict_table_autoinc_lock(table); 7648 7649 table->set_autoinc(new_auto_inc_val); 7650 7651 dict_table_autoinc_unlock(table); FYI - https://github.com/enhancedformysql/mysql-8.0.40/blob/main/storage/innobase/handler/handle... How to repeat: Reproduction Summary : MySQL version: 8.0.40 Table size: 13 TB DDL operation: Online ALTER with compression Outcome: MySQL crash after prolonged execution Evidence : Error logs referencing: dict0dict.cc:1019 dict0dict.cc:1027 handler0alter.cc:7647 Semaphore waits exceeding 180 seconds Multiple servers impacted with identical symptoms Given the consistent reproduction and clear source-code correlation, Confirmation whether this is a known limitation or bug in MySQL 8.0.40 online DDL for large tables. Here I am attaching the error logs from multiple servers same behaviour is observed in RDS MySQL 8.40 version also.