Description:
The innodb_ddl_buffer_size parameter does not correctly limit the memory allocation used by the first phase of an ALTER TABLE (stage/innodb/alter table (read PK and internal sort)). The first phase allocates memory under 'memory/innodb/std'. See below:
With 4G of innodb_ddl_buffer_size:
mysql [localhost:32726] {msandbox} (test) > select @@innodb_ddl_buffer_size, @@innodb_ddl_threads, @@innodb_parallel_read_threads;
+--------------------------+----------------------+--------------------------------+
| @@innodb_ddl_buffer_size | @@innodb_ddl_threads | @@innodb_parallel_read_threads |
+--------------------------+----------------------+--------------------------------+
| 4294967295 | 1 | 1 |
+--------------------------+----------------------+--------------------------------+
1 row in set (0.00 sec)
During the first stage of an ALTER TABLE:
mysql [localhost:32726] {msandbox} ((none)) > SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED FROM performance_schema.events_stages_current;
+------------------------------------------------------+----------------+----------------+
| EVENT_NAME | WORK_COMPLETED | WORK_ESTIMATED |
+------------------------------------------------------+----------------+----------------+
| stage/innodb/alter table (read PK and internal sort) | 542296 | 542296 |
+------------------------------------------------------+----------------+----------------+
1 row in set (0.00 sec)
We can see memory/innodb/std growing more than twice the allocation of innodb_ddl_buffer_size (8.12 GiB, while our limit was 4G):
mysql [localhost:32726] {msandbox} ((none)) > select * from sys.memory_global_by_current_bytes limit 10;
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| event_name | current_count | current_alloc | current_avg_alloc | high_count | high_alloc | high_avg_alloc |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| memory/innodb/std | 2103 | 8.12 GiB | 3.96 MiB | 5680 | 8.25 GiB | 1.49 MiB |
| memory/innodb/buf_buf_pool | 16 | 2.04 GiB | 130.88 MiB | 16 | 2.04 GiB | 130.88 MiB |
| memory/innodb/memory | 203108 | 1.53 GiB | 7.92 KiB | 353953 | 14.11 GiB | 41.80 KiB |
| memory/innodb/os0event | 375256 | 48.67 MiB | 136 bytes | 375274 | 48.67 MiB | 136 bytes |
| memory/performance_schema/events_statements_summary_by_digest | 1 | 40.28 MiB | 40.28 MiB | 1 | 40.28 MiB | 40.28 MiB |
| memory/innodb/ut0link_buf | 2 | 24.00 MiB | 12.00 MiB | 2 | 24.00 MiB | 12.00 MiB |
| memory/innodb/log_buffer_memory | 1 | 16.00 MiB | 16.00 MiB | 1 | 16.00 MiB | 16.00 MiB |
| memory/performance_schema/events_statements_history_long | 1 | 14.42 MiB | 14.42 MiB | 1 | 14.42 MiB | 14.42 MiB |
| memory/performance_schema/events_errors_summary_by_thread_by_error | 257 | 13.24 MiB | 52.75 KiB | 257 | 13.24 MiB | 52.75 KiB |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name | 1 | 11.97 MiB | 11.97 MiB | 1 | 11.97 MiB | 11.97 MiB |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
10 rows in set (0.00 sec)
We can see it is not a performance_schema tracking issue, as the memory used by the mysqld process grows to 12G as well. Our buffer pool (buf_buf_pool) takes 2G, memory/innodb/std takes 8G, memory/innodb/memory the other 2G:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
leonard+ 3146342 76.2 3.2 24590100 12682680 pts/0 Sl 17:19 26:46 /opt/mysql/8.0.40/bin/mysqld --defaults-file=/home/leonardo.bacchi.fernandes/sandboxes/ps_8_0_40/my.sandbox.cnf --basedir=/opt/mysql/8.0.40 --datadir=/home/leonardo.bacchi.fernandes/sandboxes/ps_8_0_40/data --plugin-dir=/opt/mysql/8.0.40/lib/plugin --log-error=/home/leonardo.bacchi.fernandes/sandboxes/ps_8_0_40/data/msandbox.err --pid-file=/home/leonardo.bacchi.fernandes/sandboxes/ps_8_0_40/data/mysql_sandbox32726.pid --socket=/tmp/mysql_sandbox32726.sock --port=32726
The higher the value of innodb_ddl_buffer_size, the more memory is allocated under 'memory/innodb/std'. The memory allocated is not proportional to the number of ddl or parallel read threads.
This excessive memory consumption only happens during the first stage of the ALTER TABLE:
mysql [localhost:32726] {msandbox} ((none)) > SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED FROM performance_schema.events_stages_current;
+------------------------------------------------------+----------------+----------------+
| EVENT_NAME | WORK_COMPLETED | WORK_ESTIMATED |
+------------------------------------------------------+----------------+----------------+
| stage/innodb/alter table (read PK and internal sort) | 0 | 335583 |
+------------------------------------------------------+----------------+----------------+
1 row in set (0.00 sec)
The memory allocation of subsequent phases is tracked under 'memory/innodb/ddl', which is correctly limited by innodb_ddl_buffer_size.
How to repeat:
Steps to reproduce:
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE TABLE `joinit` (
`i` int(11) NOT NULL AUTO_INCREMENT,
`s` varchar(64) DEFAULT NULL,
`t` time NOT NULL,
`g` int(11) NOT NULL,
PRIMARY KEY (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- Insert data until the table is sufficiently large, so we have enough time to track the memory usage
INSERT INTO joinit VALUES (NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )));
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +256 rows
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +512 rows
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +1024 rows
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +16k rows
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +1M rows
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +2M rows -> 150MB every 2M rows
-- Enable relevant p_s instruments and consumers:
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME LIKE 'memory/%';
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%stages%';
-- Set variables for the test:
set session innodb_ddl_buffer_size = 4294967295;
set session innodb_ddl_threads = 1;
set session innodb_parallel_read_threads=1;
-- Create index
alter table joinit add index `ix_cpvap_apdml_l_01` (`s`) ;
Using a different session while the index is created, watch as the memory allocation of 'memory/innodb/std' grows significantly larger than innodb_ddl_buffer_size:
select * from sys.memory_global_by_current_bytes limit 10;
This only happens while the ALTER TABLE stage is at stage/innodb/alter table (read PK and internal sort):
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED FROM performance_schema.events_stages_current;
You can drop and recreate the index with different values of innodb_ddl_buffer_size. The memory/innodb/std memory allocation was twice the size of innodb_ddl_buffer_size with this data, but we've seen cases where its allocation is even larger.
This is not expected, as innodb_ddl_buffer_size should limit the memory used by all phases of the DDL, but the first phase can allocate more than twice that value.