Bug #117099 Variable innodb_ddl_buffer_size does not limit memory usage in the first phase of an ALTER TABLE
Submitted: 2 Jan 18:20 Modified: 8 Jan 15:12
Reporter: Leonardo Fernandes Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:8.0.40 OS:Any
Assigned to: CPU Architecture:Any

[2 Jan 18:20] Leonardo Fernandes
Description:
The innodb_ddl_buffer_size parameter does not correctly limit the memory allocation used by the first phase of an ALTER TABLE (stage/innodb/alter table (read PK and internal sort)). The first phase allocates memory under 'memory/innodb/std'. See below:

With 4G of innodb_ddl_buffer_size:

mysql [localhost:32726] {msandbox} (test) > select @@innodb_ddl_buffer_size, @@innodb_ddl_threads, @@innodb_parallel_read_threads;
+--------------------------+----------------------+--------------------------------+
| @@innodb_ddl_buffer_size | @@innodb_ddl_threads | @@innodb_parallel_read_threads |
+--------------------------+----------------------+--------------------------------+
|               4294967295 |                    1 |                              1 |
+--------------------------+----------------------+--------------------------------+
1 row in set (0.00 sec)

During the first stage of an ALTER TABLE:

mysql [localhost:32726] {msandbox} ((none)) > SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED        FROM performance_schema.events_stages_current;
+------------------------------------------------------+----------------+----------------+
| EVENT_NAME                                           | WORK_COMPLETED | WORK_ESTIMATED |
+------------------------------------------------------+----------------+----------------+
| stage/innodb/alter table (read PK and internal sort) |         542296 |         542296 |
+------------------------------------------------------+----------------+----------------+
1 row in set (0.00 sec)

We can see memory/innodb/std growing more than twice the allocation of innodb_ddl_buffer_size (8.12 GiB, while our limit was 4G):

mysql [localhost:32726] {msandbox} ((none)) > select * from sys.memory_global_by_current_bytes limit 10;
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| event_name                                                                  | current_count | current_alloc | current_avg_alloc | high_count | high_alloc | high_avg_alloc |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
| memory/innodb/std                                                           |          2103 | 8.12 GiB      | 3.96 MiB          |       5680 | 8.25 GiB   | 1.49 MiB       |
| memory/innodb/buf_buf_pool                                                  |            16 | 2.04 GiB      | 130.88 MiB        |         16 | 2.04 GiB   | 130.88 MiB     |
| memory/innodb/memory                                                        |        203108 | 1.53 GiB      | 7.92 KiB          |     353953 | 14.11 GiB  | 41.80 KiB      |
| memory/innodb/os0event                                                      |        375256 | 48.67 MiB     |  136 bytes        |     375274 | 48.67 MiB  |  136 bytes     |
| memory/performance_schema/events_statements_summary_by_digest               |             1 | 40.28 MiB     | 40.28 MiB         |          1 | 40.28 MiB  | 40.28 MiB      |
| memory/innodb/ut0link_buf                                                   |             2 | 24.00 MiB     | 12.00 MiB         |          2 | 24.00 MiB  | 12.00 MiB      |
| memory/innodb/log_buffer_memory                                             |             1 | 16.00 MiB     | 16.00 MiB         |          1 | 16.00 MiB  | 16.00 MiB      |
| memory/performance_schema/events_statements_history_long                    |             1 | 14.42 MiB     | 14.42 MiB         |          1 | 14.42 MiB  | 14.42 MiB      |
| memory/performance_schema/events_errors_summary_by_thread_by_error          |           257 | 13.24 MiB     | 52.75 KiB         |        257 | 13.24 MiB  | 52.75 KiB      |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name |             1 | 11.97 MiB     | 11.97 MiB         |          1 | 11.97 MiB  | 11.97 MiB      |
+-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
10 rows in set (0.00 sec)

We can see it is not a performance_schema tracking issue, as the memory used by the mysqld process grows to 12G as well. Our buffer pool (buf_buf_pool) takes 2G, memory/innodb/std takes 8G, memory/innodb/memory the other 2G:
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
leonard+ 3146342 76.2  3.2 24590100 12682680 pts/0 Sl 17:19  26:46 /opt/mysql/8.0.40/bin/mysqld --defaults-file=/home/leonardo.bacchi.fernandes/sandboxes/ps_8_0_40/my.sandbox.cnf --basedir=/opt/mysql/8.0.40 --datadir=/home/leonardo.bacchi.fernandes/sandboxes/ps_8_0_40/data --plugin-dir=/opt/mysql/8.0.40/lib/plugin --log-error=/home/leonardo.bacchi.fernandes/sandboxes/ps_8_0_40/data/msandbox.err --pid-file=/home/leonardo.bacchi.fernandes/sandboxes/ps_8_0_40/data/mysql_sandbox32726.pid --socket=/tmp/mysql_sandbox32726.sock --port=32726

The higher the value of innodb_ddl_buffer_size, the more memory is allocated under 'memory/innodb/std'. The memory allocated is not proportional to the number of ddl or parallel read threads.

This excessive memory consumption only happens during the first stage of the ALTER TABLE:

mysql [localhost:32726] {msandbox} ((none)) > SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED        FROM performance_schema.events_stages_current;
+------------------------------------------------------+----------------+----------------+
| EVENT_NAME                                           | WORK_COMPLETED | WORK_ESTIMATED |
+------------------------------------------------------+----------------+----------------+
| stage/innodb/alter table (read PK and internal sort) |              0 |         335583 |
+------------------------------------------------------+----------------+----------------+
1 row in set (0.00 sec)

The memory allocation of subsequent phases is tracked under 'memory/innodb/ddl', which is correctly limited by innodb_ddl_buffer_size.

How to repeat:
Steps to reproduce:

CREATE DATABASE IF NOT EXISTS test;

USE test;

CREATE TABLE `joinit` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int(11) NOT NULL,
  PRIMARY KEY (`i`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;

-- Insert data until the table is sufficiently large, so we have enough time to track the memory usage

INSERT INTO joinit VALUES (NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )));
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +256 rows
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +512 rows
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +1024 rows
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +16k rows
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +1M rows
INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit; -- +2M rows -> 150MB every 2M rows

-- Enable relevant p_s instruments and consumers:

UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME LIKE 'memory/%';
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%stages%';

-- Set variables for the test:
set session innodb_ddl_buffer_size = 4294967295;
set session innodb_ddl_threads = 1;
set session innodb_parallel_read_threads=1;

-- Create index 
alter table joinit add index `ix_cpvap_apdml_l_01` (`s`) ;

Using a different session while the index is created, watch as the memory allocation of 'memory/innodb/std' grows significantly larger than innodb_ddl_buffer_size:
 
select * from sys.memory_global_by_current_bytes limit 10;

This only happens while the ALTER TABLE stage is at stage/innodb/alter table (read PK and internal sort):

SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED FROM performance_schema.events_stages_current;

You can drop and recreate the index with different values of innodb_ddl_buffer_size. The memory/innodb/std memory allocation was twice the size of innodb_ddl_buffer_size with this data, but we've seen cases where its allocation is even larger. 

This is not expected, as innodb_ddl_buffer_size should limit the memory used by all phases of the DDL, but the first phase can allocate more than twice that value.
[8 Jan 15:12] MySQL Verification Team
Hello Leonardo,

Thank you for the report and feedback.
Verified as described.

regards,
Umesh