MySQL Bugs: #97913: Undo logs growing during partitioning ALTER queries

Bug #97913	Undo logs growing during partitioning ALTER queries
Submitted:	6 Dec 2019 11:15	Modified:	9 Dec 2019 4:10
Reporter:	Przemyslaw Malkowski	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.7.28	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	rollback, trx_rseg_current_size, undo

Description:
While an ALTER performing partitioning operations is in progress, the undo logs are growing significantly until the query finishes. 
This happens even when there is no other workload on the system. 

Interestingly, I don't see the same behavior for other ALTERs, including those using COPY algorithms. 
I also cannot reproduce this problem on MySQL 8.0.18. 

I don't see any mention in the documentation that the undo log may be used for any DDLs.

An example overhead for simple ~500MB table may be around 200MB growth in rollback segment size. For big tables, this may lead to significant issues with ibdata1 file size.
The undo size can be monitored with innodb metrics, like below.

Short before the ALTER is done, it looks like this:

mysql [localhost:5733] {msandbox} ((none)) > select name, subsystem, count, comment from information_schema.innodb_metrics where name like '%undo%' or name like '%rseg%';
+------------------------------+----------------+-------+-----------------------------------------------+
| name                         | subsystem      | count | comment                                       |
+------------------------------+----------------+-------+-----------------------------------------------+
| buffer_page_read_undo_log    | buffer_page_io |    10 | Number of Undo Log Pages read                 |
| buffer_page_written_undo_log | buffer_page_io | 28643 | Number of Undo Log Pages written              |
| trx_rseg_history_len         | transaction    |    58 | Length of the TRX_RSEG_HISTORY list           |
| trx_undo_slots_used          | transaction    |    48 | Number of undo slots used                     |
| trx_undo_slots_cached        | transaction    |    47 | Number of undo slots cached                   |
| trx_rseg_current_size        | transaction    | 14258 | Current rollback segment size in pages        |
| purge_undo_log_pages         | purge          |    58 | Number of undo log pages handled by the purge |
+------------------------------+----------------+-------+-----------------------------------------------+
7 rows in set (0.00 sec)

mysql [localhost:5733] {msandbox} ((none)) > SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED FROM performance_schema.events_stages_current;
+-----------------------------+----------------+----------------+
| EVENT_NAME                  | WORK_COMPLETED | WORK_ESTIMATED |
+-----------------------------+----------------+----------------+
| stage/sql/copy to tmp table |       15676237 |       16355398 |
+-----------------------------+----------------+----------------+
1 row in set (0.00 sec)

And right after it's done:

mysql [localhost:5733] {msandbox} ((none)) > select name, subsystem, count, comment from information_schema.innodb_metrics where name like '%undo%' or name like '%rseg%';
+------------------------------+----------------+-------+-----------------------------------------------+
| name                         | subsystem      | count | comment                                       |
+------------------------------+----------------+-------+-----------------------------------------------+
| buffer_page_read_undo_log    | buffer_page_io |    10 | Number of Undo Log Pages read                 |
| buffer_page_written_undo_log | buffer_page_io | 30254 | Number of Undo Log Pages written              |
| trx_rseg_history_len         | transaction    |    75 | Length of the TRX_RSEG_HISTORY list           |
| trx_undo_slots_used          | transaction    |    76 | Number of undo slots used                     |
| trx_undo_slots_cached        | transaction    |    76 | Number of undo slots cached                   |
| trx_rseg_current_size        | transaction    |   242 | Current rollback segment size in pages        |
| purge_undo_log_pages         | purge          |    75 | Number of undo log pages handled by the purge |
+------------------------------+----------------+-------+-----------------------------------------------+
7 rows in set (0.00 sec)

How to repeat:
$ dbdeployer deploy single -c "innodb_monitor_enable=module_dml,module_trx,module_purge,module_buffer_page,module_undo" 5.7.28
Database installed in $HOME/sandboxes/msb_5_7_28
run 'dbdeployer usage single' for basic instructions'
. sandbox server started

$ msb_5_7_28/use test

mysql [localhost:8019] {msandbox} (test) > create table t1 (id int auto_increment primary key, a varchar(20), b int);
Query OK, 0 rows affected (0.06 sec)

mysql [localhost:8019] {msandbox} (test) > insert into t1(a) values ("foobar"),("foobar"),("foo"),("foo");
Query OK, 4 rows affected (0.02 sec)
Records: 4  Duplicates: 0  Warnings: 0

(...)

mysql [localhost:8019] {msandbox} (test) > insert into t1(a) select a from t1;
Query OK, 8388608 rows affected (1 min 19.76 sec)
Records: 8388608  Duplicates: 0  Warnings: 0

mysql [localhost:8019] {msandbox} (test) > ALTER TABLE t1 PARTITION BY HASH(id) partitions 4;
Query OK, 16777216 rows affected (1 min 26.78 sec)
Records: 16777216  Duplicates: 0  Warnings: 0

While ALTER is ongoing, monitor the rseg size using information_schema.innodb_metrics as shown above.

Suggested fix:
If that's unexpected behavior - fix it to avoid undo size problem.
If that's expected, please update the manual to explain it, also why it doesn't happen in version 8.0.

Hello Przemyslaw,

Thank you for the report.

regards,
Umesh

Test results - 5.7.28

Attachment: 97913_5.7.28.results (application/octet-stream, text), 7.38 KiB.

I assume this ALTER executes as COPY algorithm. Then one bug fix from friend thiru could be preventing extra undo logs in 8.0

The improvement is done as a bug fix and only in 8.0:

My friend Thiru did it:

commit b5f211744b49d620d00fdecd13d9af09ef26c15b
Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
Date: Wed Dec 2 16:09:15 2015 +0530

Bug #17479594 AVOID INTERMEDIATE COMMIT WHILE DOING
ALTER TABLE ALGORITHM=COPY

Problem:
=======
During ALTER TABLE, we commit and restart the transaction
for every 10,000 rows, so that the rollback after recovery would not take
so long.

Fix:
====
Supress the undo logging during copy alter operation. If fts_index is
present then insert directly into fts auxiliary table rather
than doing at commit time.

ha_innobase::num_write_row: Remove the variable.
ha_innobase::write_row(): Remove the hack for committing every 10000 rows.
row_lock_table_for_mysql(): Remove the extra 2 parameters.
lock_get_src_table(), lock_is_table_exclusive(): Remove the function.

Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
Reviewed-by: Jon Olav Hauglid <jon.hauglid@oracle.com>
RB: 10419

Or something from Atomic DDLs which converted operations to inplace APIs.

FYI: https://github.com/mysql/mysql-server/commit/b5f211744b49d620d00fdecd13d9af09ef26c15b