Bug #93865 innodb_online_alter_log_max_size - Inconsistency about transactions rollback
Submitted: 9 Jan 18:04 Modified: 10 Feb 13:56
Reporter: Vinicius Malvestio Grippa Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.6, 5.7, 8.0 OS:Any
Assigned to: CPU Architecture:Any

[9 Jan 18:04] Vinicius Malvestio Grippa
Description:
Documentation states:

If a temporary log file exceeds the upper size limit, the ALTER TABLE operation fails and all uncommitted concurrent DML operations are rolled back.

https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_online_alter_...

But when the DML fails, transactions are not rolled back. Only the DML.

How to repeat:
On Session 1 we start the DDL and after session 2 started we see the error:

master [localhost] {msandbox} (test) > alter table joinit add column test1 varchar(255);

ERROR 1799 (HY000): Creating index 'PRIMARY' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again.

On Session 2 we start a transaction after Session 1:

master [localhost] {msandbox} (test) > start transaction; update joinit set s = 'test1' limit 2000000; commit;
Query OK, 0 rows affected (0.00 sec)

Query OK, 2000000 rows affected (19.10 sec)
Rows matched: 2000000  Changed: 2000000  Warnings: 0

Query OK, 0 rows affected (0.67 sec)

It is possible to identify that Session 1 was rolled back (the column was not added):

master [localhost] {msandbox} (test) > show create table joinit\G

*************************** 1. row ***************************

       Table: joinit

Create Table: CREATE TABLE `joinit` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int(11) NOT NULL,
  PRIMARY KEY (`i`)
) ENGINE=InnoDB AUTO_INCREMENT=12386101 DEFAULT CHARSET=latin1

1 row in set (0.00 sec)

While the session 2 committed.

Suggested fix:
Fix the documentation where it says that DML is rolled back.
[9 Jan 18:13] Vinicius Malvestio Grippa
To make it more clear:

Open DML(started after the DDL) will be holding Metadata Lock on the said table, it won't rollback open transactions, and the error on DDL will only be seen when concurrent DML's either commit or rollback making it impossible to reach `all uncommitted concurrent DML operations are rolled back.` stated in the documentation.

Session 1
master [localhost] {msandbox} (test) > alter table joinit drop column test1;

Session 2
master [localhost] {msandbox} (test) > start transaction; update joinit set s = 'test7' limit 1000000;
Query OK, 0 rows affected (0.00 sec)

Query OK, 1000000 rows affected (24.25 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0

Session 3
master [localhost] {msandbox} (test) > show processlist;
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+--------------------------------------+
| Id | User     | Host            | db   | Command     | Time | State                                                         | Info                                 |
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+--------------------------------------+
|  4 | rsandbox | localhost:33426 | NULL | Binlog Dump | 8507 | Master has sent all binlog to slave; waiting for more updates | NULL                                 |
|  5 | rsandbox | localhost:33428 | NULL | Binlog Dump | 8507 | Master has sent all binlog to slave; waiting for more updates | NULL                                 |
|  6 | msandbox | localhost       | test | Query       |  117 | Waiting for table metadata lock                               | alter table joinit drop column test1 |
| 11 | msandbox | localhost       | test | Sleep       | 1671 |                                                               | NULL                                 |
| 20 | msandbox | localhost       | NULL | Sleep       | 1680 |                                                               | NULL                                 |
| 21 | msandbox | localhost       | test |
 Query       |    0 | starting                                                      | show processlist                     |
| 22 | msandbox | localhost       | test | Sleep       |  114 |                                                               | NULL                                 |
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+----------

COMMIT:

Session 2
master [localhost] {msandbox} (test) > commit;
Query OK, 0 rows affected (0.37 sec)

Session 1
master [localhost] {msandbox} (test) > alter table joinit drop column test1;                         ERROR 1799 (HY000): Creating index 'PRIMARY' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again.

No rollbacks have been made.
[10 Jan 13:56] Sinisa Milivojevic
Hi,

Thank you for your documentation bug report.

However, I have a few questions regarding your examples of the bad documentation.

First case:

When the certain file size exceeds its limit, the ALTER fails and all uncommitted CONCURRENT DMLs are rolled back. However, in your example, you do not have a concurrent DML. You clearly write that session 2 starts after session 1. Hence, this does not prove documentation to be wrong.

Second case:

Holding MDLs does not imply that other DMLs on that table should be rolled back. They will just wait for the MDL, if and when it is required.

Third case:

In your second example, there is this output:

Query OK, 1000000 rows affected (24.25 sec)
Rows matched: 1000000  Changed: 1000000  Warnings: 0

Does it come from the session 1 or session 2. In any case, both transactions are successful, so I do not see a point regarding documentation mistake.

Fourth case:

You show a processlist with only one relevant transaction, which is alter table. So, there are no concurrent DMLs, so again I can't see a point. 

Related to this  example, when DDL finally fails, there are no concurrent DMLs running, so there is nothing to roll back.
[11 Feb 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".