Bug #104626 Remove failure of Online ALTER because concurrent Duplicate entry.
Submitted: 15 Aug 2021 17:52 Modified: 16 Aug 2021 5:43
Reporter: Jean-François Gagné Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S4 (Feature request)
Version:5.7.35, 8.0.26 OS:Any
Assigned to: CPU Architecture:Any

[15 Aug 2021 17:52] Jean-François Gagné
Description:
Hi,

it is "well known" that an online ALTER can fail because of a concurrent Duplicate entry, details in How to repeat.  This has already been described in Bug#76895 and Bug#98600 but to my knowledge, no feature request exists to make thing better.  This report is addressing this.

In Bug#104608, I already filed a feature request to make the error message more meaningful.  This new bug report is to address the problem that a non-special ALTER is failing, which is a user unfriendly behavior.  I would expect a "simple" ALTER TABLE without any special options to succeed without "weird" error conditions, which is not the case right now (Bug#76895).  I would also expect the same from OPTIMIZE TABLE (Bug#98600).

Many thanks for looking into this,

Jean-François Gagné

How to repeat:
# Create a sandbox:
dbdeployer deploy single mysql_8.0.26

# Initialize the test environment:
./use <<< "
  CREATE DATABASE test_jfg;
  CREATE TABLE test_jfg.t(
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    v INT, v2 INT,
    UNIQUE KEY(v))"
 
# Make rows big to be able to quickly generate a big table (each row is close to 8 KB):
seq -f "ALTER TABLE t add column c%02.0f CHAR(255) DEFAULT '';" 1 30 | ./use test_jfg

# Make commit fast so loading is not too long:
./use <<< "set global sync_binlog = 0, global innodb_flush_log_at_trx_commit = 0"

# Fill the table with about 1 GB of data:
seq -f "INSERT INTO t (v) VALUES (%0.0f);" 1 $((1024/8*1024)) | ./use test_jfg

# Now generate the error:
( sleep 1; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1)"; echo "INSERT $(date)"; ) & \
  ./use test_jfg <<< "ALTER TABLE t DROP COLUMN c01"; echo "ALTER $(date)"
[1] 9321
ERROR 1062 (23000) at line 1: Duplicate entry '1' for key 't.v'
INSERT Thu Aug 12 22:28:55 UTC 2021
ERROR 1062 (23000) at line 1: Duplicate entry '1' for key 't.v'
[1]+  Done                    ( sleep 1; ./use test_jfg <<< "INSERT INTO t (v) VALUES (1)"; echo "INSERT $(date)" )
ALTER Thu Aug 12 22:30:17 UTC 2021

Suggested fix:
Ideally, Online DDL algorithm would be improved to remove this failure condition.

A more "complicated" solution from a backward compatibility point of view would be to go back to the MySQL 5.5 default behavior.  It looks like this is possible by setting the old_alter_table variable to ON (globally or at the session level), but this feels backward to me as I would expect command to "succeed by default", not to "error in some obscure conditions".  My understanding of Oracle orientation for MySQL might be wrong here: maybe Oracle is going into the direction of "non-locking" by default to make DML succeed as "often and fast" as possible, but in this case, this orientation should be documented somewhere, and I did not find anything in the Online DDL section of the manual about this.

Another solution could be to deprecate and eventually make ALTER TABLE (and OPTIMIZE TABLE) fail when the LOCK option is not specified.  This would force the user to clearly specify their expectation on the behavior of the DDL (non-locking at the risk of failure, or no risk of failure at the cost of locking), and this would also need Bug#104608 to be fixed so when the DDL fails because of a concurrent Duplicate entry, the problem is fully understandable and that the LOCK=EXCLUSIVE solution is explicitly presented without the user having to read the manual or search the Internet.

(Sorry, I do not have any easy or super good solution for fixing this, it looks like we will be in some sort of Online DDL purgatory with this for some time.)
[16 Aug 2021 5:43] MySQL Verification Team
Hello Jean-François,

Thank you for the feature request!

regards,
Umesh