Bug #112653 Interrupting non-atomic OPTIMIZE leads to weird error conditions.
Submitted: 6 Oct 2023 18:17 Modified: 9 Oct 2023 10:32
Reporter: Jean-François Gagné Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[6 Oct 2023 18:17] Jean-François Gagné
Description:
Hi,

For InnoDB tables, when running an OPTIMIZE TABLE, we get the message below, which is expected.

Table does not support optimize, doing recreate + analyze instead

This recreate + analyze is what I call non-atomic in the title: for InnoDB, an OPTIMIZE is two operations.

Some "interesting", but totally legitimate, actions on a MySQL server can lead to recreate succeeding, but analyze failing.  This is what I call interrupting in the title.

I identified three such interruptions.  These lead to partial / incomplete / badly formatted messages, which I call "weird error conditions" in the title.

1. Taking a table lock being taken between the recreate and the analyze.

In this case, mySQL reports "Operation failed", but this is only partially true: the recreate succeeded and only the analyze failed.  See How to repeat for details.  I would expect a message saying the OPTIMIZE partially succeeded / failed, or avoiding this failure condition by making OPTIMIZE grab a lock on the table (if I grab the lock myself, things go as expected as shown in How to repeat).

Note that I am omitting the consequences of the analyze failing (probably performance degradations).  I am also omitting that, because OPTIMIZE is non-atomic, the recreate can complete before the analyze is done, and a query in between might not have good statistics.  Both of these are left for later or to someone else.

2. Changing super read only between the recreate and the analyze.

Again in this case, the OPTIMIZE fails while the analyze succeeds.  See How to repeat for details.  But also in this case, the normal formatting of the OPTIMIZE is not applied, which is also a problem.  I am getting the below error:

ERROR 1290 (HY000) at line 1: The MySQL server is running with the --super-read-only option so it cannot execute this statement

while I would expect a message like below:

Table   Op      Msg_type        Msg_text
test_jfg.t      optimize        note    Table does not support optimize, doing recreate + analyze instead
test_jfg.t      optimize        error   The MySQL server is running with the --super-read-only option so it cannot execute this statement
test_jfg.t      optimize        status  Operation partially succeeded: the recreate completed but the analyze failed

Agin, avoiding this failure condition by making OPTIMIZE grab an exclusive lock on the table could be a solution.

3.  Changing Groupe Replication primary between the recreate and the analyze.

Again in this case, the OPTIMIZE fails while the analyze succeeds.  See How to repeat for details.  But also in this case, the normal formatting of the OPTIMIZE is not applied, which is also a problem (like for case #2 above).

Many thanks for looking into this,

Jean-François Gagné

How to repeat:
# Create a 8.0.34 sandbox.
dbdeployer deploy single mysql_8.0.34

# Create a schema and table.
./use <<< "
  create database test_jfg;
  create table test_jfg.t(id bigint not null auto_increment primary key)
"

# Fill the table.
# n chosen for having a table that takes a little time to ALTER / OPTIMIZE.
# "paste | sed" to insert many rows per trx for speed.
n=131072
time yes "(null)" | head -n $n |
  paste -s -d "$(printf ',%.0s' {1..1000})\n" |
  sed -e 's/.*/INSERT INTO t (id) values &;/' |
  ./use test_jfg

# Grow the rows for having ALTER takes some time.
time { echo "ALTER TABLE t ADD COLUMN c0 CHAR(240) DEFAULT ''"
       seq -f " ADD COLUMN c%.0f CHAR(240) DEFAULT ''" 1 3
} | paste -s -d "," | ./use test_jfg

# Actually grow the table (because instant ADD COLUMN).
time ./use test_jfg <<< "ALTER TABLE t FORCE"

# For completeness, run an OPTIMIZE to see expected output.
time ./use test_jfg <<< "OPTIMIZE TABLE t"

##################################################
# Output of above, comments added for readability.

# Inserts.
real    0m2.297s
user    0m0.152s
sys     0m0.028s

# ALTER ADD.
real    0m0.056s
user    0m0.015s
sys     0m0.001s

# ALTER FORCE.
real    0m16.201s
user    0m0.008s
sys     0m0.004s

Table   Op      Msg_type        Msg_text
test_jfg.t      optimize        note    Table does not support optimize, doing recreate + analyze instead
test_jfg.t      optimize        status  OK

real    0m13.389s
user    0m0.011s
sys     0m0.000s

# Case #1: analyze failing because Lock wait timeout exceeded.
# This function runs the sql statement given as argument
#   but at the same time, a 1 second delayed lock table.
# The sleep before the kill it to avoid killing before
#  the output of the select.
function f() {
  ./use test_jfg -N <<< "
    do sleep(1);
    lock table t write;
    select 'Completed lock at:', now();
    do sleep(60)" &
  date; ./use test_jfg <<< "$1"; date
  sleep 1; kill %1
}

# OPTIMIZE is run with a lock timeout of 1 to make it fail quickly.
# Explanation of what is run with output below.
c="ls -li data/test_jfg/t.ibd"; \
  $c; f "set lock_wait_timeout = 1; do sleep(2); OPTIMIZE TABLE t"; $c; \
  sleep 5; echo; $c; f "set lock_wait_timeout = 1; OPTIMIZE TABLE t"; $c; \
  sleep 5; echo; f "set lock_wait_timeout = 1; lock table t write; OPTIMIZE TABLE t"

###################################################
# Output of above, comments added for explanations.

# If we wait 2 seconds before running OPTIMIZE, it fails at the recreate step
#   and we see the file is not modified because it has the same inode.
1163796 -rw-r-----. 1 jgagne jgagne 171966464 Oct  6 17:14 data/test_jfg/t.ibd
[1] 4542
Fri Oct  6 17:15:27 UTC 2023
Completed lock at:      2023-10-06 17:15:28
Table   Op      Msg_type        Msg_text
test_jfg.t      optimize        Error   Lock wait timeout exceeded; try restarting transaction
test_jfg.t      optimize        status  Operation failed
Fri Oct  6 17:15:30 UTC 2023
1163796 -rw-r-----. 1 jgagne jgagne 171966464 Oct  6 17:14 data/test_jfg/t.ibd
[1]+  Terminated              ./use test_jfg -N <<< "
    do sleep(1);
    lock table t write;
    select 'Completed lock at:', now();
    do sleep(60)"

# If we do not wait before running OPTIMIZE, it fails at the analyze step
#   which we can infer with the traces (lock grabbing taking more time,
#   and OPTIMIZE failing just after lock grabbing).  We also see that the inode
#   of the file changes, which means the recreate completed.
1163796 -rw-r-----. 1 jgagne jgagne 171966464 Oct  6 17:14 data/test_jfg/t.ibd
[1] 4552
Fri Oct  6 17:15:36 UTC 2023
Completed lock at:      2023-10-06 17:15:50
Table   Op      Msg_type        Msg_text
test_jfg.t      optimize        note    Table does not support optimize, doing recreate + analyze instead
test_jfg.t      optimize        error   Lock wait timeout exceeded; try restarting transaction
test_jfg.t      optimize        status  Operation failed
Fri Oct  6 17:15:51 UTC 2023
1163800 -rw-r-----. 1 jgagne jgagne 171966464 Oct  6 17:15 data/test_jfg/t.ibd
[1]+  Terminated              ./use test_jfg -N <<< "
    do sleep(1);
    lock table t write;
    select 'Completed lock at:', now();
    do sleep(60)"

# If we grab a lock before running OPTIMIZE, the operation is not interrupted.
[1] 4621
Fri Oct  6 17:15:57 UTC 2023
Table   Op      Msg_type        Msg_text
test_jfg.t      optimize        note    Table does not support optimize, doing recreate + analyze instead
test_jfg.t      optimize        status  OK
Completed lock at:      2023-10-06 17:16:11
Fri Oct  6 17:16:11 UTC 2023

# Case #2: analyze failing because of super read only.
# This function runs the sql statement given as argument
#   but at the same time, a 1 second delayed set read only.
function f() {
  ./use -N <<< "
    do sleep(1);
    set global super_read_only = 1;
    select 'Completed setting sro at:', now()" &
  date; ./use test_jfg <<< "$1"; date
  wait
  ./use <<< "set global super_read_only = 0"
}

# Explanation of what is run with output below.
f "ALTER TABLE t FORCE"; \
  echo; $c; f "OPTIMIZE TABLE t"; $c; \
  echo; f "lock table t write; OPTIMIZE TABLE t"

# An ALTER TABLE blocks setting super read only, and completes without error.
[1] 4855
Fri Oct  6 17:23:08 UTC 2023
Completed setting sro at:       2023-10-06 17:23:21
Fri Oct  6 17:23:21 UTC 2023
[1]+  Done                    ./use -N <<< "
    do sleep(1);
    set global super_read_only = 1;
    select 'Completed setting sro at:', now()"

# Setting super read only makes the OPTIMIZE fail after the recreate
#   and before the analyze as shown by timestamps and inode.
# Also, the output of OPTIMIZE is inconsistent with the normal output
#   (missing the "does not support optimize" and other lines).
1163800 -rw-r-----. 1 jgagne jgagne 171966464 Oct  6 17:23 data/test_jfg/t.ibd
[1] 4868
Fri Oct  6 17:23:21 UTC 2023
Completed setting sro at:       2023-10-06 17:23:39
ERROR 1290 (HY000) at line 1: The MySQL server is running with the --super-read-only option so it cannot execute this statement
[1]+  Done                    ./use -N <<< "
    do sleep(1);
    set global super_read_only = 1;
    select 'Completed setting sro at:', now()"
Fri Oct  6 17:23:39 UTC 2023
1163796 -rw-r-----. 1 jgagne jgagne 171966464 Oct  6 17:23 data/test_jfg/t.ibd

# If we grab a lock before running OPTIMIZE, the operation is not interrupted.
[1] 4880
Fri Oct  6 17:23:39 UTC 2023
Table   Op      Msg_type        Msg_text
test_jfg.t      optimize        note    Table, doing recreate + analyze instead
test_jfg.t      optimize        status  OK
Completed setting sro at:       2023-10-06 17:23:54
Fri Oct  6 17:23:54 UTC 2023
[1]+  Done                    ./use -N <<< "
    do sleep(1);
    set global super_read_only = 1;
    select 'Completed setting sro at:', now()"

##########################################
# Below needs a Group Replication sandbox.

dbdeployer deploy --topology=group replication mysql_8.0.34 --single-primary

# Same environment as above.
./n1 <<< "
  create database test_jfg;
  create table test_jfg.t(id bigint not null auto_increment primary key)
"

n=131072
yes "(null)" | head -n $n |
  paste -s -d "$(printf ',%.0s' {1..1000})\n" |
  sed -e 's/.*/INSERT INTO t (id) values &;/' |
  ./n1 test_jfg

{ echo "ALTER TABLE t ADD COLUMN c0 CHAR(240) DEFAULT ''"
       seq -f " ADD COLUMN c%.0f CHAR(240) DEFAULT ''" 1 3
} | paste -s -d "," | ./n1 test_jfg

./n1 test_jfg <<< "ALTER TABLE t FORCE"

# Wait for everything to replicate.
gtid="$(./n1 -N <<< "select @@gtid_executed")"; \
  ./n2 -N <<< "select WAIT_FOR_EXECUTED_GTID_SET('$gtid')" > /dev/null; \
  ./n3 -N <<< "select WAIT_FOR_EXECUTED_GTID_SET('$gtid')" > /dev/null

# Case #3: analyze failing because of primary change.
# This function runs the sql statement given as argument
#   but at the same time, a 1 second delayed primary change.
function f() {
  ./n2 -N <<< "
    do sleep(1);
    select group_replication_set_as_primary(@@server_uuid);
    select 'Completed set primary at:', now()" &
  date; ./n1 test_jfg <<< "$1"; date
  wait
  ./n1 -N <<< "select group_replication_set_as_primary(@@server_uuid)"
}

# Explanation of what is run with output below.
c="ls -li node1/data/test_jfg/t.ibd"; \
  f "ALTER TABLE t FORCE"; \
  echo; $c; f "OPTIMIZE TABLE t"; $c; \
  echo; f "lock table t write; OPTIMIZE TABLE t"

# The ALTER TABLE blocks the primary change, and completes without error.
[1] 7666
Fri Oct  6 17:47:29 UTC 2023
Fri Oct  6 17:47:44 UTC 2023
Primary server switched to: 00024436-2222-2222-2222-222222222222
Completed set primary at:       2023-10-06 17:47:57
[1]+  Done                    ./n2 -N <<< "
    do sleep(1);
    select group_replication_set_as_primary(@@server_uuid);
    select 'Completed set primary at:', now()"
Primary server switched to: 00024435-1111-1111-1111-111111111111

# Changing the primary makes the OPTIMIZE fail after the recreate
#   and before the analyze (timestamps and inode).
# Also, the output of OPTIMIZE is inconsistent with the normal output
#   (missing the "does not support optimize" and others).
3965954 -rw-r-----. 1 jgagne jgagne 171966464 Oct  6 17:47 node1/data/test_jfg/t.ibd
[1] 7699
Fri Oct  6 17:47:57 UTC 2023
Primary server switched to: 00024436-2222-2222-2222-222222222222
Completed set primary at:       2023-10-06 17:48:12
ERROR 1290 (HY000) at line 1: The MySQL server is running with the --super-read-only option so it cannot execute this statement
[1]+  Done                    ./n2 -N <<< "
    do sleep(1);
    select group_replication_set_as_primary(@@server_uuid);
    select 'Completed set primary at:', now()"
Fri Oct  6 17:48:12 UTC 2023
Primary server switched to: 00024435-1111-1111-1111-111111111111
3965955 -rw-r-----. 1 jgagne jgagne 171966464 Oct  6 17:48 node1/data/test_jfg/t.ibd

# If we grab a lock before running OPTIMIZE, the operation is not interrupted.
[1] 7726
Fri Oct  6 17:48:12 UTC 2023
Table   Op      Msg_type        Msg_text
test_jfg.t      optimize        note    Table does not support optimize, doing recreate + analyze instead
test_jfg.t      optimize        status  OK
Fri Oct  6 17:48:25 UTC 2023
Primary server switched to: 00024436-2222-2222-2222-222222222222
Completed set primary at:       2023-10-06 17:48:41
[1]+  Done                    ./n2 -N <<< "
    do sleep(1);
    select group_replication_set_as_primary(@@server_uuid);
    select 'Completed set primary at:', now()"
Primary server switched to: 00024435-1111-1111-1111-111111111111

Suggested fix:
When OPTIMIZE partially succeeds / fails, have an error reporting this, not just Operation failed.

For the read only and primari change cases, have an OPTIMIZE output which is consistent, not just a single ERROR 1290 (HY000) line.

Consider making OPTIMIZE atomic, and at the same time, make impossible to have a query "see" the recreated table before the analyse completes.
[9 Oct 2023 10:32] MySQL Verification Team
Salut Mr. Gagne,

Thank you for your bug report.

We have managed to recreate the behaviour in both 8.0 and 8.1.

Regarding the error report, that is a low severity bug, while making OPTIMIZE atomic is a feature request.

Verified as reported.