Bug #106418 CHECK TABLE should evaluate GENERATED COLUMN's expression and warn of errors.
Submitted: 9 Feb 15:07
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DDL Severity:S4 (Feature request)
Version:8.0.28 OS:Any
Assigned to: CPU Architecture:Any

[9 Feb 15:07] Shane Bester
Description:
After the fix for:
Bug 33199145 - CONVERT(.. USING charset) gets wrong max_length 

instances upgraded from 8.0.27 -> 8.0.28 encounter issues with indexes on GENERATED COLUMNS that perform certain CAST/CONVERT operations.

The underlying issue is that 8.0.27 stored invalid data in the index since one cannot reliably cast from any binary data to utf16, as not all bytes are valid utf16.   Yet 8.0.27 didn't give any error/warnings.

This means 8.0.28 'inherits' invalid data but CHECK TABLE doesn't warn us.

The difference is seen:

==========================
output:
==========================
Database changed
mysql> -- on 8.0.27 do this:
mysql> 
mysql> set sql_mode='STRICT_ALL_TABLES';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> create database if not exists test;
Query OK, 1 row affected (0.05 sec)

mysql> use test;
Database changed
mysql> drop table if exists t;
Query OK, 0 rows affected, 1 warning (0.03 sec)

mysql> create table t (
    ->   a int,
    ->   d mediumblob,
    ->   i binary(16) generated always as (unhex(md5(cast(`d` as char(3072) charset utf16)))) VIRTUAL, 
    ->   key (i)
    -> ) engine=innodb;
Query OK, 0 rows affected (0.46 sec)

mysql> 
mysql> insert into t(a,d) values(1,'a');
Query OK, 1 row affected (0.03 sec)

mysql> show warnings;
Empty set (0.00 sec)

mysql> 
mysql> insert into t(a,d) values(2,0xDF8912DC1DFE);
Query OK, 1 row affected (0.03 sec)

mysql> show warnings;          -- NO errors/warnings :-(
Empty set (0.00 sec)

mysql> 
mysql> select * from t;
+------+----------------+------------------------------------+
| a    | d              | i                                  |
+------+----------------+------------------------------------+
|    1 | 0x61           | 0x760F753576F2955B0074758ACB4D5FA6 |
|    2 | 0xDF8912DC1DFE | 0xF61B0BFFC8117FCB099A924C9A4903AD |  --bad data!
+------+----------------+------------------------------------+
2 rows in set (0.00 sec)

mysql> show warnings;          -- NO warnings!
Empty set (0.00 sec)

mysql> check table t extended; -- NO errors!
+--------+-------+----------+----------+
| Table  | Op    | Msg_type | Msg_text |
+--------+-------+----------+----------+
| test.t | check | status   | OK       |
+--------+-------+----------+----------+
1 row in set (0.00 sec)

mysql> show warnings;
Empty set (0.00 sec)

mysql> 
mysql> select cast(0xDF8912DC1DFE as char(3072) charset utf16);
+--------------------------------------------------+
| cast(0xDF8912DC1DFE as char(3072) charset utf16) |
+--------------------------------------------------+
| ?褒?᷾                                            |
+--------------------------------------------------+
1 row in set (0.00 sec)

mysql> show warnings;
Empty set (0.00 sec)

mysql> 
mysql> set global innodb_fast_shutdown=0;
Query OK, 0 rows affected (0.00 sec)

mysql> select version();
+-------------------+
| version()         |
+-------------------+
| 8.0.27-commercial |
+-------------------+
1 row in set (0.00 sec)

mysql> shutdown;
Query OK, 0 rows affected (0.00 sec)

-----------

8.0.28 is *better* in that it lets us know in most cases,  but it doesn't
handle pre-existing invalid data very well:

mysql> \r
Connection id:    9
Current database: test

mysql> 
mysql> select version();
+-------------------+
| version()         |
+-------------------+
| 8.0.28-commercial |
+-------------------+
1 row in set (0.00 sec)

mysql> use test;
Database changed
mysql> select * from t;
+------+----------------+------------------------------------+
| a    | d              | i                                  |
+------+----------------+------------------------------------+
|    1 | 0x61           | 0x760F753576F2955B0074758ACB4D5FA6 |
|    2 | 0xDF8912DC1DFE | NULL                               |
+------+----------------+------------------------------------+
2 rows in set, 1 warning (0.01 sec)

mysql> show warnings;
+---------+------+------------------------------------------+
| Level   | Code | Message                                  |
+---------+------+------------------------------------------+
| Warning | 1300 | Invalid utf16 character string: 'DF8912' |
+---------+------+------------------------------------------+
1 row in set (0.00 sec)

mysql> check table t extended;
+--------+-------+----------+----------+
| Table  | Op    | Msg_type | Msg_text |
+--------+-------+----------+----------+
| test.t | check | status   | OK       |
+--------+-------+----------+----------+
1 row in set (0.00 sec)

mysql> select cast(0xDF8912DC1DFE as char(3072) charset utf16);
+--------------------------------------------------+
| cast(0xDF8912DC1DFE as char(3072) charset utf16) |
+--------------------------------------------------+
| NULL                                             |
+--------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+------------------------------------------+
| Level   | Code | Message                                  |
+---------+------+------------------------------------------+
| Warning | 1300 | Invalid utf16 character string: 'DF8912' |
+---------+------+------------------------------------------+
1 row in set (0.01 sec)

-----------

Consequence is :

Index corruption AND RBR replication may fail due to not finding rows..

mysql> delete from t order by rand();
Query OK, 2 rows affected, 2 warnings (0.02 sec)

mysql> show warnings;
+---------+------+------------------------------------------+
| Level   | Code | Message                                  |
+---------+------+------------------------------------------+
| Warning | 1300 | Invalid utf16 character string: 'DF8912' |
| Warning | 1300 | Invalid utf16 character string: 'DF8912' |
+---------+------+------------------------------------------+
2 rows in set (0.00 sec)

mysql> check table t extended;
+--------+-------+----------+----------------------------------------------------+
| Table  | Op    | Msg_type | Msg_text                                           |
+--------+-------+----------+----------------------------------------------------+
| test.t | check | Warning  | InnoDB: Index 'i' contains 1 entries, should be 0. |
| test.t | check | error    | Corrupt                                            |
+--------+-------+----------+----------------------------------------------------+
2 rows in set (0.02 sec)

How to repeat:
-- on 8.0.27 do this:

set sql_mode='STRICT_ALL_TABLES';
create database if not exists test;
use test;
drop table if exists t;
create table t (
  a int,
  d mediumblob,
  i binary(16) generated always as (unhex(md5(cast(`d` as char(3072) charset utf16)))) VIRTUAL, 
  key (i)
) engine=innodb;

insert into t(a,d) values(1,'a');
show warnings;

insert into t(a,d) values(2,0xDF8912DC1DFE);
show warnings;          -- NO errors/warnings!

select * from t;
show warnings;          -- NO warnings!
check table t extended; -- NO errors!
show warnings;

select cast(0xDF8912DC1DFE as char(3072) charset utf16);
show warnings;

set global innodb_fast_shutdown=0;
select version();
shutdown;

-- upgrade to 8.0.28 and start mysqld on the datadir.

select version();
use test;
select * from t;  
show warnings;             -- Warning! Invalid utf16 character string: 'DF8912'
check table t extended;    -- NO WARNINGS/ERRORS
select cast(0xDF8912DC1DFE as char(3072) charset utf16); -- NULL
show warnings;             -- Warning! Invalid utf16 character string: 'DF8912'

Suggested fix:
Before any modifying functions to return different values due to whatever reasons,  any tables containing references to those functions in generated columns (and indexes on them) should be marked as needing upgrade.

CHECK TABLE should also report warnings/errors.