Bug #104984 Failed to upgrade to mysql 8.0 because of the character set of the field comment
Submitted: 18 Sep 11:21 Modified: 22 Sep 6:16
Reporter: Xiyan Xu Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Data Dictionary Severity:S2 (Serious)
Version:8.0.23, 8.0.26 OS:CentOS
Assigned to: CPU Architecture:x86
Tags: character set

[18 Sep 11:21] Xiyan Xu
Description:
When upgrading a mysql instance to the mysql 8.0 or cloning a replicaion from a mysql 8.0 instance with the assumption that there are several tables with different character sets for the field comment with Chinese in a mysql instance, the following will happen.

2021-07-27T14:15:59.267828+08:00 1 [System] [MY-011012] [Server] Starting upgrade of data directory.
2021-07-27T14:15:59.267942+08:00 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-07-27T14:16:11.478109+08:00 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-07-27T14:16:11.719225+08:00 2 [Warning] [MY-010772] [Server] db.opt file not found for binlog database. Using default Character set.
2021-07-27T14:16:11.731900+08:00 2 [ERROR] [MY-013140] [Server] Invalid utf8 character string: 'C9F3C5'
2021-07-27T14:16:11.733024+08:00 2 [ERROR] [MY-013140] [Server] Invalid utf8 character string: 'C9F3C5'
2021-07-27T14:16:12.687947+08:00 2 [Warning] [MY-010772] [Server] db.opt file not found for tmpdir database. Using default Character set.
2021-07-27T14:16:12.992579+08:00 0 [ERROR] [MY-010022] [Server] Failed to Populate DD tables.
2021-07-27T14:16:12.994141+08:00 0 [ERROR] [MY-010119] [Server] Aborting

How to repeat:
First:

mysql> set names gbk;
mysql> create a table sakila_for_gbk(id int, name varchar(45) comment '中文');
mysql> set names uft8;
mysql> create a table sakila_for_utf8(id int, name varchar(45) comment '中文');

Second:

Upgrade a mysql 5.7 to mysql 8.0 or clone a replication.

Suggested fix:
Problem:

In the file 'sql/sql_table.cc' at the line 4518:

  LEX_CSTRING comment_cstr = {sql_field->comment.str,
                              sql_field->comment.length};
  if (is_invalid_string(comment_cstr, system_charset_info)) return true;

  if (validate_comment_length(thd, sql_field->comment.str,
                              &sql_field->comment.length, COLUMN_COMMENT_MAXLEN,
                              ER_TOO_LONG_FIELD_COMMENT, sql_field->field_name))
    return true;

Suggestion:

Record the character info for every table comment and validate it by different character info at table level, instead of using the system character info to be compatiable for all of tables.
[20 Sep 6:35] MySQL Verification Team
Hello Xiyan Xu,

Thank you for the report and feedback.
I quickly tried in-place upgrading from 5.7.35 to 8.0.26(and even 8.0.23) but not seeing any issues. I'll be joining the upgrade activity details for your reference. Is there anything that I'm missing here? Please let me know.

regards,
Umesh
[20 Sep 6:37] MySQL Verification Team
5.7.35 to 8.0.26/23 upgrade activity log

Attachment: 104984_5.7.35_8.0.26_23.results (application/octet-stream, text), 69.65 KiB.

[20 Sep 10:14] Xiyan Xu
Repeat as follow:

[root@s3cret /data1]# cat /data1/mysql/19099/my.cnf | grep character
character_set_server = utf8
default_character_set = utf8
default_character_set = utf8

[root@s3cret /data1]# /usr/local/mysql5.7.26/bin/mysqld --initialize --basedir=/usr/local/mysql5.7.26 --user=mysql --datadir=/data1/mysql/19099/        
2021-09-20T08:23:48.876426Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2021-09-20T08:23:49.053837Z 0 [Warning] InnoDB: New log files created, LSN=45790
2021-09-20T08:23:49.081801Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2021-09-20T08:23:49.134710Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 14aaa636-19ec-11ec-8397-505dac52c93e.
2021-09-20T08:23:49.134977Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2021-09-20T08:23:49.135468Z 1 [Note] A temporary password is generated for root@localhost: S3Gw_#UBNoqf

setsid /usr/local/mysql5.7.26/bin/mysqld_safe --defaults-file=/data1/mysql/19099/my.cnf &

# session character set: GB18130
# Description: Chinese character is visible.

echo "set names utf8; create table sakila.sakila_for_gbk(id int, name varchar(45) comment '中文');" | /usr/local/mysql5.7.26/bin/mysql -uroot -ps3cret -S /data1/mysql/19099/mysql.sock

echo "set names utf8; show create table sakila.sakila_for_gbk\G" | /usr/local/mysql5.7.26/bin/mysql -uroot -ps3cret -S /data1/mysql/19099/mysql.sock
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
       Table: sakila_for_gbk
Create Table: CREATE TABLE `sakila_for_gbk` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(45) DEFAULT NULL COMMENT '中文'
) ENGINE=InnoDB DEFAULT CHARSET=utf8

# session character set: UTF-8
# Description: Chinese character is invisible.

echo "set names utf8; show create table sakila.sakila_for_gbk\G" | /usr/local/mysql5.7.26/bin/mysql -uroot -ps3cret -S /data1/mysql/19099/mysql.sock
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
       Table: sakila_for_gbk
Create Table: CREATE TABLE `sakila_for_gbk` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(45) DEFAULT NULL COMMENT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8

/usr/local/mysql5.7.26/bin/mysql -uroot -ps3cret -S /data1/mysql/19099/mysql.sock

mysql> set global innodb_fast_shutdown = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> shutdown;

setsid /usr/local/mysql8.0.23/bin/mysqld_safe --defaults-file=/data1/mysql/19099/my.cnf &

2021-09-20T18:03:25.590496+08:00 0 [System] [MY-010116] [Server] /usr/local/mysql8.0.23/bin/mysqld (mysqld 8.0.23) starting as process 15992
2021-09-20T18:03:25.593611+08:00 0 [Warning] [MY-013242] [Server] --character-set-server: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
2021-09-20T18:03:25.606984+08:00 1 [System] [MY-011012] [Server] Starting upgrade of data directory.
2021-09-20T18:03:25.607111+08:00 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-09-20T18:03:38.817488+08:00 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-09-20T18:03:39.049710+08:00 2 [Warning] [MY-010772] [Server] db.opt file not found for binlog database. Using default Character set.
2021-09-20T18:03:39.159932+08:00 2 [ERROR] [MY-013140] [Server] Invalid utf8 character string: 'D6D0CE'
2021-09-20T18:03:39.269023+08:00 2 [Warning] [MY-010772] [Server] db.opt file not found for tmpdir database. Using default Character set.
2021-09-20T18:03:39.493481+08:00 0 [ERROR] [MY-010022] [Server] Failed to Populate DD tables.
2021-09-20T18:03:39.495263+08:00 0 [ERROR] [MY-010119] [Server] Aborting
2021-09-20T18:03:40.920618+08:00 0 [System] [MY-010910] [Server] /usr/local/mysql8.0.23/bin/mysqld: Shutdown complete (mysqld 8.0.23)  MySQL Community Server - GPL.
[21 Sep 13:11] MySQL Verification Team
I had requested my colleague Dawei Tang to check this out since it requires terminal with(session character set: GB18130) but he confirmed to me that he's not seeing any issues with exact steps provided earlier. I tried with setting user specific locale(export LANG=zh_CN.GB18130) but still not seeing the issue.

May I request you to please provide exact conf file which is in use(you may mark it as private after posting here) and may I also request you to try upgrading to 8.0.26 instead of 8.0.23?  Thank you!
[22 Sep 3:09] Xiyan Xu
Linux session:
[root@s3cret ~]# echo $LANG
C

Switch the character set by SecureCRT as follow:
session options -> terminal -> appearance -> character coding

It is obvious that the mysql configuration file is not critical about this problem.
[22 Sep 6:16] MySQL Verification Team
Thank you for the feedback.

regards,
Umesh