From ebd691888795311404527e0fc2e4d5cadc1a3318 Mon Sep 17 00:00:00 2001 From: Daniel Lenski Date: Fri, 23 Feb 2024 12:02:15 -0800 Subject: [PATCH] Fix handling of table/schema names containing non-ASCII characters in get_schema_inconsistency_check() for MySQL 5.7 In https://github.com/mysql/mysql-shell/commit/db79350cdbc09d7efe356de96c465e0cf8aa8e3f, Konrad Olesinski modified `get_schema_inconsistency_check` to "remove[d] false positives related to full Unicode range." It appears that the intention of this change was to ignore table names containing non-ASCII characters, since their names are represented differently in `information_schema.tables` and in `information_schema.innodb_sys_tables`. In the latter, disallowed characters are replaced with '@' plus 4 hexadecimal digits representing the character's Unicode codepoint. However, the regular expression used for this (`@[0-9]`) was incorrect and incomplete: it will *only* exclude Unicode codepoints 0x0000-0x9ffff, and *won't* exclude Unicode codepoints 0xa000-0xffff. This commit fixes the mistake in the regular expression. In the longer term, tables whose names contain non-ASCII characters should *not* be excluded from schema inconsistency checks. This contribution is under the OCA signed by Amazon and covering submissions to the MySQL project. --- modules/util/upgrade_check.cc | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/modules/util/upgrade_check.cc b/modules/util/upgrade_check.cc index 587b304eb..01aad03df 100644 --- a/modules/util/upgrade_check.cc +++ b/modules/util/upgrade_check.cc @@ -1452,8 +1452,16 @@ Sql_upgrade_check::get_schema_inconsistency_check() { "information_schema.innodb_sys_tables where NAME like '%/%') A left " "join information_schema.tables I on A.table_name = I.table_name and " "A.schema_name = I.table_schema where A.table_name not like 'FTS_0%' " - "and (I.table_name IS NULL or I.table_schema IS NULL) and A.table_name " - "not REGEXP '@[0-9]' and A.schema_name not REGEXP '@[0-9]';"}, + "and (I.table_name IS NULL or I.table_schema IS NULL) " + // When a table's name contains non-ASCII characters, each non-ASCII character + // is replaced by '@' followed by exactly 4 hexadecimal digits representing + // its Unicode codepoint. + // + // It is difficult to write a SQL query to remap all such occurrences + // in a correct way to match their rows in I_S.TABLES and I_S.INNODB_SYS_TABLES, + // so for now any tables or schemas containing @-escapes (other than the ones + // specifically handed by replace_in_SQL) are simply ignored by this query. + "and A.table_name not REGEXP '@[0-9a-f]{4}' and A.schema_name not REGEXP '@[0-9a-f]{4}';"}, Upgrade_issue::ERROR, "Following tables show signs that either table datadir directory or frm " "file was removed/corrupted. Please check server logs, examine datadir "