Bug #114769 | Character set changing unexpectedly | ||
---|---|---|---|
Submitted: | 24 Apr 2024 22:52 | Modified: | 11 Jun 2024 15:34 |
Reporter: | Kevin Bull | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[24 Apr 2024 22:52]
Kevin Bull
[24 Apr 2024 23:05]
Kevin Bull
I forgot to mention that I can try to fix the meta_universalid id column but then a 'DESC cccc' will show the userid column has changed when I didn't touch that. mysql> desc cccc; +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | meta_universalid | varchar(36) | NO | PRI | replace(uuid(),_utf8mb3\'-\',_utf8mb4\'\') | DEFAULT_GENERATED | | userid | varchar(100) | NO | | concat(_utf8mb4\'X\',convert(replace(uuid(),_utf8mb4\'-\',_utf8mb4\'\') using utf8mb4)) | DEFAULT_GENERATED | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ 2 rows in set (0.00 sec) mysql> ALTER TABLE cccc -> MODIFY COLUMN meta_universalid VARCHAR(36) NOT NULL DEFAULT (replace(uuid(),_utf8mb4'-',_utf8mb4'')); Query OK, 0 rows affected (0.01 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> desc cccc; +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | meta_universalid | varchar(36) | NO | PRI | replace(uuid(),_utf8mb4\'-\',_utf8mb4\'\') | DEFAULT_GENERATED | | userid | varchar(100) | NO | | concat(_utf8mb4\'X\',convert(replace(uuid(),_utf8mb3\'-\',_utf8mb4\'\') using utf8mb4)) | DEFAULT_GENERATED | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ 2 rows in set (0.00 sec) Then if I attempt to fix the userid column the meta_universalid column will lose the first change. mysql> ALTER TABLE cccc -> MODIFY COLUMN userid VARCHAR(100) NOT NULL DEFAULT (concat(_utf8mb4'X',convert(replace(uuid(),_utf8mb4'-',_utf8mb4'') using utf8mb4))); Query OK, 0 rows affected (0.01 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> DESC cccc; +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | meta_universalid | varchar(36) | NO | PRI | replace(uuid(),_utf8mb3\'-\',_utf8mb4\'\') | DEFAULT_GENERATED | | userid | varchar(100) | NO | | concat(_utf8mb4\'X\',convert(replace(uuid(),_utf8mb4\'-\',_utf8mb4\'\') using utf8mb4)) | DEFAULT_GENERATED | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ 2 rows in set (0.00 sec)
[25 Apr 2024 10:30]
MySQL Verification Team
Hi Mr. Bull, Thank you for your bug report. However, it is not a bug. You are using version 8.0. In the version 8.0 utfmb3 is deprecated and replaced with utfmb4. Hence, the output from SHOW CREATE TABLE is very correct. Not a bug.
[25 Apr 2024 16:10]
Kevin Bull
Thank you for the quick reply but your response confuses me. The original create table and the subsequent modify column statements explicitly use utf8mb4 but the 'show create table' and the 'DESCRIBE table' commands return utf8mb3. I don't send any commands using utf8mb3, but the server response includes utf8mb3. I would understand your response if it was the other way around or maybe if no charset was used at all. But since a char set is used I wouldn't expect it to change. Ultimately I am confused as to why I don't see the same charset returned by the server? Thanks for helping me understand why this is not a bug.
[26 Apr 2024 10:05]
MySQL Verification Team
Hi, The explanation is quite simple. All functions, like REPLACE have to be printed as they were defined ..... We hope that it helps ....
[26 Apr 2024 16:16]
Kevin Bull
But that is exactly what is not happening. When the table is created the replace function has utf8mb4 specified for both columns like this: mysql> CREATE TABLE test.cccc ( -> meta_universalid varchar(36) NOT NULL DEFAULT (REPLACE(UUID(), _utf8mb4'-', _utf8mb4'')), -> userid varchar(100) NOT NULL DEFAULT (CONCAT(_utf8mb4'X', CONVERT(REPLACE(UUID(), _utf8mb4'-', _utf8mb4'') USING utf8mb4))), -> PRIMARY KEY (meta_universalid) -> ) -> ENGINE = INNODB, -> AVG_ROW_LENGTH = 1260, -> CHARACTER SET utf8mb4, -> COLLATE utf8mb4_0900_ai_ci; Query OK, 0 rows affected (0.02 sec) However, if I ask the server to describe the table it shows the replace function with utf8mb3 for meta_universalid, notice the userid column shows replace using utf8mb4 correctly: mysql> DESC cccc; +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | meta_universalid | varchar(36) | NO | PRI | replace(uuid(),_utf8mb3\'-\',_utf8mb4\'\') | DEFAULT_GENERATED | | userid | varchar(100) | NO | | concat(_utf8mb4\'X\',convert(replace(uuid(),_utf8mb4\'-\',_utf8mb4\'\') using utf8mb4)) | DEFAULT_GENERATED | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ 2 rows in set (0.02 sec) OK fine, I want to fix the meta_universalid column so I issue an alter table modify column command: mysql> ALTER TABLE cccc -> MODIFY COLUMN meta_universalid VARCHAR(36) NOT NULL DEFAULT (replace(uuid(),_utf8mb4'-',_utf8mb4'')); Query OK, 0 rows affected (0.01 sec) Records: 0 Duplicates: 0 Warnings: 0 Then I ask the server to describe the table again and this time the meta_universalid column looks correct, replace is showing utf8mb4, but notice the userid column now shows replace using utf8mb3. The output is different from the last describe output even though I did nothing with the userid column: mysql> desc cccc; +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | meta_universalid | varchar(36) | NO | PRI | replace(uuid(),_utf8mb4\'-\',_utf8mb4\'\') | DEFAULT_GENERATED | | userid | varchar(100) | NO | | concat(_utf8mb4\'X\',convert(replace(uuid(),_utf8mb3\'-\',_utf8mb4\'\') using utf8mb4)) | DEFAULT_GENERATED | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ 2 rows in set (0.00 sec) This is not the behavior I expected but maybe I am misunderstanding something? When a table is created using utf8mb4 in the replace function like this: mysql> CREATE TABLE test.dddd ( -> meta_universalid varchar(36) NOT NULL DEFAULT (REPLACE(UUID(), _utf8mb4'-', _utf8mb4'')), -> userid varchar(100) NOT NULL DEFAULT (CONCAT(_utf8mb4'X', CONVERT(REPLACE(UUID(), _utf8mb4'-', _utf8mb4'') USING utf8mb4))), -> PRIMARY KEY (meta_universalid) -> ) -> ENGINE = INNODB, -> AVG_ROW_LENGTH = 1260, -> CHARACTER SET utf8mb4, -> COLLATE utf8mb4_0900_ai_ci; Query OK, 0 rows affected (0.03 sec) A 'describe dddd' of that newly created table looks perfect: mysql> DESCRIBE dddd; +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ | meta_universalid | varchar(36) | NO | PRI | replace(uuid(),_utf8mb4\'-\',_utf8mb4\'\') | DEFAULT_GENERATED | | userid | varchar(100) | NO | | concat(_utf8mb4\'X\',convert(replace(uuid(),_utf8mb4\'-\',_utf8mb4\'\') using utf8mb4)) | DEFAULT_GENERATED | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------+-------------------+ 2 rows in set (0.00 sec) But a 'show create table dddd' does shows utf8mb3 for both columns which is different from what describe said and different from what the original create table statement specified: mysql> show create table dddd; +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | dddd | CREATE TABLE `dddd` ( `meta_universalid` varchar(36) NOT NULL DEFAULT (replace(uuid(),_utf8mb3'-',_utf8mb4'')), `userid` varchar(100) NOT NULL DEFAULT (concat(_utf8mb4'X',convert(replace(uuid(),_utf8mb3'-',_utf8mb4'') using utf8mb4))), PRIMARY KEY (`meta_universalid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci AVG_ROW_LENGTH=1260 | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) The two statements show the default value for the columns differently. Is this expected behavior? Once the table has been altered the describe statement begins its bizarre behavior as shown above.
[29 Apr 2024 10:10]
MySQL Verification Team
HI, We repeated your behaviour: Table Create Table cccc CREATE TABLE `cccc` (\n `meta_universalid` varchar(36) NOT NULL DEFAULT (replace(uuid(),_utf8mb3'-',_utf8mb4'')),\n `userid` varchar(100) NOT NULL DEFAULT (concat(_utf8mb4'X',convert(replace(uuid(),_utf8mb3'-',_utf8mb4'') using utf8mb4))),\n PRIMARY KEY (`meta_universalid`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci AVG_ROW_LENGTH=1260 Alghouth this is an insignificant bug, it is still a bug. Verified for versions 8.0 and higher.
[2 May 2024 14:18]
Roy Lyseng
There is a problem here - but perhaps not what you think it is... In newer versions of MySQL, we may alter literal string values into the character set of the operation it is used inside. We do this early, so that we see that the literal value is actually compatible with the operation, and so that no conversion is required at runtime. For the REPLACE function, the character set is determined by the first argument, thus the second and third arguments are converted to the character set of the first argument, since they are literal values. The first argument is a UUID() function call, which returns a character string in the utf8mb3 character set. Hence, the second and third arguments are also converted to this character set. But here is the bug: Only the second argument is actually converted to utf8mb3. The third argument is kept in the original character set. You don't have to worry about the function returning a utf8mb3 character string. It is automatically converted to utf8mb4 when the DEFAULT value is inserted. However, the SHOW CREATE TABLE report will look better if you wrap the UUID() function call in a CAST, e.g CAST(UUID() AS CHAR CHARACTER SET utf8mb4).
[2 May 2024 15:32]
Kevin Bull
Thank you Roy! The description you provided seems to fit what I am seeing. I think show create table is acting as you describe but describe table is NOT converting the second or third parameter, it appears to keep the original charset. Consider creating the table using the cast as you suggested but removing the explicit charset from the second and third parameters to REPLACE: mysql> CREATE TABLE test.eeee ( -> meta_universalid varchar(36) NOT NULL DEFAULT (REPLACE(CAST(UUID() AS CHAR CHARACTER SET utf8mb4), '-', '')), -> userid varchar(100) NOT NULL DEFAULT (CONCAT(_utf8mb4'X', CONVERT(REPLACE(CAST(UUID() AS CHAR CHARACTER SET utf8mb4), '-', '') USING utf8mb4))), -> PRIMARY KEY (meta_universalid) -> ) -> ENGINE = INNODB, -> AVG_ROW_LENGTH = 1260, -> CHARACTER SET utf8mb4, -> COLLATE utf8mb4_0900_ai_ci; Query OK, 0 rows affected (0.05 sec) The show create table matches your description with regard to the REPLACE parameters. The second parameter gets the charset of the first parameter but the third parameter does not get converted: mysql> show create table eeee\G; *************************** 1. row *************************** Table: eeee Create Table: CREATE TABLE `eeee` ( `meta_universalid` varchar(36) NOT NULL DEFAULT (replace(cast(uuid() as char charset utf8mb4),_utf8mb4'-',_latin1'')), `userid` varchar(100) NOT NULL DEFAULT (concat(_utf8mb4'X',convert(replace(cast(uuid() as char charset utf8mb4),_utf8mb4'-',_latin1'') using utf8mb4))), PRIMARY KEY (`meta_universalid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci AVG_ROW_LENGTH=1260 1 row in set (0.01 sec) But check out what describe table shows: mysql> desc eeee; +------------------+--------------+------+-----+---------------------------------------------------------------------------------------------------------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +------------------+--------------+------+-----+---------------------------------------------------------------------------------------------------------------------+-------------------+ | meta_universalid | varchar(36) | NO | PRI | replace(cast(uuid() as char charset utf8mb4),_latin1\'-\',_latin1\'\') | DEFAULT_GENERATED | | userid | varchar(100) | NO | | concat(_utf8mb4\'X\',convert(replace(cast(uuid() as char charset utf8mb4),_latin1\'-\',_latin1\'\') using utf8mb4)) | DEFAULT_GENERATED | +------------------+--------------+------+-----+---------------------------------------------------------------------------------------------------------------------+-------------------+ 2 rows in set (0.01 sec) It doesn't convert the second or third parameters to the REPLACE function. Now if I use your suggestion of CAST combined with the original create where it specifies the charset for the second and third parameters everything seems to look fine for both show create table and describe: mysql> CREATE TABLE test.ffff ( -> meta_universalid varchar(36) NOT NULL DEFAULT (REPLACE(CAST(UUID() AS CHAR CHARACTER SET utf8mb4), _utf8mb4'-', _utf8mb4'')), -> userid varchar(100) NOT NULL DEFAULT (CONCAT(_utf8mb4'X', CONVERT(REPLACE(CAST(UUID() AS CHAR CHARACTER SET utf8mb4), _utf8mb4'-', _utf8mb4'') USING utf8mb4))), -> PRIMARY KEY (meta_universalid) -> ) -> ENGINE = INNODB, -> AVG_ROW_LENGTH = 1260, -> CHARACTER SET utf8mb4, -> COLLATE utf8mb4_0900_ai_ci; Query OK, 0 rows affected (0.02 sec) mysql> show create table ffff\G; *************************** 1. row *************************** Table: ffff Create Table: CREATE TABLE `ffff` ( `meta_universalid` varchar(36) NOT NULL DEFAULT (replace(cast(uuid() as char charset utf8mb4),_utf8mb4'-',_utf8mb4'')), `userid` varchar(100) NOT NULL DEFAULT (concat(_utf8mb4'X',convert(replace(cast(uuid() as char charset utf8mb4),_utf8mb4'-',_utf8mb4'') using utf8mb4))), PRIMARY KEY (`meta_universalid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci AVG_ROW_LENGTH=1260 1 row in set (0.00 sec) mysql> desc ffff; +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------------------------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------------------------------------+-------------------+ | meta_universalid | varchar(36) | NO | PRI | replace(cast(uuid() as char charset utf8mb4),_utf8mb4\'-\',_utf8mb4\'\') | DEFAULT_GENERATED | | userid | varchar(100) | NO | | concat(_utf8mb4\'X\',convert(replace(cast(uuid() as char charset utf8mb4),_utf8mb4\'-\',_utf8mb4\'\') using utf8mb4)) | DEFAULT_GENERATED | +------------------+--------------+------+-----+-----------------------------------------------------------------------------------------------------------------------+-------------------+ 2 rows in set (0.00 sec) Based on these results, it appears the describe command does not convert the second or third parameter. All that being said it sounds like you are onto the real bug I just wanted to point out the seemingly related issue. Thank you for taking the time to explain what is happening. Hopefully, it is an easy fix. Cheers!
[11 Jun 2024 15:34]
Jon Stephens
Documented fix as follows in the MySQL 9.0.0 changelog: When resolving a call to the REPLACE() function, the character set and collation of the function result are copied from the first argument. The remaining two arguments, if they are literal values, should be converted to this character set, but only the second argument was converted. This fix ensures that the third argument is also converted to the first argument's character set and collation. Closed.
[12 Jun 2024 10:05]
MySQL Verification Team
Thank you, Jon.