Description:
When creating a table with a DEFAULT for a utf8mb4 column which uses a 4 byte encoded unicode character, the output is corrupted and the character replaced with a ? in SHOW CREATE TABLE.
How to repeat:
CREATE TABLE `encodings` (
`id` bigint NOT NULL AUTO_INCREMENT ,
`a` varchar(100) CHARACTER SET ascii DEFAULT 'foo',
`l` varchar(100) CHARACTER SET latin1 DEFAULT 'føö',
`u` varchar(100) CHARACTER SET utf8mb4 DEFAULT 'føö√🙈',
PRIMARY KEY (`id`)
);
SHOW CREATE TABLE encodings\G
*************************** 1. row ***************************
Table: encodings
Create Table: CREATE TABLE `encodings` (
`id` bigint NOT NULL AUTO_INCREMENT,
`a` varchar(100) CHARACTER SET ascii COLLATE ascii_general_ci DEFAULT 'foo',
`l` varchar(100) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'føö',
`u` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT 'føö√?',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
Suggested fix:
It looks like MySQL internally still uses the utf8mb3 encoding to print the DEFAULT value. Unicode characters which encode to 3 bytes work correctly but 4 byte characters like emojis break.
Description: When creating a table with a DEFAULT for a utf8mb4 column which uses a 4 byte encoded unicode character, the output is corrupted and the character replaced with a ? in SHOW CREATE TABLE. How to repeat: CREATE TABLE `encodings` ( `id` bigint NOT NULL AUTO_INCREMENT , `a` varchar(100) CHARACTER SET ascii DEFAULT 'foo', `l` varchar(100) CHARACTER SET latin1 DEFAULT 'føö', `u` varchar(100) CHARACTER SET utf8mb4 DEFAULT 'føö√🙈', PRIMARY KEY (`id`) ); SHOW CREATE TABLE encodings\G *************************** 1. row *************************** Table: encodings Create Table: CREATE TABLE `encodings` ( `id` bigint NOT NULL AUTO_INCREMENT, `a` varchar(100) CHARACTER SET ascii COLLATE ascii_general_ci DEFAULT 'foo', `l` varchar(100) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'føö', `u` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT 'føö√?', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 1 row in set (0.00 sec) Suggested fix: It looks like MySQL internally still uses the utf8mb3 encoding to print the DEFAULT value. Unicode characters which encode to 3 bytes work correctly but 4 byte characters like emojis break.