| Bug #97569 | Connector is inserting Japanese as gibberish when using utf8 charset | ||
|---|---|---|---|
| Submitted: | 10 Nov 2019 7:00 | Modified: | 11 Nov 2019 5:49 |
| Reporter: | Chris M | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | Connector / C | Severity: | S3 (Non-critical) |
| Version: | 8.0.18 | OS: | Ubuntu (16.04) |
| Assigned to: | MySQL Verification Team | CPU Architecture: | x86 |
| Tags: | jibberish, truncate, utf8, utf8mb4 | ||
[10 Nov 2019 14:17]
MySQL Verification Team
Thank you for the bug report. Please provide a complete test case by attaching the C client source file via the Files tab. Thanks.
[10 Nov 2019 21:05]
Chris M
Example program - please read the comments at the top
Attachment: bugtest.c (application/octet-stream, text), 2.82 KiB.
[10 Nov 2019 21:05]
Chris M
Text file that the test program uses; it contains English and Japanese text
Attachment: textfile (application/octet-stream, text), 35 bytes.
[11 Nov 2019 5:49]
Ryusuke Kajiyama
The attached textfile is encoded in Shift JIS, not UTF-8. Either use the MySQL charset cp932 for the connection, or convert the text data to UTF-8 and continue to use the utf8mb4 charset in MySQL.
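For the second option (converting the data to UTF-8 before sending it), a minimal sketch using POSIX iconv might look like the following. The function name `cp932_to_utf8` is hypothetical and not part of the attached bugtest.c, and the sketch assumes the local iconv implementation accepts the encoding name "CP932"; some systems may only know it as "SHIFT_JIS" or similar.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: converts len bytes of CP932 (Shift JIS) text in src
 * to UTF-8 in dst, which has room for cap bytes.
 * Returns the number of bytes written to dst, or (size_t)-1 on failure
 * (unknown encoding name, invalid input, or dst too small). */
size_t cp932_to_utf8(const char *src, size_t len, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);

    /* Assumption: this iconv build recognizes the name "CP932". */
    iconv_t cd = iconv_open("UTF-8", "CP932");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;   /* iconv takes char **, but does not modify the input bytes */
    char *out = dst;
    size_t inleft = len;
    size_t outleft = cap;

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return cap - outleft;
}
```

The converted buffer could then be escaped and sent over a connection whose charset is utf8mb4. On platforms where iconv lives in a separate library, linking with -liconv may be needed.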

Description:
A client application using charset utf8 no longer inserts Japanese text; only question marks / gibberish are stored in the column. Setting the charset to utf8mb4 instead truncates the Japanese text. I can make it insert Japanese properly by changing the charset to cp932, but this should work with utf8, as I know for certain it did before I upgraded to MySQL 8.

My DB settings: all table columns in question have their character set set to utf8mb4 and their collation set to utf8mb4_0900_ai_ci.

| Variable_name | Value |
|---|---|
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql-8.0/charsets/ |

| Variable_name | Value |
|---|---|
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database | utf8mb4_0900_ai_ci |
| collation_server | utf8mb4_0900_ai_ci |

my.cnf:

    [mysqld]
    ft_min_word_len=2
    sql_mode = "NO_BACKSLASH_ESCAPES"
    default-authentication-plugin=mysql_native_password
    innodb_buffer_pool_size=1G
    skip-character-set-client-handshake
    character-set-server = utf8mb4
    collation-server = utf8mb4_0900_ai_ci

    [mysql]
    default-character-set = utf8mb4

    [client]
    default-character-set = utf8mb4

How to repeat:
Using the C API (mysql.h):

    mysql_query(con, "SET CHARSET utf8;");
    mysql_query(con, "INSERT INTO table_name (column_name) VALUES ('japanese text')");

Then open mysql and view the table contents.

Also, if you use "SET CHARSET utf8mb4;" instead of utf8, the Japanese text is truncated but the English text remains.
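Given the finding in this thread that the test file is Shift JIS rather than UTF-8, one way to catch this kind of mismatch on the client side is to check that a buffer is well-formed UTF-8 before sending it over a utf8 or utf8mb4 connection. The sketch below is only an illustration; `utf8_valid_prefix` is a hypothetical helper, not part of Connector/C or of the attached bugtest.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte that does not
 * start a well-formed UTF-8 sequence, or len if the whole buffer is valid. */
size_t utf8_valid_prefix(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t need;            /* number of continuation bytes expected */
        unsigned long cp, min;  /* decoded code point and smallest legal value */

        if (b < 0x80) { i++; continue; }                      /* ASCII */
        else if ((b & 0xE0) == 0xC0) { need = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { need = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { need = 3; cp = b & 0x07; min = 0x10000; }
        else return i;          /* stray continuation byte or invalid lead byte */

        if (len - i - 1 < need)
            return i;           /* sequence cut off at end of buffer */

        for (size_t k = 1; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return i;       /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return i;

        i += need + 1;
    }
    return len;
}
```

For example, the Shift JIS bytes 0x93 0xFA (the character 日) following the ASCII text "abc" would be rejected at offset 3, because 0x93 is not a legal UTF-8 lead byte. A result smaller than the buffer length suggests the data is in some other encoding, which would be consistent with English text surviving while Japanese text is lost or garbled.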