Bug #97569 Connector is inserting Japanese as gibberish when using utf8 charset
Submitted: 10 Nov 2019 7:00 Modified: 11 Nov 2019 5:49
Reporter: Chris M
Status: Not a Bug
Category:Connector / C Severity:S3 (Non-critical)
Version:8.0.18 OS:Ubuntu (16.04)
Assigned to: Ashwini Patil CPU Architecture:x86
Tags: jibberish, truncate, utf8, utf8mb4

[10 Nov 2019 7:00] Chris M
Description:
A client application using charset utf8 no longer inserts Japanese text correctly: only question marks or gibberish are stored in the column. Setting the charset to utf8mb4 instead truncates the Japanese text. I can make it insert Japanese properly by changing the charset to cp932, but this should work with utf8, as I know for certain it did before I upgraded to MySQL 8.

My DB settings:

All table columns in question use character set utf8mb4 and collation utf8mb4_0900_ai_ci.

+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | utf8mb4                        |
| character_set_connection | utf8mb4                        |
| character_set_database   | utf8mb4                        |
| character_set_filesystem | binary                         |
| character_set_results    | utf8mb4                        |
| character_set_server     | utf8mb4                        |
| character_set_system     | utf8                           |
| character_sets_dir       | /usr/share/mysql-8.0/charsets/ |
+--------------------------+--------------------------------+

+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database   | utf8mb4_0900_ai_ci |
| collation_server     | utf8mb4_0900_ai_ci |
+----------------------+--------------------+

my.cnf:

[mysqld]
ft_min_word_len=2
sql_mode = "NO_BACKSLASH_ESCAPES"
default-authentication-plugin=mysql_native_password
innodb_buffer_pool_size=1G
skip-character-set-client-handshake
character-set-server = utf8mb4
collation-server = utf8mb4_0900_ai_ci

[mysql]
default-character-set = utf8mb4

[client]
default-character-set = utf8mb4

How to repeat:
Using the C API (mysql.h):

mysql_query(con, "SET CHARSET utf8;");

mysql_query(con, "INSERT INTO table_name (column_name) VALUES ('japanese text')");

Open the mysql client and view the table contents. If you use "SET CHARSET utf8mb4;" instead of utf8, the Japanese text is truncated but the English text remains.
[10 Nov 2019 14:17] Miguel Solorzano
Thank you for the bug report. Please provide the complete test case (attach the C client file using the Files tab). Thanks.
[10 Nov 2019 21:05] Chris M
Example program - please read the comments at the top

Attachment: bugtest.c (application/octet-stream, text), 2.82 KiB.

[10 Nov 2019 21:05] Chris M
Text file that the test program uses; it contains English and Japanese text

Attachment: textfile (application/octet-stream, text), 35 bytes.

[11 Nov 2019 5:49] Ryusuke Kajiyama
The attached textfile is encoded in Shift JIS, not UTF-8. Either use the MySQL charset cp932 for the connection, or convert the text data to UTF-8 and continue to use the utf8mb4 charset in MySQL.
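
A mismatch like this can be caught on the client side before the INSERT is sent. The following is a minimal sketch, not part of the attached bugtest.c: the helper name is_valid_utf8 is hypothetical, and it only checks that a byte buffer is well-formed UTF-8. Shift JIS text containing Japanese characters will normally fail this check, because its lead bytes are not valid UTF-8 sequences.

```c
#include <stddef.h>

/* Hypothetical helper (not from the bug report or the MySQL C API):
 * returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;   /* decoded code point */
        size_t need;        /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80) {                     /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) { /* 2-byte sequence */
            need = 1;
            cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) { /* 3-byte sequence */
            need = 2;
            cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) { /* 4-byte sequence */
            need = 3;
            cp = c & 0x07;
        } else {
            return 0;                        /* stray continuation or invalid lead byte */
        }

        if (len - i - 1 < need)
            return 0;                        /* sequence cut off at end of buffer */

        for (k = 1; k <= need; k++) {
            unsigned char t = buf[i + k];
            if ((t & 0xC0) != 0x80)
                return 0;                    /* not a continuation byte */
            cp = (cp << 6) | (t & 0x3F);
        }

        /* Reject overlong encodings, UTF-16 surrogates, and out-of-range values. */
        if (need == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
            return 0;
        if (need == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;

        i += need + 1;
    }
    return 1;
}
```

If the check fails for data read from a file, the data is probably in another encoding (here, Shift JIS) and should either be converted to UTF-8 first or be sent over a connection whose charset matches the file, such as cp932.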