| Bug #120406 | Inconsistent GBK-to-utf8mb4 conversion between implicit INSERT ... SELECT assignment and explicit CONVERT() | ||
|---|---|---|---|
| Submitted: | 6 May 4:46 | Modified: | 6 May 9:20 |
| Reporter: | Chen Zhongpu | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Documentation | Severity: | S3 (Non-critical) |
| Version: | OS: | Any | |
| Assigned to: | CPU Architecture: | Any | |
[6 May 4:46]
Chen Zhongpu
[6 May 7:31]
Roy Lyseng
When I run this set of queries, the first INSERT gives we error 1366: Incorrect string value. Thus, this seems not reproducible for me.
[6 May 8:00]
Chen Zhongpu
Sorry for my typos. The SQL statements to reproduce the issue are: ```sql CREATE TABLE g ( id INT PRIMARY KEY, s VARCHAR(20) CHARACTER SET gbk ); CREATE TABLE u ( id INT PRIMARY KEY, s VARCHAR(20) CHARACTER SET utf8mb4 ); -- OK, without warning INSERT INTO g VALUES(1, 0xA140); ``` Next: ```sql -- ERROR 1366 INSERT INTO u SELECT id, s FROM g; ``` ```sql -- OK, without warning INSERT INTO u SELECT id, CONVERT(s USING utf8mb4) FROM g; ```
[6 May 9:01]
Roy Lyseng
Thank you for the revised test script. However, this is not a bug. The statement without explicit CONVERT is more strict than CONVERT, which converts incompatible characters into question marks. I am thus making this a documentation issue, so that we can document more clearly that implicit character conversion enforces strict error checking, whereas CONVERT (and CAST) converts incompatible characters silently.
[6 May 9:20]
Chen Zhongpu
Thanks for your explanation. > implicit character conversion enforces strict error checking, whereas CONVERT (and CAST) converts incompatible characters silently. Another concern is that: since those bytes in GBK can NEVER be mapped to valid Unicode code points, is it reasonable to accept it (i.e., `INSERT INTO g VALUES(1, 0xA140)`) without any warning?
[6 May 9:43]
Roy Lyseng
Good question... But this has been like this "forever", and people are probably used to the current behavior. One might consider implementing a directive on how to handle impossible conversions, like specifying a substitution character, or giving an error.
