Bug #120406 Inconsistent GBK-to-utf8mb4 conversion between implicit INSERT ... SELECT assignment and explicit CONVERT()
Submitted: 6 May 4:46 Modified: 6 May 9:20
Reporter: Chen Zhongpu Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: CPU Architecture:Any

[6 May 4:46] Chen Zhongpu
Description:
This is related to https://bugs.mysql.com/bug.php?id=120384.

The issue is not simply that SELECT and INSERT behave differently.

The surprising behavior is that an implicit assignment conversion from a GBK column to a utf8mb4 column fails, while an explicit CONVERT(s USING utf8mb4) on the same source value succeeds without warnings and can be inserted into the same utf8mb4 column.

This suggests inconsistent error handling between the implicit field-assignment conversion path and the explicit expression-level character-set conversion path.

How to repeat:
```sql
CREATE TABLE g (
  id INT PRIMARY KEY,
  s VARCHAR(20) CHARACTER SET gbk
);

CREATE TABLE u (
  id INT PRIMARY KEY,
  s VARCHAR(20) CHARACTER SET utf8mb4
);

-- OK, without warning
INSERT INTO g VALUES(1, 0xA102); 
```

```
-- ERROR 1366
INSERT INTO u SELECT id, s FROM g; 
```

```
-- OK, without warning
 INSERT INTO u SELECT id, CONVERT(s USING utf8mb4) FROM g;
```

Suggested fix:
The implicit assignment conversion and the explicit CONVERT(... USING utf8mb4) conversion should either produce consistent results, or the difference should be documented clearly.

At least, the divergences should be well-documented.
[6 May 7:31] Roy Lyseng
When I run this set of queries, the first INSERT gives we error 1366: Incorrect string value. Thus, this seems not reproducible for me.
[6 May 8:00] Chen Zhongpu
Sorry for my typos. The SQL statements to reproduce the issue are:

```sql
CREATE TABLE g (
  id INT PRIMARY KEY,
  s VARCHAR(20) CHARACTER SET gbk
);

CREATE TABLE u (
  id INT PRIMARY KEY,
  s VARCHAR(20) CHARACTER SET utf8mb4
);

-- OK, without warning
INSERT INTO g VALUES(1, 0xA140);
```

Next:

```sql
-- ERROR 1366
INSERT INTO u SELECT id, s FROM g;
```

```sql
-- OK, without warning
 INSERT INTO u SELECT id, CONVERT(s USING utf8mb4) FROM g;
```
[6 May 9:01] Roy Lyseng
Thank you for the revised test script.

However, this is not a bug. The statement without explicit CONVERT is more strict than CONVERT, which converts incompatible characters into question marks.

I am thus making this a documentation issue, so that we can document more clearly that implicit character conversion enforces strict error checking, whereas CONVERT (and CAST) converts incompatible characters silently.
[6 May 9:20] Chen Zhongpu
Thanks for your explanation.

> implicit character conversion enforces strict error checking, whereas CONVERT (and CAST) converts incompatible characters silently.

Another concern is that: since those bytes in GBK can NEVER be mapped to valid Unicode code points, is it reasonable to accept it (i.e., `INSERT INTO g VALUES(1, 0xA140)`) without any warning?
[6 May 9:43] Roy Lyseng
Good question...

But this has been like this "forever", and people are probably used to the current behavior. One might consider implementing a directive on how to handle impossible conversions, like specifying a substitution character, or giving an error.