Bug #104063 wrong results returned from server depending on client session settings
Submitted: 20 Jun 2021 18:06 Modified: 2 Jul 2021 17:37
Reporter: Mikhail Izioumtchenko Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.25 OS:Any
Assigned to: CPU Architecture:Any

[20 Jun 2021 18:06] Mikhail Izioumtchenko
Description:
Results of a SELECT query returned from server depend on character_set_client value.

How to repeat:
see attached files
[20 Jun 2021 18:06] Mikhail Izioumtchenko
test case

Attachment: btest.sql (application/octet-stream, text), 799 bytes.

[20 Jun 2021 18:07] Mikhail Izioumtchenko
test case output

Attachment: btest.out (application/octet-stream, text), 4.33 KiB.

[20 Jun 2021 18:16] Mikhail Izioumtchenko
pasting btest.out  with my comments starting with ###

--------------
SELECT @@version
--------------

+-----------+
| @@version |
+-----------+
| 8.0.25    |
+-----------+
1 row in set (0.00 sec)
### also seen with 8.0.21

--------------
SHOW VARIABLES LIKE 'character_set\_%'
--------------

+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_filesystem | binary  |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8mb3 |
+--------------------------+---------+
7 rows in set (0.00 sec)

--------------
DROP TABLE IF EXISTS `btest`
--------------

Query OK, 0 rows affected (0.02 sec)

--------------
CREATE TABLE `btest` (
  `id` int unsigned NOT NULL PRIMARY KEY,
  `vcol` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
--------------

Query OK, 0 rows affected (0.04 sec)

--------------
INSERT INTO btest (id, vcol) VALUES ('4231', 'Go2Zahadum') ON DUPLICATE KEY UPDATE id=id
--------------

Query OK, 1 row affected (0.01 sec)

--------------
INSERT INTO btest (id, vcol) VALUES (987, 'Go2Zahadum💕🌸now')
--------------

Query OK, 1 row affected (0.01 sec)

--------------
SELECT * FROM btest
--------------

+------+-----------------------+
| id   | vcol                  |
+------+-----------------------+
|  987 | Go2Zahadum💕🌸now         |
| 4231 | Go2Zahadum            |
+------+-----------------------+
2 rows in set (0.00 sec)

--------------
SELECT * FROM btest WHERE vcol LIKE 'Go2Zahadum💕🌸%'
--------------

+-----+-----------------------+
| id  | vcol                  |
+-----+-----------------------+
| 987 | Go2Zahadum💕🌸now         |
+-----+-----------------------+
1 row in set (0.00 sec)

--------------
SHOW WARNINGS
--------------

Empty set (0.00 sec)

### so far so good but when I change character_set_client strange thing happen imho
--------------
SET SESSION character_set_client='utf8mb3'
--------------

Query OK, 0 rows affected, 1 warning (0.00 sec)

--------------
SHOW WARNINGS
--------------

+---------+------+---------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                     |
+---------+------+---------------------------------------------------------------------------------------------+
| Warning | 1287 | 'utf8mb3' is deprecated and will be removed in a future release. Please use utf8mb4 instead |
+---------+------+---------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

--------------
SHOW VARIABLES LIKE 'character_set\_%'
--------------

+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb3 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_filesystem | binary  |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8mb3 |
+--------------------------+---------+
7 rows in set (0.00 sec)

--------------
SELECT * FROM btest WHERE vcol LIKE 'Go2Zahadum💕🌸%'
--------------
### used to be one row (correct!) previously, now it is empty (wrong!).
Empty set, 1 warning (0.00 sec)

--------------
SHOW WARNINGS
--------------

+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1300 | Cannot convert string 'Go2Zah...' from utf8 to utf8mb4 |
+---------+------+--------------------------------------------------------+
1 row in set (0.01 sec)
### how is this ever a warning? It seems to mean that 'we do not know what the client is trying to request'. I think this should be a hard error.
### more so I could not find a way to make this an error e.g. by changing the value of SQL_MODE.

--------------
SELECT * FROM btest
--------------

+------+-----------------------+
| id   | vcol                  |
+------+-----------------------+
|  987 | Go2Zahadum💕🌸now         |
| 4231 | Go2Zahadum            |
+------+-----------------------+
2 rows in set (0.00 sec)

--------------
SET NAMES utf8
--------------

Query OK, 0 rows affected, 1 warning (0.00 sec)

--------------
SELECT * FROM btest WHERE vcol LIKE 'Go2Zahadum💕🌸%'
--------------

ERROR 1267 (HY000) at line 20 in file: 'btest.sql': Illegal mix of collations (utf8mb4_0900_ai_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'
### this is what triggered this investigation. With SET NAMES utf8 and the data in utf8mb4 an error is returned which is OK.
Bye
[20 Jun 2021 18:18] Mikhail Izioumtchenko
So I have two questions:
1. is there any way to make the warning in question, an error
2. different results with different client settings. This really seems wrong though I may of course be mistaken.
[21 Jun 2021 8:06] MySQL Verification Team
Hello Mikhail,

Thank you for the report and test case.

regards,
Umesh
[24 Jun 2021 10:57] Bernt Marius Johnsen
This is not a bug. The emojis 💕 and 🌸 are outside BMP and therefore not possible to represent in utf8mb3 (see https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html). Hence the warning. Because of this, you can not expect the same results of the query. It as a kind of "you get what you asked for" situation and is expected behaviour:

mysql> select '💕' as heart;
+-------+
| heart |
+-------+
| 💕      |
+-------+
1 row in set (0,00 sec)

mysql> SET SESSION character_set_client='utf8mb3';
Query OK, 0 rows affected, 1 warning (0,00 sec)

mysql> select '💕' as heart;
+-------+
| heart |
+-------+
| ????  |
+-------+
1 row in set, 1 warning (0,00 sec)

(The hearts emoji is converted to 4 question marks, since the byte stream from the client contains a 4-byte UTF-8 encoding of the character, but that sequence does not make any sense in utf8mb3).
[2 Jul 2021 17:32] Mikhail Izioumtchenko
Hi Bernt,

I understand your GIGI approach, the client has to be exactly correct
to hope the server try to do something useful.
However there was a question in my submission as to would it be possible
to receive an error instead of warning to avoid the server returning 
arbitrary stuff because those emojies are not in BMP.
Thought the strict mode would include that but apparently MySQL still works in its relaxed mode where aside of emojies it is possible to receive just a warning when not mentioning a NON NULL column without a DEFAULT value in an INSERT. 
I was really hopping that there was something similar to NO_ZERO_DATE for trying to use emojies with UTF3
[2 Jul 2021 17:37] Mikhail Izioumtchenko
the real status would be 'Additional information required' or maybe
I should change it to feature request for an SQL mode to reject inappropriate characters in queries.