Bug #111268 Different character_set_connection returns inconsistent results
Submitted: 4 Jun 2023 17:21 Modified: 5 Jun 2023 8:47
Reporter: Xuefeng Zhang Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:8.0.25, 8.0.33 OS:Any
Assigned to: CPU Architecture:Any
Tags: Inconsistent Results

[4 Jun 2023 17:21] Xuefeng Zhang
Description:
The difference in the configuration item character_set_connection appears to cause an inconsistency in the result sets.

When we set character_set_connection to utf8mb4, collation_connection will be utf8mb4_0900_ai_ci, the pad_attribute is NO PAD and 'abc' is not equivalent to 'abc   ' at this time;
But when we set character_set_connection to utf8mb3, collation_connection will be utf8_general_ci, the pad_attribute is PAD SPACE and 'abc' is equivalent to 'abc   ' at this time.

The pad_attribute of PAD SPACE can ignore the query value of the space, NO PAD will not ignore.

How to repeat:
MySQL8025

show variables like 'character_set_%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8mb4                          |
| character_set_connection | utf8mb4                          |
| character_set_database   | utf8mb4                          |
| character_set_filesystem | binary                           |
| character_set_results    | utf8mb4                          |
| character_set_server     | utf8mb4                          |
| character_set_system     | utf8mb3                          |

show variables like '%collation%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_0900_ai_ci    |
| collation_database            | utf8mb4_bin        |
| collation_server              | utf8mb4_bin        |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+

set character_set_connection = utf8mb3;

show variables like 'character_set_%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8mb4                          |
| character_set_connection | utf8mb3                          |
| character_set_database   | utf8mb4                          |
| character_set_filesystem | binary                           |
| character_set_results    | utf8mb4                          |
| character_set_server     | utf8mb4                          |
| character_set_system     | utf8mb3                          |

show variables like '%collation%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8_general_ci    |
| collation_database            | utf8mb4_bin        |
| collation_server              | utf8mb4_bin        |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+

show create table t;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `id` int DEFAULT NULL,
  `c1` varchar(10) COLLATE utf8mb4_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |

insert into t values (1,'abc');
insert into t values (2,'abc ');

select * from t;
+------+------+
| id   | c1   |
+------+------+
|    1 | abc  |
|    2 | abc  |
+------+------+

mysql> select 'abc'='abc ';
+--------------+
| 'abc'='abc ' |
+--------------+
|            1 |
+--------------+

mysql> select * from t where c1 = 'abc' and c1 = 'abc ';
+------+------+
| id   | c1   |
+------+------+
|    1 | abc  |
|    2 | abc  |
+------+------+

mysql>
mysql> set character_set_connection = utf8mb4;
Query OK, 0 rows affected (0.00 sec)

mysql> select 'abc'='abc ';
+--------------+
| 'abc'='abc ' |
+--------------+
|            0 |
+--------------+
1 row in set (0.00 sec)

mysql> select * from t where c1 = 'abc' and c1 = 'abc ';
Empty set (0.00 sec)

-- In the function remove_eq_conds, the optimizer optimizes the condition to be constant false in above case.

​

Suggested fix:
Avoid the corresponding sql is a temporarily good solution, but if we can manually set the current pad_atrribute, we can also avoid the problem of the result set inconsistency caused by changing the character set.
[5 Jun 2023 8:47] MySQL Verification Team
Hello Xuefeng Zhang,

Thank you for the report and test case.

regards,
Umesh
[6 Jun 2023 11:00] Bernt Marius Johnsen
I am only able to reproduce the erroneous behavior when the column collates
as utf8mb4_bin:

create table t1 (i integer, v varchar(10) collate utf8mb4_bin);
insert into t1 values (1, 'abc'), (2, 'abc ');
set character_set_connection = utf8mb3;
select * from t1 where v = 'abc' and v = 'abc ';
set character_set_connection = utf8mb4;
select * from t1 where v = 'abc' and v = 'abc ';

The other collations I have tried give consistent behavior and seems to be
independent of collation_connection. 2 rows for PAD SPACE collations and zero rows for NO PAD collations.

=========================================================================

Note that the following is correct behavior. The reason is that there is
no context to determine the collation of the literals, so MySQL will
use the collation_connection when the literals are compared. 

utf8mb3 implies utf8mb3_general_ci which is PAD SPACE while 
utf8mb4 implies utf8mb4_0900_ai_ci which is NO PAD.

mysql> set character_set_connection = utf8mb3;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> select 'abc' = 'abc ';
+----------------+
| 'abc' = 'abc ' |
+----------------+
|              1 |
+----------------+
1 row in set (0.00 sec)

mysql> set character_set_connection = utf8mb4;
Query OK, 0 rows affected (0.00 sec)

mysql> select 'abc' = 'abc ';
+----------------+
| 'abc' = 'abc ' |
+----------------+
|              0 |
+----------------+
1 row in set (0.00 sec)