Bug #113506 Item_func_conv_charset::eq not correct
Submitted: 25 Dec 2023 2:26 Modified: 11 Jun 2024 16:55
Reporter: Xinhao Zhao Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:8.0,8.0.35, 8.1.0, 8.2.0 OS:Any
Assigned to: CPU Architecture:Any

[25 Dec 2023 2:26] Xinhao Zhao
Description:
Function Item_func_conv_charset::eq is not implemented and may cause wrong result. 

How to repeat:
mysql> create table t1 (a char(10));
Query OK, 0 rows affected (0.21 sec)

mysql> insert into t1 values ('你好'), ('中午好');
Query OK, 2 rows affected (0.03 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select a, convert(a using latin1) from t1 order by convert(a using utf8mb4);
+-----------+-------------------------+
| a         | convert(a using latin1) |
+-----------+-------------------------+
| 你好      | ??                      |
| 中午好    | ???                     |
+-----------+-------------------------+
2 rows in set (0.01 sec)

mysql> select a from t1 order by convert(a using utf8mb4);
+-----------+
| a         |
+-----------+
| 中午好    |
| 你好      |
+-----------+
2 rows in set (0.00 sec)

Suggested fix:
Add comparison to conv_charset

bool Item_func::eq(const Item *item, bool binary_cmp, bool semantic_eq) const {
  /* Assume we don't have rtti */
  if (this == item) return true;
  if (item->type() != FUNC_ITEM) return false;
  const Item_func *item_func = down_cast<const Item_func *>(item);
  if (!compare_funcs_type(this, item_func)) return false;
  if (conv_charset != down_cast<const Item_func_conv_charset *>(item_func)->conv_charset) return false;
  for (uint i = 0; i < arg_count; i++)
    if (!args[i]->eq(item_func->args[i], binary_cmp, semantic_eq)) return false;
  return true;
}
[25 Dec 2023 2:27] Xinhao Zhao
In the first query, the item in group list is wrongly fixed to item #2 in select list
[25 Dec 2023 9:57] MySQL Verification Team
Hello Xinhao Zhao,

Thank you for the report and testcase.

regards,
Umesh
[27 Dec 2023 11:58] Tor Didriksen
Posted by developer:
 
in 5.7, which is also affected:

mysql> create table t1 (a char(10));
Query OK, 0 rows affected (0,01 sec)

mysql> insert into t1 values ('你好'), ('中午好');
ERROR 1366 (HY000): Incorrect string value: '\xE4\xBD\xA0\xE5\xA5\xBD' for column 'a' at row 1

mysql> create table t1 (a char(10) character set utf8mb4);
Query OK, 0 rows affected (0,01 sec)

mysql> insert into t1 values ('你好'), ('中午好');
Query OK, 2 rows affected (0,00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select a, convert(a using latin1) from t1 order by convert(a using utf8mb4);
+-----------+-------------------------+
| a         | convert(a using latin1) |
+-----------+-------------------------+
| 你好      | ??                      |
| 中午好    | ???                     |
+-----------+-------------------------+
2 rows in set (0,00 sec)

mysql> select a from t1 order by convert(a using utf8mb4);
+-----------+
| a         |
+-----------+
| 中午好    |
| 你好      |
+-----------+
2 rows in set (0,00 sec)
[3 Jan 2024 17:06] huahua xu
Hi All:

The patch would be helpful.

diff --git a/sql/item_strfunc.cc b/sql/item_strfunc.cc
index edd0006..92d52c1 100644
--- a/sql/item_strfunc.cc
+++ b/sql/item_strfunc.cc
@@ -3577,6 +3577,13 @@ String *Item_charset_conversion::val_str(String *str) {
   return res;
 }

+bool Item_charset_conversion::eq(const Item *item, bool binary_cmp) const {
+  if (!Item_func::eq(item, binary_cmp)) return false;
+
+  const Item_charset_conversion *item_func = down_cast<const Item_charset_conversion *>(item);
+  return my_charset_same(m_cast_cs, item_func->m_cast_cs);
+}
+
 uint32 Item_charset_conversion::compute_max_char_length() {
   uint32 new_max_chars;
   Item *from = args[0];
diff --git a/sql/item_strfunc.h b/sql/item_strfunc.h
index afcc289..eb51650 100644
--- a/sql/item_strfunc.h
+++ b/sql/item_strfunc.h
@@ -994,6 +994,8 @@ class Item_charset_conversion : public Item_str_func {
       : Item_str_func(pos, a), m_cast_cs(cs_arg) {}

   String *val_str(String *) override;
+
+  bool eq(const Item *item, bool binary_cmp) const override;
 };
[11 Jun 2024 16:55] Jon Stephens
Documented fix as follows in the MySQL 9.0.0 changelog:

    The internal function Item_func::eq() erroneously treated the
    two expressions CONVERT(a USING latin1) and CONVERT(a USING
    utf8mb4) as being equal.

Closed.