Bug #106841 Error comparing Json objects and text utf32
Submitted: 25 Mar 2022 16:55 Modified: 29 Mar 2022 9:13
Reporter: Luis Donoso Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:8.0.28, 5.7.37 OS:Any
Assigned to: CPU Architecture:Any

[25 Mar 2022 16:55] Luis Donoso
Description:
Error comparing Json objects and text utf32

Comparing the result of `json_extract` and text utf32 gives a wrong result:

```
create table t(c text CHARACTER SET utf32);

insert into t values('"Hello"');

select json_extract('{"k1":"Hello"}', '$.k1') = c from t;

> 0
```

But the other way around is correct:

```
select c = json_extract('{"k1":"Hello"}', '$.k1') from t;

> 1
```

The cause might be that the `Json_wrapper` for the text value `c` has lost any information about the original charset, from the function `sql_scalar_to_json`:

```
        // BINARY or similar
        if (create_scalar<Json_opaque>(scalar, &dom, field_type, res->ptr(),
                                       res->length()))
```

Please, mind `Json_opaque` has no information about charset or a conversion to utf8mb4 is done. Later the comparison for json is done expecting it it is binary or utf8mb4, from the function:

```
/**
  Compare the contents of two strings in a JSON value. The strings
  could be either JSON string scalars encoded in utf8mb4, or binary
  strings from JSON opaque scalars. In either case they are compared
  byte by byte.
    ...
*/
static int compare_json_strings(const char *str1, size_t str1_len,
```

How to repeat:
create table t(c text CHARACTER SET utf32);

insert into t values('"Hello"');

select json_extract('{"k1":"Hello"}', '$.k1') = c from t;

drop table t;
[26 Mar 2022 11:18] MySQL Verification Team
Hello Luis Donoso,

Thank you for the report and feedback.

regards,
Umesh
[28 Mar 2022 2:29] huahua xu
Hello Luis Donoso,

This problem has nothing to do with the character set.

The first query statement will be executed as:
	Item_func_eq
	  ->left: Item_field(t.c)
	  ->right: Item_func_conv_charset
	    ->args: Item_func_json_extract
	    ->conv_charset: my_charset_utf32_general_ci
	  ->cmp: Arg_comparator::compare_string

In the function Arg_comparator::compare_string, the parameters will both be converted to String before being compared. Both of their values are "\"Hello\"".

The second query statement will be executed as:
	Item_func_eq
	  ->left: Item_func_json_extract
	  ->right: Item_func_conv_charset
	    ->args: Item_field(t.c)
	    ->conv_charset: my_charset_utf8mb4_bin
	  ->cmp: Arg_comparator::compare_json

In the function Arg_comparator::compare_json, the parameters will both be converted to Json_wrapper before being compared. the real value of left is "Hello", but right is "\"Hello\"".so, they are not equal.
[28 Mar 2022 8:04] Luis Donoso
Thank you Xu for looking into this, you're absolutely right in your analysis.

My 'how to reproduce' example was not correct. Please, try this one:

```
create table t(c text CHARACTER SET utf32);

insert into t values('Hello');

select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
0

drop table t;

create table t(c text CHARACTER SET utf8mb4);

insert into t values('Hello');

select json_extract('{"k1":"Hello"}', '$.k1') = c from t;

json_extract('{"k1":"Hello"}', '$.k1') = c
1

drop table t;
```

Because of that, I think the issue is that the Json wrapper is expected to be utf8mb4 but no conversion is done if the original charset is another one.
[29 Mar 2022 7:54] huahua xu
Hi Luis Donoso,

In my environment, the examples both are correct.

mysql> create table t(c text CHARACTER SET utf32);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into t values('Hello');
Query OK, 1 row affected (0.01 sec)

mysql> select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
+--------------------------------------------+
| json_extract('{"k1":"Hello"}', '$.k1') = c |
+--------------------------------------------+
|                                          1 |
+--------------------------------------------+

mysql> drop table t;
Query OK, 0 rows affected (0.01 sec)

mysql> create table t(c text CHARACTER SET utf8mb4);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into t values('Hello');
Query OK, 1 row affected (0.01 sec)

mysql> select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
+--------------------------------------------+
| json_extract('{"k1":"Hello"}', '$.k1') = c |
+--------------------------------------------+
|                                          1 |
+--------------------------------------------+

Which version about your mysql?
[29 Mar 2022 8:43] Luis Donoso
It's weird, I am using 8.0.28

```
./mtr --force --parallel=10 --repeat=1 --max-test-fail=0 --retry-failure=1 --big-test \
json.json_func_blob  \

Logging: /home/ldonoso/src/mysql-8.0/mysql-test/mysql-test-run.pl  --force --parallel=10 --repeat=1 --max-test-fail=0 --retry-failure=1 --big-test json.json_func_blob
MySQL Version 8.0.28
Checking supported features
 - Binaries are debug compiled
Using 'all' suites
Collecting tests
Checking leftover processes
Removing old var directory
Creating var directory '/home/ldonoso/src/mysql-8.0-bld/mysql-test/var'
Installing system database
Using parallel: 1

==============================================================================
                  TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 50%] json.json_func_blob                       [ pass ]    259
[100%] shutdown_report                           [ pass ]       
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 0.259 of 28 seconds executing testcases

Completed: All 2 tests were successful.

~/src$ cat mysql-8.0/mysql-test/suite/json/r/json_func_blob.result 
create table t(c text CHARACTER SET utf32);
insert into t values('Hello');
select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
0
drop table t;
create table t(c text CHARACTER SET utf8mb4);
insert into t values('Hello');
select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
1
drop table t;
```
[27 Apr 2022 8:42] Luis Donoso
This problem has not yet been fixed in 8.0.29:

```
create table t(c text CHARACTER SET utf32);
insert into t values('Hello');
select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
0
drop table t;

create table t(c text CHARACTER SET utf8mb4);
insert into t values('Hello');
select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
1
drop table t;
```