| Bug #106841 | Error comparing Json objects and text utf32 | ||
|---|---|---|---|
| Submitted: | 25 Mar 2022 16:55 | Modified: | 29 Mar 2022 9:13 |
| Reporter: | Luis Donoso | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: JSON | Severity: | S3 (Non-critical) |
| Version: | 8.0.28, 5.7.37 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[26 Mar 2022 11:18]
MySQL Verification Team
Hello Luis Donoso, Thank you for the report and feedback. regards, Umesh
[28 Mar 2022 2:29]
huahua xu
Hello Luis Donoso, This problem has nothing to do with the character set. The first query statement will be executed as: Item_func_eq ->left: Item_field(t.c) ->right: Item_func_conv_charset ->args: Item_func_json_extract ->conv_charset: my_charset_utf32_general_ci ->cmp: Arg_comparator::compare_string In the function Arg_comparator::compare_string, the parameters will both be converted to String before being compared. Both of their values are "\"Hello\"". The second query statement will be executed as: Item_func_eq ->left: Item_func_json_extract ->right: Item_func_conv_charset ->args: Item_field(t.c) ->conv_charset: my_charset_utf8mb4_bin ->cmp: Arg_comparator::compare_json In the function Arg_comparator::compare_json, the parameters will both be converted to Json_wrapper before being compared. the real value of left is "Hello", but right is "\"Hello\"".so, they are not equal.
[28 Mar 2022 8:04]
Luis Donoso
Thank you Xu for looking into this, you're absolutely right in your analysis.
My 'how to reproduce' example was not correct. Please, try this one:
```
create table t(c text CHARACTER SET utf32);
insert into t values('Hello');
select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
0
drop table t;
create table t(c text CHARACTER SET utf8mb4);
insert into t values('Hello');
select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
1
drop table t;
```
Because of that, I think the issue is that the Json wrapper is expected to be utf8mb4 but no conversion is done if the original charset is another one.
[29 Mar 2022 7:54]
huahua xu
Hi Luis Donoso,
In my environment, the examples both are correct.
mysql> create table t(c text CHARACTER SET utf32);
Query OK, 0 rows affected (0.02 sec)
mysql> insert into t values('Hello');
Query OK, 1 row affected (0.01 sec)
mysql> select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
+--------------------------------------------+
| json_extract('{"k1":"Hello"}', '$.k1') = c |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
mysql> drop table t;
Query OK, 0 rows affected (0.01 sec)
mysql> create table t(c text CHARACTER SET utf8mb4);
Query OK, 0 rows affected (0.02 sec)
mysql> insert into t values('Hello');
Query OK, 1 row affected (0.01 sec)
mysql> select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
+--------------------------------------------+
| json_extract('{"k1":"Hello"}', '$.k1') = c |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
Which version about your mysql?
[29 Mar 2022 8:43]
Luis Donoso
It's weird, I am using 8.0.28
```
./mtr --force --parallel=10 --repeat=1 --max-test-fail=0 --retry-failure=1 --big-test \
json.json_func_blob \
Logging: /home/ldonoso/src/mysql-8.0/mysql-test/mysql-test-run.pl --force --parallel=10 --repeat=1 --max-test-fail=0 --retry-failure=1 --big-test json.json_func_blob
MySQL Version 8.0.28
Checking supported features
- Binaries are debug compiled
Using 'all' suites
Collecting tests
Checking leftover processes
Removing old var directory
Creating var directory '/home/ldonoso/src/mysql-8.0-bld/mysql-test/var'
Installing system database
Using parallel: 1
==============================================================================
TEST NAME RESULT TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 50%] json.json_func_blob [ pass ] 259
[100%] shutdown_report [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 0.259 of 28 seconds executing testcases
Completed: All 2 tests were successful.
~/src$ cat mysql-8.0/mysql-test/suite/json/r/json_func_blob.result
create table t(c text CHARACTER SET utf32);
insert into t values('Hello');
select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
0
drop table t;
create table t(c text CHARACTER SET utf8mb4);
insert into t values('Hello');
select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
1
drop table t;
```
[27 Apr 2022 8:42]
Luis Donoso
This problem has not yet been fixed in 8.0.29:
```
create table t(c text CHARACTER SET utf32);
insert into t values('Hello');
select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
0
drop table t;
create table t(c text CHARACTER SET utf8mb4);
insert into t values('Hello');
select json_extract('{"k1":"Hello"}', '$.k1') = c from t;
json_extract('{"k1":"Hello"}', '$.k1') = c
1
drop table t;
```

Description: Error comparing Json objects and text utf32 Comparing the result of `json_extract` and text utf32 gives a wrong result: ``` create table t(c text CHARACTER SET utf32); insert into t values('"Hello"'); select json_extract('{"k1":"Hello"}', '$.k1') = c from t; > 0 ``` But the other way around is correct: ``` select c = json_extract('{"k1":"Hello"}', '$.k1') from t; > 1 ``` The cause might be that the `Json_wrapper` for the text value `c` has lost any information about the original charset, from the function `sql_scalar_to_json`: ``` // BINARY or similar if (create_scalar<Json_opaque>(scalar, &dom, field_type, res->ptr(), res->length())) ``` Please, mind `Json_opaque` has no information about charset or a conversion to utf8mb4 is done. Later the comparison for json is done expecting it it is binary or utf8mb4, from the function: ``` /** Compare the contents of two strings in a JSON value. The strings could be either JSON string scalars encoded in utf8mb4, or binary strings from JSON opaque scalars. In either case they are compared byte by byte. ... */ static int compare_json_strings(const char *str1, size_t str1_len, ``` How to repeat: create table t(c text CHARACTER SET utf32); insert into t values('"Hello"'); select json_extract('{"k1":"Hello"}', '$.k1') = c from t; drop table t;