| Bug #120269 | Wrong result in string comparison with JSON_UNQUOTE(JSON_EXTRACT(...)) | ||
|---|---|---|---|
| Submitted: | 15 Apr 7:40 | Modified: | 15 Apr 10:34 |
| Reporter: | Guo Yuxiao | Email Updates: | |
| Status: | Open | Impact on me: | |
| Category: | MySQL Server | Severity: | S1 (Critical) |
| Version: | 9.6.0 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[15 Apr 8:42]
Roy Lyseng
Hi Guo, I do not see anything wrong in the comparison here. Please indicate what you think is the problem.
[15 Apr 9:44]
Guo Yuxiao
Hi, thanks. I reduced the testcase further.
It seems the issue is not in JSON evaluation itself.
JSON_UNQUOTE(JSON_EXTRACT(...)) evaluates to the text {"k": 0}, while ref_1.c1 is a TINYTEXT column with collation utf8mb4_0900_ai_ci. The JSON expression appears to use utf8mb4_bin.
So the concern is not that JSON_EXTRACT or JSON_UNQUOTE returns an incorrect value, but that the comparison result seems to depend on collation resolution.
My understanding is that, for the same textual contents, the comparison result would normally be expected to remain stable unless collation handling differs in a meaningful way. In this testcase, the outcome appears to change when the left-hand operand is a TINYTEXT column rather than a string literal or a casted string expression.
Please let me know if this understanding is correct.
[15 Apr 9:59]
Roy Lyseng
A string comparison result will always be dependent on the collation derivation rules applied to the comparison expression. Collation derivation is performed using strict rules as documented. E.g, collation of columns take preference over literal values, etc. Without a CAST, a JSON value will use collation utf8mb4_bin, and this collation will often give different results than a more common collation like utf8mb4_0900_ai_ci. That said, this is a rather uncommon query, so the change in result will probably not have much practical impact. I think we need a more specific description to validate this as a bug report.
[15 Apr 10:34]
Guo Yuxiao
I added the explicit control comparisons.
--query1
SELECT
ref_1.c1 AS lhs,
COLLATION(ref_1.c1) AS lhs_collation,
COERCIBILITY(ref_1.c1) AS lhs_coercibility,
JSON_UNQUOTE(
JSON_EXTRACT(
JSON_OBJECT(
'k',
JSON_SET(JSON_OBJECT('k', ref_0.c0), '$.k', ref_0.c0)
),
'$.k'
)
) AS rhs,
COLLATION(
JSON_UNQUOTE(
JSON_EXTRACT(
JSON_OBJECT(
'k',
JSON_SET(JSON_OBJECT('k', ref_0.c0), '$.k', ref_0.c0)
),
'$.k'
)
)
) AS rhs_collation,
COERCIBILITY(
JSON_UNQUOTE(
JSON_EXTRACT(
JSON_OBJECT(
'k',
JSON_SET(JSON_OBJECT('k', ref_0.c0), '$.k', ref_0.c0)
),
'$.k'
)
)
) AS rhs_coercibility,
ref_1.c1 >= JSON_UNQUOTE(
JSON_EXTRACT(
JSON_OBJECT(
'k',
JSON_SET(JSON_OBJECT('k', ref_0.c0), '$.k', ref_0.c0)
),
'$.k'
)
) AS cmp_original
FROM t1 AS ref_0
JOIN t0 AS ref_1 ON TRUE;
-- result of query1
+--------------------+--------------------+------------------+----------+---------------+------------------+--------------+
| lhs | lhs_collation | lhs_coercibility | rhs | rhs_collation | rhs_coercibility | cmp_original |
+--------------------+--------------------+------------------+----------+---------------+------------------+--------------+
| 0.7387026232049367 | utf8mb4_0900_ai_ci | 2 | {"k": 0} | utf8mb4_bin | 4 | 1 |
+--------------------+--------------------+------------------+----------+---------------+------------------+--------------+
1 row in set (0.00 sec)
-- query2
WITH q AS (
SELECT
ref_1.c1 AS lhs,
JSON_UNQUOTE(
JSON_EXTRACT(
JSON_OBJECT(
'k',
JSON_SET(JSON_OBJECT('k', ref_0.c0), '$.k', ref_0.c0)
),
'$.k'
)
) AS rhs
FROM t1 AS ref_0
JOIN t0 AS ref_1 ON TRUE
)
SELECT
(lhs COLLATE utf8mb4_bin) >= (rhs COLLATE utf8mb4_bin) AS cmp_bin,
STRCMP(lhs COLLATE utf8mb4_bin, rhs COLLATE utf8mb4_bin) AS strcmp_bin
FROM q;
-- result of query2
+---------+------------+
| cmp_bin | strcmp_bin |
+---------+------------+
| 0 | -1 |
+---------+------------+
1 row in set (0.00 sec)
-- query3
WITH q AS (
SELECT
ref_1.c1 AS lhs,
JSON_UNQUOTE(
JSON_EXTRACT(
JSON_OBJECT(
'k',
JSON_SET(JSON_OBJECT('k', ref_0.c0), '$.k', ref_0.c0)
),
'$.k'
)
) AS rhs
FROM t1 AS ref_0
JOIN t0 AS ref_1 ON TRUE
)
SELECT
(lhs COLLATE utf8mb4_0900_ai_ci) >=
(CAST(rhs AS CHAR CHARACTER SET utf8mb4) COLLATE utf8mb4_0900_ai_ci) AS cmp_ai_ci,
STRCMP(
lhs COLLATE utf8mb4_0900_ai_ci,
CAST(rhs AS CHAR CHARACTER SET utf8mb4) COLLATE utf8mb4_0900_ai_ci
) AS strcmp_ai_ci
FROM q;
-- result of query3
+-----------+--------------+
| cmp_ai_ci | strcmp_ai_ci |
+-----------+--------------+
| 1 | 1 |
+-----------+--------------+
1 row in set (0.00 sec)
So the original result matches the explicit utf8mb4_0900_ai_ci comparison, and does not match the explicit utf8mb4_bin comparison.
Please confirm whether this is the intended behavior according to the collation derivation rules. If so, I understand that this report is not a bug.

Description: Hi, I found a logic bug in MySQL 9.6.0. A query that should have returned an empty set actually returned 1. I executed this query in DBMSs such as TiDB, and the result returned was an empty set. How to repeat: -- create table DROP TABLE IF EXISTS `t0`; DROP TABLE IF EXISTS `t1`; CREATE TABLE t0 (c0 DECIMAL(10,0) UNSIGNED NOT NULL, c1 TINYTEXT); INSERT INTO t0 VALUES (1, '0.7387026232049367'); CREATE TABLE t1 ( c0 DECIMAL(10,0) UNSIGNED); INSERT INTO t1 VALUES(0); -- query, expect:Empty set, actual:{1} SELECT 1 FROM `t1` AS ref_0 INNER JOIN `t0` AS ref_1 ON TRUE WHERE ref_1.c1 >= JSON_UNQUOTE( JSON_EXTRACT( JSON_OBJECT( 'k', JSON_SET( JSON_OBJECT('k', ref_0.c0), '$.k', ref_0.c0 ) ), '$.k' ) ) ORDER BY c1 ASC;