Bug #114682 Duplicate key error when executing query
Submitted: 18 Apr 2024 11:57 Modified: 18 Apr 2024 13:37
Reporter: Jingqi Tian (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[18 Apr 2024 11:57] Jingqi Tian
Description:
The substitute_generated_columns step in the optimizer may mismatch virtual columns and fields in the SELECT list, causing the query to report duplicate key errors.

How to repeat:
CREATE TABLE `t1` (
  `uid` int NOT NULL,
  `col1` json DEFAULT NULL,
  `col2` varchar(10) GENERATED ALWAYS AS (json_unquote(json_extract(`col1`,_utf8mb4'$.id'))) VIRTUAL,
  PRIMARY KEY (`uid`),
  KEY `index_col2` (`col2`)
);

insert into t1 (uid, col1) values (1, '{"id":"1", "name":"Alice"}');
insert into t1 (uid, col1) values (2, '{"id":"1", "name":"Alice"}');

select json_extract(col1, '$.id'), count(json_extract(col1, '$.name')) from t1 where uid in (select uid from t1) group by 1;

Suggested fix:
The substitute_generated_columns step mismatch 'json_extract(col1, '$.id')' and virtual column 'col2'.
[18 Apr 2024 12:55] MySQL Verification Team
Hi Mr. Tian,

Thank you for your bug report.

With latest 8.0, we get the expected error from the SELECT and the expected results from EXPLAIN of the same SELECT :

ERROR 1062 (23000) at line 8: Duplicate entry '1' for key '/Users/sinisa/razno/work/repo/mysql-8.0.35/mysqltmp/#sql2c82_a_0.<group_key>'
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t1	NULL	index	PRIMARY	index_col2	43	NULL	2	100.00	Using index; Using temporary
1	SIMPLE	t1	NULL	ALL	PRIMARY,index_col2	NULL	NULL	NULL	2	50.00	Range checked for each record (index map: 0x3)

Hence, we do not see what is wrong in the above output ......
[18 Apr 2024 13:34] Jingqi Tian
The ERROR 1062 (23000) at line 8: Duplicate entry '1' for key '/Users/sinisa/razno/work/repo/mysql-8.0.35/mysqltmp/#sql2c82_a_0.<group_key>' is not expected.

The query should not return any error and should return result:
+------------------------------------------+-------------------------------------+
| json_unquote(json_extract(col1, '$.id')) | count(json_extract(col1, '$.name')) |
+------------------------------------------+-------------------------------------+
| 1                                        |                                   2 |
+------------------------------------------+-------------------------------------+

To further illustrate this is a bug, if we drop the index 'index_col2' on t1, the query will return correct result. The test case will be like:

CREATE TABLE `t1` (
  `uid` int NOT NULL,
  `col1` json DEFAULT NULL,
  `col2` varchar(10) GENERATED ALWAYS AS (json_unquote(json_extract(`col1`,_utf8mb4'$.id'))) VIRTUAL,
  PRIMARY KEY (`uid`),
  KEY `index_col2` (`col2`)
);

insert into t1 (uid, col1) values (1, '{"id":"1", "name":"Alice"}');
insert into t1 (uid, col1) values (2, '{"id":"1", "name":"Alice"}');

select json_extract(col1, '$.id'), count(json_extract(col1, '$.name')) from t1 where uid in (select uid from t1) group by 1;

Get error:
ERROR 1022 (23000): Can't write; duplicate key in table '/home/mysql/tmp/#sql17a7d_1d_5'.

alter table t1 drop index index_col2;

select json_extract(col1, '$.id'), count(json_extract(col1, '$.name')) from t1 where uid in (select uid from t1) group by 1;

Get correct result:
+----------------------------+-------------------------------------+
| json_extract(col1, '$.id') | count(json_extract(col1, '$.name')) |
+----------------------------+-------------------------------------+
| "1"                        |                                   2 |
+----------------------------+-------------------------------------+

Obviously, the presence or absence of index_col2 affects the result of the query, which is definitely not as expected.
[18 Apr 2024 13:37] MySQL Verification Team
Hi Mr. Tian,

You are quite right.

This is now a verified bug report.

Thank you very much.