MySQL Bugs: #114682: Duplicate key error when executing query

Bug #114682	Duplicate key error when executing query
Submitted:	18 Apr 2024 11:57	Modified:	18 Apr 2024 13:37
Reporter:	Jingqi Tian (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	8.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
The substitute_generated_columns step in the optimizer may mismatch virtual columns and fields in the SELECT list, causing the query to report duplicate key errors.

How to repeat:
CREATE TABLE `t1` (
  `uid` int NOT NULL,
  `col1` json DEFAULT NULL,
  `col2` varchar(10) GENERATED ALWAYS AS (json_unquote(json_extract(`col1`,_utf8mb4'$.id'))) VIRTUAL,
  PRIMARY KEY (`uid`),
  KEY `index_col2` (`col2`)
);

insert into t1 (uid, col1) values (1, '{"id":"1", "name":"Alice"}');
insert into t1 (uid, col1) values (2, '{"id":"1", "name":"Alice"}');

select json_extract(col1, '$.id'), count(json_extract(col1, '$.name')) from t1 where uid in (select uid from t1) group by 1;

Suggested fix:
The substitute_generated_columns step mismatch 'json_extract(col1, '$.id')' and virtual column 'col2'.

Hi Mr. Tian,

Thank you for your bug report.

With latest 8.0, we get the expected error from the SELECT and the expected results from EXPLAIN of the same SELECT :

ERROR 1062 (23000) at line 8: Duplicate entry '1' for key '/Users/sinisa/razno/work/repo/mysql-8.0.35/mysqltmp/#sql2c82_a_0.<group_key>'
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t1	NULL	index	PRIMARY	index_col2	43	NULL	2	100.00	Using index; Using temporary
1	SIMPLE	t1	NULL	ALL	PRIMARY,index_col2	NULL	NULL	NULL	2	50.00	Range checked for each record (index map: 0x3)

Hence, we do not see what is wrong in the above output ......

The ERROR 1062 (23000) at line 8: Duplicate entry '1' for key '/Users/sinisa/razno/work/repo/mysql-8.0.35/mysqltmp/#sql2c82_a_0.<group_key>' is not expected.

The query should not return any error and should return result:
+------------------------------------------+-------------------------------------+
| json_unquote(json_extract(col1, '$.id')) | count(json_extract(col1, '$.name')) |
+------------------------------------------+-------------------------------------+
| 1                                        |                                   2 |
+------------------------------------------+-------------------------------------+

To further illustrate this is a bug, if we drop the index 'index_col2' on t1, the query will return correct result. The test case will be like:

CREATE TABLE `t1` (
  `uid` int NOT NULL,
  `col1` json DEFAULT NULL,
  `col2` varchar(10) GENERATED ALWAYS AS (json_unquote(json_extract(`col1`,_utf8mb4'$.id'))) VIRTUAL,
  PRIMARY KEY (`uid`),
  KEY `index_col2` (`col2`)
);

insert into t1 (uid, col1) values (1, '{"id":"1", "name":"Alice"}');
insert into t1 (uid, col1) values (2, '{"id":"1", "name":"Alice"}');

select json_extract(col1, '$.id'), count(json_extract(col1, '$.name')) from t1 where uid in (select uid from t1) group by 1;

Get error:
ERROR 1022 (23000): Can't write; duplicate key in table '/home/mysql/tmp/#sql17a7d_1d_5'.

alter table t1 drop index index_col2;

select json_extract(col1, '$.id'), count(json_extract(col1, '$.name')) from t1 where uid in (select uid from t1) group by 1;

Get correct result:
+----------------------------+-------------------------------------+
| json_extract(col1, '$.id') | count(json_extract(col1, '$.name')) |
+----------------------------+-------------------------------------+
| "1"                        |                                   2 |
+----------------------------+-------------------------------------+

Obviously, the presence or absence of index_col2 affects the result of the query, which is definitely not as expected.

Hi Mr. Tian,

You are quite right.

This is now a verified bug report.

Thank you very much.