Description:
There are cases where EXPLAIN FORMAT=JSON produces "duplicates_removal", while no duplicate removal is taking place.
This issue is similar (but not the same as) BUG#74462, which is about ordering_operation when no ordering takes place.
How to repeat:
Consider this example:
create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table one_k(a int);
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
CREATE TABLE tpk (
pk int(11) NOT NULL,
a int(11) DEFAULT NULL,
b int(11) DEFAULT NULL,
PRIMARY KEY (pk),
KEY a (a)
);
insert into tpk select a,a,a from one_k;
explain select distinct a from tpk where a<10;
explain select distinct pk from tpk where pk<10;
explain format=json select distinct a from tpk where a<10\G
explain format=json select distinct pk from tpk where pk<10\G
MySQL [j1]> explain format=json select distinct a from tpk where a<10\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"duplicates_removal": {
"using_filesort": false,
"table": {
"table_name": "tpk",
"access_type": "range",
"possible_keys": [
"a"
],
"key": "a",
"used_key_parts": [
"a"
],
"key_length": "5",
"rows": 9,
"filtered": 100,
"using_index": true,
"attached_condition": "(`j1`.`tpk`.`a` < 10)"
}
}
}
}
1 row in set, 1 warning (0.00 sec)
MySQL [j1]> explain format=json select distinct pk from tpk where pk<10\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"duplicates_removal": {
"using_filesort": false,
"table": {
"table_name": "tpk",
"access_type": "range",
"possible_keys": [
"PRIMARY",
"a"
],
"key": "PRIMARY",
"used_key_parts": [
"pk"
],
"key_length": "4",
"rows": 10,
"filtered": 100,
"using_index": true,
"attached_condition": "(`j1`.`tpk`.`pk` < 10)"
}
}
}
}
1 row in set, 1 warning (0.01 sec)
Both queries show "duplicates_removal", they look like they are using the same execution plans (except for the name of used indexes).
In reality, their execution is different
* when using PK, the optimizer figures that primary key values are already distinct, so no duplicate_removal takes place.
* when using non-unique index "a", index scan may produce duplicates which are removed "on the fly" by comparing each row with the previous one. If it's equal, it's removed.
Suggested fix:
Make EXPLAIN JSON show whether/how duplicates are removed.