Bug #76679 EXPLAIN incorrectly shows Distinct for tables using join buffer
Submitted: 13 Apr 2015 15:48 Modified: 14 Apr 2015 6:45
Reporter: Sergey Petrunya Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6, 5.6.24, 5.6.25, 5.7.8 OS:Any
Assigned to: CPU Architecture:Any

[13 Apr 2015 15:48] Sergey Petrunya
Description:

create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table twenty (a int, filler char(200), key(a));
insert into twenty select A.a + B.a* 10, 'AAAAAAAAAAAAAAAAAAAA' from ten A, ten B where B.a in (0,1);

MySQL [test]>  explain select A.a from ten A, twenty B where A.a+B.a> 0;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                           |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
|  1 | SIMPLE      | A     | ALL   | NULL          | NULL | NULL    | NULL |   10 | NULL                                                            |
|  1 | SIMPLE      | B     | index | NULL          | a    | 5       | NULL |   20 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------------------------+
2 rows in set (0.00 sec)

MySQL [test]> explain select distinct A.a from ten A, twenty B where A.a+B.a> 0; 
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                                     |
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+
|  1 | SIMPLE      | A     | ALL   | NULL          | NULL | NULL    | NULL |   10 | Using temporary                                                           |
|  1 | SIMPLE      | B     | index | NULL          | a    | 5       | NULL |   20 | Using where; Using index; Distinct; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------------------------------------------+

Look at the second query. It has "Distinct".  I think this wrong. "Distinct"
doesn't work with "Using join buffer". 

To make sure we are on the same page: "Distinct" in table B means that as soon
as we've got a match for current record in table A, we don't have to look for 
any other matches. The query's SELECT list is "SELECT DISTINCT A.a", there is 
no point to look for additional A.row-B.row pairs with the same A.row.

How to repeat:

You can see Distinct in action: put a breakpoint in end_write and end_send. 
Then:

set optimizer_switch='block_nested_loop=off';
# Put a breakpoint in end_send
select A.a from ten A, twenty B where A.a+B.a> 0;
# observe that breakpoint was hit 200 times.

# Run EXPLAIN to be sure about what's going on:
MySQL [test]> explain select A.a from ten A, twenty B where A.a+B.a> 0;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | A     | ALL   | NULL          | NULL | NULL    | NULL |   10 | NULL                     |
|  1 | SIMPLE      | B     | index | NULL          | a    | 5       | NULL |   20 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+

Then,
set optimizer_switch='block_nested_loop=off';
# Put a breakpoint in end_write 
select distinct A.a from ten A, twenty B where A.a+B.a> 0;
# Observe that breakpoint was hit 11 times.
# This is how Distinct optimization works.

# (just in case, let's check EXPLAIN: )
MySQL [test]> explain select distinct A.a from ten A, twenty B where A.a+B.a> 0;
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                              |
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+
|  1 | SIMPLE      | A     | ALL   | NULL          | NULL | NULL    | NULL |   10 | Using temporary                    |
|  1 | SIMPLE      | B     | index | NULL          | a    | 5       | NULL |   20 | Using where; Using index; Distinct |
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+

Ok, now with "Using join buffer":

set optimizer_switch='block_nested_loop=on';
# Put a breakpoint in end_send
select A.a from ten A, twenty B where A.a+B.a> 0;
# observe that breakpoint was hit 200 times.

# Put a breakpoint in end_write 
select distinct A.a from ten A, twenty B where A.a+B.a> 0;
# Observe that breakpoint was hit 200 times, again.

When one takes time to think about, it becomes obvious that "Distinct"
optimization cannot be used with Join buffering.  And it is not used. But EXPLAIN shows it's used.
[14 Apr 2015 6:45] MySQL Verification Team
Hello Sergey,

Thank you for the report and test case.
Observed this with 5.6.24, 5.6.25 and 5.7.8 builds.

Thanks,
Umesh
[1 Feb 2017 14:15] Ricardo Adao
Marked as S3 (Non-critical) but the outputs with 
set optimizer_switch='block_nested_loop=off';
and 
set optimizer_switch='block_nested_loop=off';

are different, this is pretty serious
[1 Feb 2017 14:16] Ricardo Adao
sorry, I meant 

set optimizer_switch='block_nested_loop=off';

and 

set optimizer_switch='block_nested_loop=on';
[1 Feb 2017 14:28] Paulo Condeça
We are in the year 2017 and I was amazed to have this issue, easy to repro this same situation with a query returning != results :s