Bug #68254 subquery optimize to wrong semi-join with extensions syntax to group by
Submitted: 2 Feb 2013 10:31 Modified: 9 Jan 2015 16:42
Reporter: lou shuai (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.6.10 OS:Any
Assigned to: Guilhem Bichot CPU Architecture:Any
Tags: GROUP BY, semi join, subquery

[2 Feb 2013 10:31] lou shuai
Description:
subquery with group by produce wrong result set.

How to repeat:
use test;
CREATE TABLE `t1` (
  `t1_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `type` INT(10) UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY (`t1_id`),
  KEY `type` (`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
CREATE TABLE `t2` (
  `t2_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `t1_id` INT(10) UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY (`t2_id`),
  KEY `t1_id` (`t1_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
 
INSERT INTO t1 SET TYPE=0;
INSERT INTO t1 SET TYPE=1;
INSERT INTO t1 SET TYPE=1;
INSERT INTO t1 SET TYPE=1;
 
INSERT INTO t2 SET t1_id=1;
INSERT INTO t2 SET t1_id=1;
INSERT INTO t2 SET t1_id=1;
INSERT INTO t2 SET t1_id=1;
INSERT INTO t2 SET t1_id=1;
INSERT INTO t2 SET t1_id=1;
 
INSERT INTO t2 SET t1_id=2;
INSERT INTO t2 SET t1_id=2;
INSERT INTO t2 SET t1_id=2;
INSERT INTO t2 SET t1_id=2;
INSERT INTO t2 SET t1_id=2;
INSERT INTO t2 SET t1_id=2;
 
INSERT INTO t2 SET t1_id=3;
INSERT INTO t2 SET t1_id=3;
INSERT INTO t2 SET t1_id=3;
INSERT INTO t2 SET t1_id=3;
INSERT INTO t2 SET t1_id=3;
INSERT INTO t2 SET t1_id=3;
 
INSERT INTO t2 SET t1_id=4;
INSERT INTO t2 SET t1_id=4;
INSERT INTO t2 SET t1_id=4;
INSERT INTO t2 SET t1_id=4;
INSERT INTO t2 SET t1_id=4;
INSERT INTO t2 SET t1_id=4;
COMMIT;

mysql> SELECT t1_id from t1 group by type;
+-------+
| t1_id |
+-------+
|     1 |
|     2 |
+-------+
2 rows in set (0.00 sec)

mysql> SELECT  t2_id,t1_id FROM t2 WHERE t2.t1_id IN (1,2);
+-------+-------+
| t2_id | t1_id |
+-------+-------+
|     1 |     1 |
|     2 |     1 |
|     3 |     1 |
|     4 |     1 |
|     5 |     1 |
|     6 |     1 |
|     7 |     2 |
|     8 |     2 |
|     9 |     2 |
|    10 |     2 |
|    11 |     2 |
|    12 |     2 |
+-------+-------+
12 rows in set (0.00 sec)

mysql> SELECT  t2_id,t1_id FROM t2 WHERE t2.t1_id IN (SELECT t1_id FROM t1 GROUP BY TYPE);
+-------+-------+
| t2_id | t1_id |
+-------+-------+
|     1 |     1 |
|     2 |     1 |
|     3 |     1 |
|     4 |     1 |
|     5 |     1 |
|     6 |     1 |
|     7 |     2 |
|     8 |     2 |
|     9 |     2 |
|    10 |     2 |
|    11 |     2 |
|    12 |     2 |
|    13 |     3 |
|    14 |     3 |
|    15 |     3 |
|    16 |     3 |
|    17 |     3 |
|    18 |     3 |
|    19 |     4 |
|    20 |     4 |
|    21 |     4 |
|    22 |     4 |
|    23 |     4 |
|    24 |     4 |
+-------+-------+
24 rows in set (0.00 sec)

Suggested fix:
see the attach file: patch.diff which is based on 
revno: 4579
tags: mysql-5.6.9
[2 Feb 2013 10:31] lou shuai
patch for this bug

Attachment: patch.diff (text/x-patch), 910 bytes.

[2 Feb 2013 13:50] Matthew Lord
Thank you for the excellent bug report! I will copy this over to the development team now, so that they can take a closer look at it.
[2 Feb 2013 16:42] Erlend Dahl
5.5.31 seems to return a correct result, so this looks like a regression.
[4 Feb 2013 2:36] lou shuai
@Erlend Dahl
Actually, it's not a regression cause this is a subquery optimize fault which is
implemented only in 5.6. So the bug is brought in with the new optimize method:
subquery to semi join.
[4 Feb 2013 8:36] Guilhem Bichot
Hello. As explained in
http://dev.mysql.com/doc/refman/5.6/en/group-by-extensions.html
the subquery "SELECT t1_id from t1 group by type" is bound to give random results:
"MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. [...] However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.". In table t1, for the group which has type=1, there are three different values of t1_id, so results are indeterminate.
If we want to say this in more pedantic words, selected columns are not functionally dependent on grouping columns, so the subquery is also not compliant with standard SQL2011.

Because it gives random results, the top query also gives random results.
If running with sql_mode=only_full_group_by, it is properly rejected:
SELECT t1_id from t1 group by type;
ERROR 1055 (42000): 'test.t1.t1_id' isn't in GROUP BY
[5 Feb 2013 2:55] lou shuai
@Guilhem Bichot
I have read the manual about the extention syntax with Group By, and I also know the syntax is not valid in standard SQL syntax. But for now, MySQL already supports this syntax, so we should not just use an ONLY_FULL_GROUP_BY sqlmode to avoid this problem, and we should deal with this problem.

And the indeterminate value means the server is free to choose any value from each group. What every the server choose, the number of the result will be determinate according to the group by list. So whatever the subquery returns, it should just return two rows, so the result of the sql is wrong.

MySQL 5.5.* and before works well, and just MySQL 5.6.* return the different result  set. You should have a deep look into the Optimizer, and you will find this is caused by the subquery Optimize which is introduced only in 5.6.*, the Optimizer translate the subquery to semi-join, so the problem happens. And supply a patch for this problem, it's kind of you to have a glance at the patch.

Generally speaking, the bug is introduced only in 5.6.*, and the older version works well, so please fix it to make sure all of mysql versions works the same.
[5 Feb 2013 13:32] MySQL Verification Team
There is a bug here, but it is not important, IMHO.

There is no way that one should expect to see nested query finishing in:

... IN (1,2);

returning the same result as the nested query presented in the test case.

A nested query is free to return, for this test case, any of the three rows :

(1,2) or (1,3) or (1,4)

but no, it can't return (1,2,3,4) as `type` has only two distinct value. This is the only bug here, although very small.

Even that bug could be explained, by someone from Optimizer team. 

It is quite possible that in the optimizing stage sub query returns (1,2), while in execution it returns (1,4), which would lead to the error that we see in the last result set.

If Optimizer team discovers that this is a possible scenario, then I would say that this is not a bug. Simply, the extended results from aggregated queries are unreliable and can't be used in optimization.
[26 Jun 2013 21:33] MySQL Verification Team
http://bugs.mysql.com/bug.php?id=69521 marked as duplicate of this one.
[27 Jun 2013 8:30] Guilhem Bichot
actually they are not duplicates. This one has IN(subquery).