Bug #68254 | subquery optimize to wrong semi-join with extensions syntax to group by | ||
---|---|---|---|
Submitted: | 2 Feb 2013 10:31 | Modified: | 9 Jan 2015 16:42 |
Reporter: | lou shuai (OCA) | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 5.6.10 | OS: | Any |
Assigned to: | Guilhem Bichot | CPU Architecture: | Any |
Tags: | GROUP BY, semi join, subquery |
[2 Feb 2013 10:31]
lou shuai
[2 Feb 2013 10:31]
lou shuai
patch for this bug
Attachment: patch.diff (text/x-patch), 910 bytes.
[2 Feb 2013 13:50]
Matthew Lord
Thank you for the excellent bug report! I will copy this over to the development team now, so that they can take a closer look at it.
[2 Feb 2013 16:42]
Erlend Dahl
5.5.31 seems to return a correct result, so this looks like a regression.
[4 Feb 2013 2:36]
lou shuai
@Erlend Dahl Actually, it's not a regression cause this is a subquery optimize fault which is implemented only in 5.6. So the bug is brought in with the new optimize method: subquery to semi join.
[4 Feb 2013 8:36]
Guilhem Bichot
Hello. As explained in http://dev.mysql.com/doc/refman/5.6/en/group-by-extensions.html the subquery "SELECT t1_id from t1 group by type" is bound to give random results: "MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. [...] However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.". In table t1, for the group which has type=1, there are three different values of t1_id, so results are indeterminate. If we want to say this in more pedantic words, selected columns are not functionally dependent on grouping columns, so the subquery is also not compliant with standard SQL2011. Because it gives random results, the top query also gives random results. If running with sql_mode=only_full_group_by, it is properly rejected: SELECT t1_id from t1 group by type; ERROR 1055 (42000): 'test.t1.t1_id' isn't in GROUP BY
[5 Feb 2013 2:55]
lou shuai
@Guilhem Bichot I have read the manual about the extention syntax with Group By, and I also know the syntax is not valid in standard SQL syntax. But for now, MySQL already supports this syntax, so we should not just use an ONLY_FULL_GROUP_BY sqlmode to avoid this problem, and we should deal with this problem. And the indeterminate value means the server is free to choose any value from each group. What every the server choose, the number of the result will be determinate according to the group by list. So whatever the subquery returns, it should just return two rows, so the result of the sql is wrong. MySQL 5.5.* and before works well, and just MySQL 5.6.* return the different result set. You should have a deep look into the Optimizer, and you will find this is caused by the subquery Optimize which is introduced only in 5.6.*, the Optimizer translate the subquery to semi-join, so the problem happens. And supply a patch for this problem, it's kind of you to have a glance at the patch. Generally speaking, the bug is introduced only in 5.6.*, and the older version works well, so please fix it to make sure all of mysql versions works the same.
[5 Feb 2013 13:32]
MySQL Verification Team
There is a bug here, but it is not important, IMHO. There is no way that one should expect to see nested query finishing in: ... IN (1,2); returning the same result as the nested query presented in the test case. A nested query is free to return, for this test case, any of the three rows : (1,2) or (1,3) or (1,4) but no, it can't return (1,2,3,4) as `type` has only two distinct value. This is the only bug here, although very small. Even that bug could be explained, by someone from Optimizer team. It is quite possible that in the optimizing stage sub query returns (1,2), while in execution it returns (1,4), which would lead to the error that we see in the last result set. If Optimizer team discovers that this is a possible scenario, then I would say that this is not a bug. Simply, the extended results from aggregated queries are unreliable and can't be used in optimization.
[26 Jun 2013 21:33]
MySQL Verification Team
http://bugs.mysql.com/bug.php?id=69521 marked as duplicate of this one.
[27 Jun 2013 8:30]
Guilhem Bichot
actually they are not duplicates. This one has IN(subquery).