Bug #11211 | GROUP BY doesn't work correctly | ||
---|---|---|---|
Submitted: | 9 Jun 2005 16:10 | Modified: | 23 Jun 2005 5:07 |
Reporter: | Francois MASUREL | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S1 (Critical) |
Version: | 5.0.6 | OS: | Windows (Windows XP) |
Assigned to: | Timour Katchaounov | CPU Architecture: | Any |
[9 Jun 2005 16:10]
Francois MASUREL
[9 Jun 2005 16:54]
MySQL Verification Team
The duplicates I got using the columns alias in GROUP BY, however without alias works as expected: mysql> SELECT object.guid AS object_guid, user.login AS user_login FROM -> objects AS object -> INNER JOIN user AS user ON (user.id_object = object.id) -> INNER JOIN object_acl ON object_acl.id_obj_acl = object.id -> INNER JOIN usergroup_tree AS usergrouptree ON usergrouptree.id_parent = -> object_acl.id_group -> LEFT JOIN usergroup_parent AS c3916 ON (c3916.object_id_parent = -> object.id) -> LEFT JOIN usergroup_tree AS usergroup_tree_c3916 ON -> (c3916.usergroup_parent_id_parent = usergroup_tree_c3916.id_child) -> GROUP BY object.guid, user.login ORDER BY object_guid; +-------------+------------+ | object_guid | user_login | +-------------+------------+ | 3253_3912 | admin | | 3255_3912 | f.masurel | | 3278_3912 | anonymous | +-------------+------------+ 3 rows in set (0.00 sec)
[14 Jun 2005 12:24]
Timour Katchaounov
This is not a bug, but expected behavior. Detailed explanation: The problem is that the test query groups by a column that is ambiguous from user's perspective. It seems that the user wanted to group by a column alias, but didn't notice that there is another column in a different table that has the same name. Notice that "object_guid" is a column in table "usergroup_parent". It is not clear what is the intent of the query - group by "objects.guid" which is aliased as "object_guid" or by "usergroup_parent.object_guid" which is aliased as "c3916.object_guid". The current name resolution algorithm for GROUP BY searches first the FROM clause for column references, and only if a column reference is not found there, it searches the SELECT clause. As a result, in the test query we actually group by column "c3916.object_guid" which is an alias for "usergroup_parent.object_guid". If one selects that column, one will see that indeed things are grouped correctly. So in this case the alias "object_guid" in the SELECT clause is not used for grouping at all and there is no bug. On the other hand, as Miguel wrote, there is another query, where we can group by the column "object.guid", which is an alias for "objects.guid", and then we get the expected result. Notice that this query is *NOT* equivalent to the previous one! From the bug report I assume that the initial intent was to get the result of the second query, but the user has made a mistake by using an alias named as a column from a different table. It is an open question whether we should detect such dangerous cases and issue an error or a warning. As a side note - I noteced that in many cases in the submitted database the user have named different things with the same names. This is generally a bad idea as it may lead to similar problems where things work, but not as expected due to one name overshadowing another.
[14 Jun 2005 13:21]
Timour Katchaounov
Ok, after all this is not so simple since we resolve first in the SELECT list, and only then check in the FROM clause. I will investigate this a bit more. However, my previous comment is valid in that the two queries are not equivalent, and that the "object_guid" in both queries is resolved to a column from a different table. This is the reason why the two queries produce different result - because they are not equivalent.
[14 Jun 2005 14:23]
Francois MASUREL
You should notice that it works perfectly fine on mysql 4.1.12. Francois
[14 Jun 2005 16:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/25986
[14 Jun 2005 17:03]
Timour Katchaounov
Francois, you are perfectly right - it works as you say in 4.1. The reason is that in 4.1 we only check whether GROUP columns are in the SELECT list, but we don't check the FROM clause. Our goal is to make 5.0 SQL standard compliant, and the standard says: " 1) Each <grouping column reference> shall unambiguously reference a column of the table resulting from the <from clause>. A column referenced in a <group by clause> is a grouping column. " On the other hand we preserve the MySQL extension that allows to group by column references in the SELECT clause. However, in case there is a name resolution conflict, we preserve the standard behavior. The patch I suggest issues a warning in such cases.
[15 Jun 2005 7:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/26009
[16 Jun 2005 15:06]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/26060
[22 Jun 2005 9:00]
Timour Katchaounov
Pushed in version 5.0.8.
[23 Jun 2005 5:07]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html Additional info: Documented in 5.0.8 changelog; maked as Closed.