Bug #70203 | GROUP BY is consistent with ORDER BY in subquery | ||
---|---|---|---|
Submitted: | 30 Aug 2013 16:38 | Modified: | 31 Aug 2013 12:31 |
Reporter: | Anthony Master | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S3 (Non-critical) |
Version: | 5.5.32 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | group-by, order-by, subquery |
[30 Aug 2013 16:38]
Anthony Master
[30 Aug 2013 21:46]
Anthony Master
Quoted from: http://stackoverflow.com/users/107744/spencer7593 "The idea behind SQL (Structured Query Language) is that it is declarative, in that it declares the resultset to be returned, and does not specify "how" the result set is to be returned. [...] well-written code isn't 'random' in nature, it executes following a set of rules, and we observe consistent behavior. However, this is not a guarantee that this behavior won't ever change. (Compilers implement optimizations which result in machine code that gives the same result, but operates differently.) For years, when we did a GROUP BY in Oracle, we observed that Oracle would use a sort operation as part of the "how" it went about satisfying the GROUP BY. Everytime we ran the same query, Oracle would do the same sort. But Oracle never made a guarantee that a sort operation would always be used. When a new version of Oracle was rolled out, lo and behold, we observe that Oracle is now using a hash operation, rather than a sort operation, to satisfy a GROUP BY. Any query that included an ORDER BY clause continued to return results in the order that was specified (guaranteed behavior), but other queries which didn't have an ORDER BY had rows returned in a different order. Oracle just improved "how" it went about preparing a resultset that satisfied the requirements. This one example demonstrates why we don't write queries that depend on a behavior that is not guaranteed. In some later release of MySQL, we will likely observe changes in the behavior, where the resultset returned by a query is guaranteed to meet the specifications, but will be DIFFERENT from the resultset returned currently. One of the primary design goals of SQL is that we declare the result set is to be returned, without specifying "how" the DBMS is to go about producing the resultset. We do use hints, and we structure queries in certain ways, and that does influence the execution plan generated by the optimizer. But this is not a guarantee. If the resultset is to meet a specified requirement, then the SQL should include that specification; we should not rely on a particular observed behavior to imply that specification." RECOMMENDATION: I recommend changing the line found on https://dev.mysql.com/doc/refman/5.7/en/group-by-extensions.html From: "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate." To: "The server's choice of the value from each group is based on the current version's optimization logic which may change from version to version, so unless the values are the same, they are indeterminate and may not be the same results from one version to the next." Because: The server is not given the freedom to choose the values, but the values are chosen based upon optimization logic within the current version.
[31 Aug 2013 12:31]
Sveta Smirnova
Thank you for the report. What you are asking is just same phrase, but written in different words. Also not only server version matters, but tables content, because optimizer can choose different plan depending from the table statistics. Regarding to Geometry you can only refute proves this way. So, if we'd guarantee order is same and you showed opposite this method would work. But not in this case. So I don't think new re-phrasing will make the user manual clearer.