Bug #12762 | Aggregate function cannot be used in statement | ||
---|---|---|---|
Submitted: | 23 Aug 2005 20:58 | Modified: | 21 Dec 2005 9:56 |
Reporter: | Hakan Küçükyılmaz | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 5.0.12 | OS: | Linux (Linux) |
Assigned to: | Igor Babaev | CPU Architecture: | Any |
[23 Aug 2005 20:58]
Hakan Küçükyılmaz
[15 Oct 2005 21:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/31138
[13 Dec 2005 2:47]
Igor Babaev
ChangeSet 1.2045 05/10/15 14:32:37 igor@rurik.mysql.com +25 -0 Fixed bug #12762: allowed set functions aggregated in outer subqueries, allowed nested set functions. The fix will appear in 5.0.18, it was merged into 5.1 as well. The following comment taken fron the patch explains what actually has been done. ======================================================= A set function cannot be used in certain positions where expressions are accepted. There are some quite explicable restrictions for the usage of set functions. In the query: SELECT AVG(b) FROM t1 WHERE SUM(b) > 20 GROUP by a the usage of the set function AVG(b) is legal, while the usage of SUM(b) is illegal. A WHERE condition must contain expressions that can be evaluated for each row of the table. Yet the expression SUM(b) can be evaluated only for each group of rows with the same value of column a. In the query: SELECT AVG(b) FROM t1 WHERE c > 30 GROUP BY a HAVING SUM(b) > 20 both set function expressions AVG(b) and SUM(b) are legal. We can say that in a query without nested selects an occurrence of a set function in an expression of the SELECT list or/and in the HAVING clause is legal, while in the WHERE clause it's illegal. The general rule to detect whether a set function is legal in a query with nested subqueries is much more complicated. Consider the the following query: SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL (SELECT t2.c FROM t2 WHERE SUM(t1.b) < t2.c). The set function SUM(b) is used here in the WHERE clause of the subquery. Nevertheless it is legal since it is under the HAVING clause of the query to which this function relates. The expression SUM(t1.b) is evaluated for each group defined in the main query, not for groups of the subquery. The problem of finding the query where to aggregate a particular set function is not so simple as it seems to be. In the query: SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.c FROM t2 GROUP BY t2.c HAVING SUM(t1.a) < t2.c) the set function can be evaluated for both outer and inner selects. If we evaluate SUM(t1.a) for the outer query then we get the value of t1.a multiplied by the cardinality of a group in table t1. In this case in each correlated subquery SUM(t1.a) is used as a constant. But we also can evaluate SUM(t1.a) for the inner query. In this case t1.a will be a constant for each correlated subquery and summation is performed for each group of table t2. (Here it makes sense to remind that the query SELECT c FROM t GROUP BY a HAVING SUM(1) < a is quite legal in our SQL). So depending on what query we assign the set function to we can get different result sets. The general rule to detect the query where a set function is to be evaluated can be formulated as follows. Consider a set function S(E) where E is an expression with occurrences of column references C1, ..., CN. Resolve these column references against subqueries that contain the set function S(E). Let Q be the innermost subquery of those subqueries. (It should be noted here that S(E) in no way can be evaluated in the subquery embedding the subquery Q, otherwise S(E) would refer to at least one unbound column reference) If S(E) is used in a construct of Q where set functions are allowed then we evaluate S(E) in Q. Otherwise we look for a innermost subquery containing S(E) of those where usage of S(E) is allowed. Let's demonstrate how this rule is applied to the following queries. 1. SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 GROUP BY t2.b HAVING t2.b > ALL(SELECT t3.c FROM t3 GROUP BY t3.c HAVING SUM(t1.a+t2.b) < t3.c)) For this query the set function SUM(t1.a+t2.b) depends on t1.a and t2.b with t1.a defined in the outermost query, and t2.b defined for its subquery. The set function is in the HAVING clause of the subquery and can be evaluated in this subquery. 2. SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 WHERE t2.b > ALL (SELECT t3.c FROM t3 GROUP BY t3.c HAVING SUM(t1.a+t2.b) < t3.c)) Here the set function SUM(t1.a+t2.b)is in the WHERE clause of the second subquery - the most upper subquery where t1.a and t2.b are defined. If we evaluate the function in this subquery we violate the context rules. So we evaluate the function in the third subquery (over table t3) where it is used under the HAVING clause. 3. SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a > ALL(SELECT t2.b FROM t2 WHERE t2.b > ALL (SELECT t3.c FROM t3 WHERE SUM(t1.a+t2.b) < t3.c)) In this query evaluation of SUM(t1.a+t2.b) is not legal neither in the second nor in the third subqueries. So this query is invalid. Mostly set functions cannot be nested. In the query SELECT t1.a from t1 GROUP BY t1.a HAVING AVG(SUM(t1.b)) > 20 the expression SUM(b) is not acceptable, though it is under a HAVING clause. Yet it is acceptable in the query: SELECT t.1 FROM t1 GROUP BY t1.a HAVING SUM(t1.b) > 20. An argument of a set function does not have to be a reference to a table column as we saw it in examples above. This can be a more complex expression SELECT t1.a FROM t1 GROUP BY t1.a HAVING SUM(t1.b+1) > 20. The expression SUM(t1.b+1) has a very clear semantics in this context: we sum up the values of t1.b+1 where t1.b varies for all values within a group of rows that contain the same t1.a value. A set function for an outer query yields a constant within a subquery. So the semantics of the query SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a IN (SELECT t2.c FROM t2 GROUP BY t2.c HAVING AVG(t2.c+SUM(t1.b)) > 20) is still clear. For a group of the rows with the same t1.a values we calculate the value of SUM(t1.b). This value 's' is substituted in the the subquery: SELECT t2.c FROM t2 GROUP BY t2.c HAVING AVG(t2.c+s) than returns some result set. By the same reason the following query with a subquery SELECT t1.a FROM t1 GROUP BY t1.a HAVING t1.a IN (SELECT t2.c FROM t2 GROUP BY t2.c HAVING AVG(SUM(t1.b)) > 20) is also acceptable.
[21 Dec 2005 9:56]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html Additional info: Documented in 5.0.18 and 5.1.3 changelogs. Closed.