Bug #24484 Aggregate function used in column list subquery gives erroneous error
Submitted: 21 Nov 2006 20:27 Modified: 7 Apr 2007 19:10
Reporter: Harrison Fisk Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.28 OS:MacOS (Mac OS X)
Assigned to: Georgi Kodinov CPU Architecture:Any
Tags: bfsm_2006_12_07

[21 Nov 2006 20:27] Harrison Fisk
Description:
When you use an aggregate in a subquery in the column list, you get an error stating that it returns too many rows ( ERROR 1242 (21000): Subquery returns more than 1 row. )

The subquery should not return more than 1 row, however the aggregate function isn't evaluated properly and it is returning a lot of matches instead.

I believe this bug (and fix) might be related to http://bugs.mysql.com/bug.php?id=12762 

How to repeat:
This test cases demonstrates the problem and evidence of the cause:

CREATE TABLE t_1 (a int, b INT, c CHAR(10) NOT NULL, PRIMARY KEY (a, b));
INSERT INTO t_1 VALUES (1,1,'a'), (1,2,'b'), (1,3,'c'), (1,4,'d'), (1,5,'e'), (2,1,'f'), (2,2,'g'), (2,3,'h'), (3,4,'i'),(3,3,'j'), (3,2,'k'), (3,1,'l'), (1,9,'m');

-- Gives error, but should work since it is (a, b) is the PK so only one given match possible
SELECT a, MAX(b), (SELECT t.c FROM t_1 AS t WHERE t_1.a=t.a AND t.b=MAX(t_1.b)) as test FROM t_1 GROUP BY a;

-- LIMIT 1 works
SELECT a, MAX(b), (SELECT t.c FROM t_1 AS t WHERE t_1.a=t.a AND t.b=MAX(t_1.b) LIMIT 1) as test FROM t_1 GROUP BY a;

-- LIMIT 1, 1 works, but has a different 'test' column, which shows multiple results
SELECT a, MAX(b), (SELECT t.c FROM t_1 AS t WHERE t_1.a=t.a AND t.b=MAX(t_1.b) LIMIT 1,1) as test FROM t_1 GROUP BY a;

-- LIMIT 2, 1 works, but has a different 'test' column, which shows multiple results
SELECT a, MAX(b), (SELECT t.c FROM t_1 AS t WHERE t_1.a=t.a AND t.b=MAX(t_1.b) LIMIT 2,1) as test FROM t_1 GROUP BY a;

Suggested fix:
It is possible to work around this by using a subquery in the FROM clause instead and then joining against the parent query.
[2 Feb 2007 17:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19260

ChangeSet@1.2400, 2007-02-02 19:29:03+02:00, gkodinov@macbook.gmz +5 -0
  Bug #24484: Aggregate function used in column list subquery gives erroneous
              error
  Most of the aggregate functions depend on the number of rows they process, 
  even if their arguments are constants (e.g. SUM(1) cannot be calculated 
  without knowing the number of rows that it is called for).
  There are however two exceptions : MIN and MAX. These depend only on their
  arguments : so if they are constant, them MIN and MAX can be considered
  constants. In that respect MIN and MAX are just as any other scalar SQL 
  function.
  However they were erroneously treated as the other aggregates by the
  optimizer.
  Fixed by introducing in Item_sum_hybrid (the superclass of MIN and MAX)
  the same calculation for used_tables as for Item_func.
  One other flaw was revealed and fixed in the process : references were 
  not calling the recalculation method for used_tables of their targets.
[6 Feb 2007 14:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19398

ChangeSet@1.2400, 2007-02-06 16:29:05+02:00, gkodinov@macbook.gmz +6 -0
  Bug #24484:
  Most of the aggregate functions depend on the number of rows they process, 
  even if their arguments are constants (e.g. SUM(1) cannot be calculated 
  without knowing the number of rows that it is called for).
  There are however two exceptions : MIN and MAX. These depend only on their
  arguments : so if they are constant, them MIN and MAX can be considered
  constants. In that respect MIN and MAX are just as any other scalar SQL 
  function.
  However they were erroneously treated as the other aggregates by the
  optimizer.
  Fixed by introducing in Item_sum_hybrid (the superclass of MIN and MAX)
  the same calculation for used_tables as for Item_func.
  One other flaw was revealed and fixed in the process : references were 
  not calling the recalculation method for used_tables of their targets.
  Also used_tables() for other aggregates now excludes the special
  "non-table" bits (like OUTER_REF_TABLE_BIT for example).
[8 Feb 2007 16:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19570

ChangeSet@1.2400, 2007-02-08 18:02:10+02:00, gkodinov@macbook.gmz +6 -0
  Bug #24484:
  To correctly decide which predicates can be evaluated with a given table
  the optimizer must know the exact set of tables that a predicate depends 
  on. If that mask is too wide (refer to non-existing tables) the optimizer
  can erroneously skip a predicate.
  One such case of wrong table usage mask were the aggregate functions.
  The have a all-1 mask (meaning depend on all tables, including non-existent
  ones).
  Fixed by making a real used_tables mask for the aggregates. The mask is
  constructed in the following way :
  1. OR the table dependency masks of all the arguments of the aggregate.
  2. If all the arguments of the function are from the local name resolution 
    context; it's not a MAX/MIN and it is evaluated in the same name resolution
    context where it is referenced all the tables from that name resolution 
    context are OR-ed to the dependency mask. This is to denote that an
    aggregate function depends on the number of rows it processes.
  3. Handle correctly the case of an aggregate function optimization (such that
    the aggregate function can be pre-calculated and made a constant).
  
  Made sure that an aggregate function is never a constant (unless subject of a 
  specific optimization and pre-calculation).  
  
  One other flaw was revealed and fixed in the process : references were 
  not calling the recalculation method for used_tables of their targets.
[23 Feb 2007 10:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/20446

ChangeSet@1.2400, 2007-02-23 12:31:35+02:00, gkodinov@macbook.gmz +6 -0
  Bug #24484:
  To correctly decide which predicates can be evaluated with a given table
  the optimizer must know the exact set of tables that a predicate depends 
  on. If that mask is too wide (refer to non-existing tables) the optimizer
  can erroneously skip a predicate.
  One such case of wrong table usage mask were the aggregate functions.
  The have a all-1 mask (meaning depend on all tables, including non-existent
  ones).
  Fixed by making a real used_tables mask for the aggregates. The mask is
  constructed in the following way :
  1. OR the table dependency masks of all the arguments of the aggregate.
  2. If all the arguments of the function are from the local name resolution 
    context; it's not a MAX/MIN and it is evaluated in the same name resolution
    context where it is referenced all the tables from that name resolution 
    context are OR-ed to the dependency mask. This is to denote that an
    aggregate function depends on the number of rows it processes.
  3. Handle correctly the case of an aggregate function optimization (such that
    the aggregate function can be pre-calculated and made a constant).
  
  Made sure that an aggregate function is never a constant (unless subject of a 
  specific optimization and pre-calculation).  
  
  One other flaw was revealed and fixed in the process : references were 
  not calling the recalculation method for used_tables of their targets.
[9 Mar 2007 17:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21625

ChangeSet@1.2400, 2007-03-09 17:57:51+02:00, gkodinov@macbook.gmz +12 -0
  Bug #24484:
  To correctly decide which predicates can be evaluated with a given table
  the optimizer must know the exact set of tables that a predicate depends 
  on. If that mask is too wide (refer to non-existing tables) the optimizer
  can erroneously skip a predicate.
  One such case of wrong table usage mask were the aggregate functions.
  The have a all-1 mask (meaning depend on all tables, including non-existent
  ones).
  Fixed by making a real used_tables mask for the aggregates. The mask is
  constructed in the following way :
  1. OR the table dependency masks of all the arguments of the aggregate.
  2. If all the arguments of the function are from the local name resolution 
    context; it's not a MAX/MIN and it is evaluated in the same name resolution
    context where it is referenced all the tables from that name resolution 
    context are OR-ed to the dependency mask. This is to denote that an
    aggregate function depends on the number of rows it processes.
  3. Handle correctly the case of an aggregate function optimization (such that
    the aggregate function can be pre-calculated and made a constant).
  
  Made sure that an aggregate function is never a constant (unless subject of a 
  specific optimization and pre-calculation).  
  
  One other flaw was revealed and fixed in the process : references were 
  not calling the recalculation method for used_tables of their targets.
[13 Mar 2007 16:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21822

ChangeSet@1.2400, 2007-03-13 16:44:19+02:00, gkodinov@macbook.gmz +10 -0
  Bug #24484:
  To correctly decide which predicates can be evaluated with a given table
  the optimizer must know the exact set of tables that a predicate depends 
  on. If that mask is too wide (refer to non-existing tables) the optimizer
  can erroneously skip a predicate.
  One such case of wrong table usage mask were the aggregate functions.
  The have a all-1 mask (meaning depend on all tables, including non-existent
  ones).
  Fixed by making a real used_tables mask for the aggregates. The mask is
  constructed in the following way :
  1. OR the table dependency masks of all the arguments of the aggregate.
  2. If all the arguments of the function are from the local name resolution 
    context and it is evaluated in the same name resolution
    context where it is referenced all the tables from that name resolution 
    context are OR-ed to the dependency mask. This is to denote that an
    aggregate function depends on the number of rows it processes.
  3. Handle correctly the case of an aggregate function optimization (such that
    the aggregate function can be pre-calculated and made a constant).
  
  Made sure that an aggregate function is never a constant (unless subject of a 
  specific optimization and pre-calculation).  
  
  One other flaw was revealed and fixed in the process : references were 
  not calling the recalculation method for used_tables of their targets.
[14 Mar 2007 17:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21909

ChangeSet@1.2400, 2007-03-14 15:52:27+02:00, gkodinov@macbook.gmz +9 -0
  Bug #24484:
  To correctly decide which predicates can be evaluated with a given table
  the optimizer must know the exact set of tables that a predicate depends 
  on. If that mask is too wide (refer to non-existing tables) the optimizer
  can erroneously skip a predicate.
  One such case of wrong table usage mask were the aggregate functions.
  The have a all-1 mask (meaning depend on all tables, including non-existent
  ones).
  Fixed by making a real used_tables mask for the aggregates. The mask is
  constructed in the following way :
  1. OR the table dependency masks of all the arguments of the aggregate.
  2. If all the arguments of the function are from the local name resolution 
    context and it is evaluated in the same name resolution
    context where it is referenced all the tables from that name resolution 
    context are OR-ed to the dependency mask. This is to denote that an
    aggregate function depends on the number of rows it processes.
  3. Handle correctly the case of an aggregate function optimization (such that
    the aggregate function can be pre-calculated and made a constant).
  
  Made sure that an aggregate function is never a constant (unless subject of a 
  specific optimization and pre-calculation).  
  
  One other flaw was revealed and fixed in the process : references were 
  not calling the recalculation method for used_tables of their targets.
[16 Mar 2007 8:27] Georgi Kodinov
Moving back to in-progress due to merge problems.
[20 Mar 2007 17:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/22395

ChangeSet@1.2400, 2007-03-20 19:46:02+02:00, gkodinov@macbook.local +10 -0
  Bug #24484:
  To correctly decide which predicates can be evaluated with a given table
  the optimizer must know the exact set of tables that a predicate depends 
  on. If that mask is too wide (refer to non-existing tables) the optimizer
  can erroneously skip a predicate.
  One such case of wrong table usage mask were the aggregate functions.
  The have a all-1 mask (meaning depend on all tables, including non-existent
  ones).
  Fixed by making a real used_tables mask for the aggregates. The mask is
  constructed in the following way :
  1. OR the table dependency masks of all the arguments of the aggregate.
  2. If all the arguments of the function are from the local name resolution 
    context and it is evaluated in the same name resolution
    context where it is referenced all the tables from that name resolution 
    context are OR-ed to the dependency mask. This is to denote that an
    aggregate function depends on the number of rows it processes.
  3. Handle correctly the case of an aggregate function optimization (such that
    the aggregate function can be pre-calculated and made a constant).
  
  Made sure that an aggregate function is never a constant (unless subject of a 
  specific optimization and pre-calculation).  
  
  One other flaw was revealed and fixed in the process : references were 
  not calling the recalculation method for used_tables of their targets.
[23 Mar 2007 13:59] Alexey Botchkov
Pushed in 5.0.40 and 5.1.18
[7 Apr 2007 19:10] Paul DuBois
Noted in 5.0.40, 5.1.18 changelogs.

A problem in handling of aggregate functions in subqueries caused
predicates containing aggregate functions to be ignored during query
execution.