Bug #43092 Group by still optimizing out extra nulls
Submitted: 22 Feb 2009 21:09 Modified: 27 Feb 2009 18:08
Reporter: Brad Lanam Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.67-0ubuntu6 OS:Linux (Ubuntu 8.10)
Assigned to: CPU Architecture:Any
Tags: GROUP BY, null

[22 Feb 2009 21:09] Brad Lanam
Description:
The group by clause in the second query below is rolling up null values into a single row, even though only the first column is null, and the second column has many distinct values.

select count(*) from (
select
  c.custname,
  d.did final_did,
  sum(billable_time) billable_time,
  count(*) number_of_calls
from dids d
inner join did_listings dl
  on dl.did_id = d.id
  and dl.cust_id = 20
left outer join usage_data ud
  on ud.final_did = d.did
  and ud.cust_id = 20
  and (ud.year = 2008 and ud.month = 12)
left outer join customers c
  on ud.cust_id = c.id
group by custname, final_did
) x;

+----------+
| count(*) |
+----------+
|       45 |    # wrong
+----------+

Workaround:

select count(*) from (
select custname, final_did,
  sum(billable_time) billable_time,
  sum(number_of_calls) number_of_calls
from (
select
  c.custname,
  d.did final_did,
  billable_time,
  1 number_of_calls
from dids d
inner join did_listings dl
  on dl.did_id = d.id
  and dl.cust_id = 20
left outer join usage_data ud
  on ud.final_did = d.did
  and ud.cust_id = 20
  and (ud.year = 2008 and ud.month = 12)
left outer join customers c
  on ud.cust_id = c.id
) y
group by custname, final_did
) x;

+----------+
| count(*) |
+----------+
|      808 |     # correct
+----------+

select count(*) from (
select
  d.did final_did
from dids d
inner join did_listings dl
  on dl.did_id = d.id
  and dl.cust_id = 20
) x;

+----------+
| count(*) |
+----------+
|      808 |     # correct
+----------+

select count(*) from (
select
  c.custname,
  d.did final_did,
  billable_time,
  1 number_of_calls
from dids d
inner join did_listings dl
  on dl.did_id = d.id
  and dl.cust_id = 20
left outer join usage_data ud
  on ud.final_did = d.did
  and ud.cust_id = 20
  and (ud.year = 2008 and ud.month = 12)
left outer join customers c
  on ud.cust_id = c.id
) x;

+----------+
| count(*) |
+----------+
|      879 |  # don't know why this is low.
+----------+

How to repeat:
Sorry, I have not been able create a reproducable case.
The optimizer has too many dependencies on unique/non unique indexes, 
number of rows, etc.
[23 Feb 2009 7:59] Valeriy Kravchuk
Thank you for the problem report. Can you, at least, send the results of SHOW CREATE TABLE for all the tables used in that case where you noted this problem?
[23 Feb 2009 15:54] Brad Lanam
In 4.1.14:
   Query 1 (group by) works.
   Query 2 (derived) has the too many values problem (879 rows).
   Query 3 (basic select) works.
   Query 4 (left outer join) works.
In 5.0.26, 5.045:
   Query 1 (group by) has null rollup problem.
   Query 2 (derived) works.
   Query 3 (basic select) works.
   Query 4 (left outer join) has too many values.
Private message w/test tables to follow.
[23 Feb 2009 15:59] Brad Lanam
Table Creation script uploaded as:
   test_tables_43092.sql.gz
[23 Feb 2009 17:05] Valeriy Kravchuk
Sorry, but your script is incomplete. It does NOT have CREATE TABLE and data to demonstrate the problem. 

Please, check with a newer version, 5.0.77, and, in case of the same problem, send entire script to reproduce.
[23 Feb 2009 17:09] Brad Lanam
Please see table creation script uploaded to ftp site as:
   test_tables_43092.sql.gz
[27 Feb 2009 16:46] Valeriy Kravchuk
You are using ambiguous name to group by (you have column alias in SELECT and real column in one of the tables with this name). Look:

valeriy-kravchuks-macbook-pro:5.0 openxs$ bin/mysql -uroot test < test_tables_43092.sql 
valeriy-kravchuks-macbook-pro:5.0 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.79-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select count(*) from (
    -> select
    ->   c.custname,
    ->   d.did final_did,
    ->   sum(billable_time) billable_time,
    ->   count(*) number_of_calls
    -> from dids d
    -> inner join did_listings dl
    ->   on dl.did_id = d.id
    ->   and dl.cust_id = 20
    -> left outer join usage_data ud
    ->   on ud.final_did = d.did
    ->   and ud.cust_id = 20
    ->   and (ud.year = 2008 and ud.month = 12)
    -> left outer join customers c
    ->   on ud.cust_id = c.id
    -> group by custname, final_did
    -> ) x;
+----------+
| count(*) |
+----------+
|       45 | 
+----------+
1 row in set, 1 warning (0.18 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1052
Message: Column 'final_did' in group statement is ambiguous
1 row in set (0.00 sec)

mysql> select count(*) from ( select   c.custname,   d.did final_did,   sum(billable_time) billable_time,   count(*) number_of_calls from dids d inner join did_listings dl   on dl.did_id = d.id   and dl.cust_id = 20 left outer join usage_data ud   on ud.final_did = d.did   and ud.cust_id = 20   and (ud.year = 2008 and ud.month = 12) left outer join customers c   on ud.cust_id = c.id group by custname, d.did ) x;
+----------+
| count(*) |
+----------+
|      808 | 
+----------+
1 row in set (0.03 sec)

mysql> select count(*) from ( select   c.custname,   d.did final_did,   sum(billable_time) billable_time,   count(*) number_of_calls from dids d inner join did_listings dl   on dl.did_id = d.id   and dl.cust_id = 20 left outer join usage_data ud   on ud.final_did = d.did   and ud.cust_id = 20   and (ud.year = 2008 and ud.month = 12) left outer join customers c   on ud.cust_id = c.id group by custname, ud.final_did ) x;
+----------+
| count(*) |
+----------+
|       45 | 
+----------+
1 row in set (0.03 sec)

final_did column in d table has 808 different values, and it has 45 different values in u. The result depends on what table's column you are grouping by.

So, do not use alias in GROUP BY with the same name as one of the columns to get expected results. Please, check.
[27 Feb 2009 18:08] Brad Lanam
Confirmed.
Very embarassing.  Sorry to trouble you with this.
I'll try to pay attention to warnings in the future.

Should an ambiguous group by be an error rather than a warning?