MySQL Bugs: #43092: Group by still optimizing out extra nulls

Bug #43092	Group by still optimizing out extra nulls
Submitted:	22 Feb 2009 21:09	Modified:	27 Feb 2009 18:08
Reporter:	Brad Lanam	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.0.67-0ubuntu6	OS:	Linux (Ubuntu 8.10)
Assigned to:		CPU Architecture:	Any
Tags:	GROUP BY, null

Description:
The group by clause in the second query below is rolling up null values into a single row, even though only the first column is null, and the second column has many distinct values.

select count(*) from (
select
  c.custname,
  d.did final_did,
  sum(billable_time) billable_time,
  count(*) number_of_calls
from dids d
inner join did_listings dl
  on dl.did_id = d.id
  and dl.cust_id = 20
left outer join usage_data ud
  on ud.final_did = d.did
  and ud.cust_id = 20
  and (ud.year = 2008 and ud.month = 12)
left outer join customers c
  on ud.cust_id = c.id
group by custname, final_did
) x;

+----------+
| count(*) |
+----------+
|       45 |    # wrong
+----------+

Workaround:

select count(*) from (
select custname, final_did,
  sum(billable_time) billable_time,
  sum(number_of_calls) number_of_calls
from (
select
  c.custname,
  d.did final_did,
  billable_time,
  1 number_of_calls
from dids d
inner join did_listings dl
  on dl.did_id = d.id
  and dl.cust_id = 20
left outer join usage_data ud
  on ud.final_did = d.did
  and ud.cust_id = 20
  and (ud.year = 2008 and ud.month = 12)
left outer join customers c
  on ud.cust_id = c.id
) y
group by custname, final_did
) x;

+----------+
| count(*) |
+----------+
|      808 |     # correct
+----------+

select count(*) from (
select
  d.did final_did
from dids d
inner join did_listings dl
  on dl.did_id = d.id
  and dl.cust_id = 20
) x;

+----------+
| count(*) |
+----------+
|      808 |     # correct
+----------+

select count(*) from (
select
  c.custname,
  d.did final_did,
  billable_time,
  1 number_of_calls
from dids d
inner join did_listings dl
  on dl.did_id = d.id
  and dl.cust_id = 20
left outer join usage_data ud
  on ud.final_did = d.did
  and ud.cust_id = 20
  and (ud.year = 2008 and ud.month = 12)
left outer join customers c
  on ud.cust_id = c.id
) x;

+----------+
| count(*) |
+----------+
|      879 |  # don't know why this is low.
+----------+

How to repeat:
Sorry, I have not been able create a reproducable case.
The optimizer has too many dependencies on unique/non unique indexes, 
number of rows, etc.

Thank you for the problem report. Can you, at least, send the results of SHOW CREATE TABLE for all the tables used in that case where you noted this problem?

In 4.1.14:
   Query 1 (group by) works.
   Query 2 (derived) has the too many values problem (879 rows).
   Query 3 (basic select) works.
   Query 4 (left outer join) works.
In 5.0.26, 5.045:
   Query 1 (group by) has null rollup problem.
   Query 2 (derived) works.
   Query 3 (basic select) works.
   Query 4 (left outer join) has too many values.
Private message w/test tables to follow.

Table Creation script uploaded as:
   test_tables_43092.sql.gz

Sorry, but your script is incomplete. It does NOT have CREATE TABLE and data to demonstrate the problem. 

Please, check with a newer version, 5.0.77, and, in case of the same problem, send entire script to reproduce.

Please see table creation script uploaded to ftp site as:
   test_tables_43092.sql.gz

You are using ambiguous name to group by (you have column alias in SELECT and real column in one of the tables with this name). Look:

valeriy-kravchuks-macbook-pro:5.0 openxs$ bin/mysql -uroot test < test_tables_43092.sql 
valeriy-kravchuks-macbook-pro:5.0 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.79-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select count(*) from (
    -> select
    ->   c.custname,
    ->   d.did final_did,
    ->   sum(billable_time) billable_time,
    ->   count(*) number_of_calls
    -> from dids d
    -> inner join did_listings dl
    ->   on dl.did_id = d.id
    ->   and dl.cust_id = 20
    -> left outer join usage_data ud
    ->   on ud.final_did = d.did
    ->   and ud.cust_id = 20
    ->   and (ud.year = 2008 and ud.month = 12)
    -> left outer join customers c
    ->   on ud.cust_id = c.id
    -> group by custname, final_did
    -> ) x;
+----------+
| count(*) |
+----------+
|       45 | 
+----------+
1 row in set, 1 warning (0.18 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1052
Message: Column 'final_did' in group statement is ambiguous
1 row in set (0.00 sec)

mysql> select count(*) from ( select   c.custname,   d.did final_did,   sum(billable_time) billable_time,   count(*) number_of_calls from dids d inner join did_listings dl   on dl.did_id = d.id   and dl.cust_id = 20 left outer join usage_data ud   on ud.final_did = d.did   and ud.cust_id = 20   and (ud.year = 2008 and ud.month = 12) left outer join customers c   on ud.cust_id = c.id group by custname, d.did ) x;
+----------+
| count(*) |
+----------+
|      808 | 
+----------+
1 row in set (0.03 sec)

mysql> select count(*) from ( select   c.custname,   d.did final_did,   sum(billable_time) billable_time,   count(*) number_of_calls from dids d inner join did_listings dl   on dl.did_id = d.id   and dl.cust_id = 20 left outer join usage_data ud   on ud.final_did = d.did   and ud.cust_id = 20   and (ud.year = 2008 and ud.month = 12) left outer join customers c   on ud.cust_id = c.id group by custname, ud.final_did ) x;
+----------+
| count(*) |
+----------+
|       45 | 
+----------+
1 row in set (0.03 sec)

final_did column in d table has 808 different values, and it has 45 different values in u. The result depends on what table's column you are grouping by.

So, do not use alias in GROUP BY with the same name as one of the columns to get expected results. Please, check.

Confirmed.
Very embarassing.  Sorry to trouble you with this.
I'll try to pay attention to warnings in the future.

Should an ambiguous group by be an error rather than a warning?