Bug #3199 SELECT COUNT(distinct null_values) returns wrong number
Submitted: 17 Mar 2004 3:40 Modified: 18 Mar 2004 1:30
Reporter: Peter Valkov Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.0.18 OS:Linux (Linux)
Assigned to: CPU Architecture:Any

[17 Mar 2004 3:40] Peter Valkov
Description:
When I try to use COUNT(distinct <field-lst>) and there is a field in a list with null values the result do not count nulls. Table type is InnoDB. Here is an example:

mysql>
mysql> select version();
+--------------+
| version()    |
+--------------+
| 4.0.18-Max   |
+--------------+
1 row in set (0.00 sec)

mysql> select * from test;
+----+------+------------+------+
| id | name | familyName | age  |
+----+------+------------+------+
|  1 | John | Smith      |   33 |
|  2 | John | Doe        | NULL |
|  3 | Jane | Smith      | NULL |
|  4 | Jane | Doe        | NULL |
+----+------+------------+------+
4 rows in set (0.00 sec)

mysql> select count(distinct name) from test;
+----------------------+
| count(distinct name) |
+----------------------+
|                    2 | OK
+----------------------+
1 row in set (0.00 sec)

mysql> select count(distinct name, familyName) from test;
+----------------------------------+
| count(distinct name, familyName) |
+----------------------------------+
|                                4 | OK
+----------------------------------+
1 row in set (0.00 sec)

mysql> select count(distinct name, age) from test;
+---------------------------+
| count(distinct name, age) |
+---------------------------+
|                         1 | HAVE TO BE 3 (in my opinion)
+---------------------------+
1 row in set (0.00 sec)

mysql> select count(distinct name, familyName, age) from test;
+---------------------------------------+
| count(distinct name, familyName, age) |
+---------------------------------------+
|                                     1 | HAVE TO BE 4 (in my opinion)
+---------------------------------------+
1 row in set (0.00 sec)

How to repeat:
#
# Table structure for table 'test'
#

CREATE TABLE `test` (
  `id` int(3) unsigned NOT NULL auto_increment,
  `name` varchar(100) default NULL,
  `familyName` varchar(100) default NULL,
  `age` int(10) unsigned default NULL,
  PRIMARY KEY  (`id`)
) TYPE=InnoDB;

#
# Dumping data for table 'test'
#
INSERT INTO test VALUES(1,"John","Smith",33);
INSERT INTO test VALUES(2,"John","Doe",NULL);
INSERT INTO test VALUES(3,"Jane","Smith",NULL);
INSERT INTO test VALUES(4,"Jane","Doe",NULL);
[17 Mar 2004 15:13] Dean Ellis
Changing the category (not specific to a storage engine).  COUNT() disposes of NULLs, but whether it should dispose of them in multi-column cases where only some of the columns are NULL is arguable.

ie: I think that "2" is the logical return value for:

SELECT COUNT(DISTINCT name, age) FROM test;

with the provided data.

There is at least documentation work to do here, so I will Verify this and let other discussion determine whether or not the behavior should change.
[18 Mar 2004 1:30] Sergei Golubchik
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

Manual defines our nonstandard extension COUNT(DISTINCT a,b,c,...)
as being equivalent to standard COUNT(DISTINCT CONCAT(a,b,c,...))

COUNT(a,b,...,NULL) is NULL
and COUNT(DISTINCT expr) ignores NULLs, as it should.
[18 Mar 2004 2:09] Peter Valkov
Let me describe my logic. If I rum following query:

mysql> select distinct name, age from test;
+------+------+
| name | age  |
+------+------+
| John |   33 |
| John | NULL |
| Jane | NULL |
+------+------+
3 rows in set (0.01 sec)

the result is 3 rows. Then I run 

mysql> select count(distinct name,age) from test;
+--------------------------+
| count(distinct name,age) |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.00 sec)

and the result is 1 ... little bit confusing. Don't you think?

So it may be not a bug according to documentation but it worth to consider is this behavior is OK.

Of course there is workaround:

mysql> select count(distinct name, ifnull(age,0)) from test; 

... but in case name is NULL as well I have to rewrite it like:

mysql> select count(distinct ifnull(name,''), ifnull(age,0)) from test;

and same for every column. This according to me is a little bit annoying.