Bug #64445 Different execution plan alter results with Loose Index Scan with Aggregate
Submitted: 24 Feb 2012 15:10 Modified: 26 Mar 2013 6:13
Reporter: Todd Mosier Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.5.21, 5.5.23 OS:Linux (Ubuntu 11.10)
Assigned to: CPU Architecture:Any
Tags: aggregate functions, count, distinct, index for group by, loose index scan, regression

[24 Feb 2012 15:10] Todd Mosier
Description:
The bug comes from this post: http://dba.stackexchange.com/questions/13817/mysql-5-5-strange-behavior-with-group-by-with...

We are counting the number of distinct values in a table based conditionally off of two columns' values.  If user_id=0, then we want the distinct to run off the ip column.  Otherwise, it should use the user_id column.  

This ran successfully in 5.1.47, using the index.  It also works in 5.5.19 when it uses the index instead of index_for_group-by, which can be checked by putting in a limit.

However, when using the index_for_group-by, the result set is invalid.

How to repeat:
This should be repeatable with random data in the columns.

mysql> show create table tm_tmp;
| Table  | Create Table
| tm_tmp | CREATE TABLE `tm_tmp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `artist_id` int(11) NOT NULL DEFAULT '0',
  `user_id` int(11) NOT NULL DEFAULT '0',
  `ip` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `sp_artist_reach` (`artist_id`,`user_id`,`ip`),
) ENGINE=MyISAM AUTO_INCREMENT=46 DEFAULT CHARSET=latin1 |
1 row in set (0.00 sec)

mysql>  select count(DISTINCT(IF(user_id = 0, ip, user_id))) from tm_tmp group by artist_id limit 11;
+-----------------------------------------------+
| count(DISTINCT(IF(user_id = 0, ip, user_id))) |
+-----------------------------------------------+
|                                             1 |
|                                             2 |
|                                             1 |
|                                             2 |
|                                             1 |
|                                             4 |
|                                             1 |
|                                             3 |
|                                             2 |
|                                             1 |
|                                             1 |
+-----------------------------------------------+
11 rows in set (0.00 sec)

mysql>  select count(DISTINCT(IF(user_id = 0, ip, user_id))) from tm_tmp group by artist_id limit 12;
+-----------------------------------------------+
| count(DISTINCT(IF(user_id = 0, ip, user_id))) |
+-----------------------------------------------+
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
+-----------------------------------------------+
12 rows in set (0.00 sec)

mysql> explain extended  select          count(DISTINCT(IF(user_id = 0, ip, user_id))) from tm_tmp group by artist_id limit 11;
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+-------------+
| id | select_type | table  | type  | possible_keys | key             | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | tm_tmp | index | NULL          | sp_artist_reach | 12      | NULL |   45 |    26.67 | Using index |
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain extended  select          count(DISTINCT(IF(user_id = 0, ip, user_id))) from tm_tmp group by artist_id limit 12;
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+--------------------------+
| id | select_type | table  | type  | possible_keys | key             | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | tm_tmp | range | NULL          | sp_artist_reach | 4       | NULL |   12 |   100.00 | Using index for group-by |
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+--------------------------+

Suggested fix:
In this particular case, it can be solved by forcing mysql to use a derived table.  The performance is severely decreased, however.

SELECT count(DISTINCT(IF(user_id = 0, ip, user_id)))
FROM
(
 SELECT artist_id, user_id, ip
 FROM tm_tmp
 GROUP BY artist_id, user_id, ip
) summary
GROUP BY artist_id
LIMIT 12;

mysql> explain SELECT count(DISTINCT(IF(user_id = 0, ip, user_id))) FROM (  SELECT artist_id, user_id, ip  FROM tm_tmp  GROUP BY artist_id, user_id, ip ) summary GROUP BY artist_id LIMIT 11;
+----+-------------+------------+-------+---------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table      | type  | possible_keys | key             | key_len | ref  | rows | Extra                    |
+----+-------------+------------+-------+---------------+-----------------+---------+------+------+--------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL            | NULL    | NULL |   21 | Using filesort           |
|  2 | DERIVED     | tm_tmp     | range | NULL          | sp_artist_reach | 12      | NULL |   23 | Using index for group-by |
+----+-------------+------------+-------+---------------+-----------------+---------+------+------+--------------------------+
[24 Feb 2012 15:17] Todd Mosier
It may be related to the following Worklog: http://forge.mysql.com/worklog/task.php?id=3220
[24 Feb 2012 17:20] Valeriy Kravchuk
Please, make sure the problem is still repeatable with a recent version, 5.5.21. If it does, dump of table data that demonstrates the problem would help to speed up processing.
[29 Feb 2012 19:41] Todd Mosier
dump of tm_tmp used to demonstrate the error.

Attachment: mysqldump_tm_tmp_20120229.sql (text/x-sql), 2.96 KiB.

[29 Feb 2012 19:42] Todd Mosier
dump of tm_tmp used to demonstrate the error.

Attachment: mysqldump_tm_tmp_20120229.sql (text/x-sql), 2.96 KiB.

[29 Feb 2012 19:44] Todd Mosier
I have recreated this on my local development environment, with 5.5.21. 

The bug still exists in the current release.
[1 Mar 2012 13:40] Valeriy Kravchuk
Verified with current mysql-5.5 using your .sql file uploaded:

mysql> select count(DISTINCT(IF(user_id = 0, ip, user_id))) from tm_tmp group by
    -> artist_id limit 11;
+-----------------------------------------------+
| count(DISTINCT(IF(user_id = 0, ip, user_id))) |
+-----------------------------------------------+
|                                             1 |
|                                             2 |
|                                             1 |
|                                             2 |
|                                             1 |
|                                             4 |
|                                             1 |
|                                             3 |
|                                             2 |
|                                             1 |
|                                             1 |
+-----------------------------------------------+
11 rows in set (0.02 sec)

mysql> select count(DISTINCT(IF(user_id = 0, ip, user_id))) from tm_tmp group by
    -> artist_id limit 12;
+-----------------------------------------------+
| count(DISTINCT(IF(user_id = 0, ip, user_id))) |
+-----------------------------------------------+
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
|                                             0 |
+-----------------------------------------------+
12 rows in set (0.00 sec)

mysql> explain extended  select          count(DISTINCT(IF(user_id = 0, ip, user_id)))
    -> from tm_tmp group by artist_id limit 11;
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+-------------+
| id | select_type | table  | type  | possible_keys | key             | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | tm_tmp | index | NULL          | sp_artist_reach | 12      | NULL |   45 |    26.67 | Using index |
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain extended  select          count(DISTINCT(IF(user_id = 0, ip, user_id))) from tm_tmp group by artist_id limit 12;
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+--------------------------+
| id | select_type | table  | type  | possible_keys | key             | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | tm_tmp | range | NULL          | sp_artist_reach | 4       | NULL |   12 |   100.00 | Using index for group-by |
+----+-------------+--------+-------+---------------+-----------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.5.23-debug |
+--------------+
1 row in set (0.00 sec)
[25 Mar 2013 18:18] Paul DuBois
Noted in 5.5.27, 5.6.7, 5.7.0 changelogs.

COUNT(DISTINCT(IF ...)) could be evaluated incorrectly if the
optimizer used Loose Index Scan.