Bug #21 DISTINCT is not taken in account for index Selection (RightNow)
Submitted: 5 Jan 2003 11:18 Modified: 15 Dec 2003 15:05
Reporter: Peter Zaitsev (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.0 OS:Linux (Linux)
Assigned to: Michael Widenius CPU Architecture:Any

[5 Jan 2003 11:18] Peter Zaitsev
Description:
I belive I found a bug in the optimizer. Let me illustrate with the following example: 

The following query is run nightly to clean up each of our databases: 

SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON i.sessionid = c.sessionid WHERE c.timestamp < '2002-12-05 01:05:08' AND c.interface_id = 1 AND (i.status NOT IN (1,9,8) OR i.status IS NULL);

It ran for hours when it usally only takes a couple mintes. 

Here is the explain, notice it isn't using any indexes, I tried forcing it to use an index with the use index syntax with no change. I also tried optimizing the tables with no change:

explain SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON i.sessionid = c.sessionid WHERE c.timestamp < '2002-12-05 01:05:08' AND c.interface_id = 1 AND (i.status NOT IN (1,9,8) OR i.status IS NULL);

+-------+------+----------------------+------+---------+------+---------+-----------------------------+ 
| table | type | possible_keys        | key  | key_len | ref  | rows    | Extra                       | 
+-------+------+----------------------+------+---------+------+---------+-----------------------------+ 
| c     | ALL  | clicktrack$timestamp | NULL |    NULL | NULL | 1491872 | where used; Using temporary | 
| i     | ALL  | incidents$sessionid  | NULL |    NULL | NULL |    1461 | where used; Distinct        | 
+-------+------+----------------------+------+---------+------+---------+-----------------------------+ 
2 rows in set (0.00 sec) 

Here are the indexes on the incidents table, notice the cardinality value for incidents$sessionid, it currently 1. There are only two values in this column NULLs and a 1 unique session_id. It's not typical for most of our customers to have only one sessionid so I decided to add one more (so a total of two rows now have a session_id)

mysql> show index from incidents; 
+-----------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+---------+

| Table     | Non_unique | Key_name              | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment |

+-----------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+---------+

| incidents |          0 | incidents$ref_no      |            1 | ref_no      | A         |        1461 |     NULL | NULL   |         |

| incidents |          1 | incidents$status      |            1 | status      | A         |           4 |     NULL | NULL   |         |

| incidents |          1 | incidents$assigned    |            1 | assigned    | A         |          54 |     NULL | NULL   |         |

| incidents |          1 | incidents$customer_id |            1 | customer_id | A         |         132 |     NULL | NULL   |         |

| incidents |          1 | incidents$escldate    |            1 | escldate    | A         |           1 |     NULL | NULL   |         |

| incidents |          1 | incidents$sessionid   |            1 | sessionid   | A         |           1 |     NULL | NULL   |         |

+-----------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+---------+

6 rows in set (0.00 sec) 

update incidents set sessionid='zxyyHFe' where ref_no='001110-000006'; 

optimize table incidents; 

Now Notice the cardinality value is 3. 

mysql> show index from incidents; 
+-----------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+---------+

| Table     | Non_unique | Key_name              | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment |

+-----------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+---------+

| incidents |          0 | incidents$ref_no      |            1 | ref_no      | A         |        1461 |     NULL | NULL   |         |

| incidents |          1 | incidents$status      |            1 | status      | A         |           4 |     NULL | NULL   |         |

| incidents |          1 | incidents$assigned    |            1 | assigned    | A         |          54 |     NULL | NULL   |         |

| incidents |          1 | incidents$customer_id |            1 | customer_id | A         |         132 |     NULL | NULL   |         |

| incidents |          1 | incidents$escldate    |            1 | escldate    | A         |           1 |     NULL | NULL   |         |

| incidents |          1 | incidents$sessionid   |            1 | sessionid   | A         |           3 |     NULL | NULL   |         |

+-----------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+---------+

Also notice that in now uses an index. When I run this query now it takes only 1 minute to run. 

mysql> desc SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON i.sessionid = c.sessionid WHERE c.timestamp < '2002-12-05 01:05:08' AND c.interface_id = 1 AND (i.status NOT IN (1,9,8) OR i.status IS NULL);

+-------+------+----------------------+---------------------+---------+-------------+---------+-----------------------------+

| table | type | possible_keys        | key                 | key_len | ref         | rows    | Extra                       |

+-------+------+----------------------+---------------------+---------+-------------+---------+-----------------------------+

| c     | ALL  | clicktrack$timestamp | NULL                |    NULL | NULL        | 1491864 | where used; Using temporary |

| i     | ref  | incidents$sessionid  | incidents$sessionid |      12 | c.sessionid |     487 | where used; Distinct        |

+-------+------+----------------------+---------------------+---------+-------------+---------+-----------------------------+

2 rows in set (0.00 sec) 

I can easily get you the tables involved if you would like to do more testing. It appears to me that the optimizer  incorrectly decides not to use an index if the cardinality is too low.

On an unrelated note I would like to no why indexes don't ever seem to be used with columns of type datetime when using '<' '>' and between oporators are used.

How to repeat:
CREATE TABLE sessions (
  id int(10) unsigned NOT NULL auto_increment,
  last_update timestamp(14) NOT NULL,
  PRIMARY KEY  (id)
) TYPE=MyISAM;

--
-- Dumping data for table 'sessions'
--

INSERT INTO sessions VALUES 
(1,20030105122040),(2,20030105122040),(3,20030105122040),(4,20030105122040),(5,20030105122040),(6,20030105122040),(7,20030105122040),(8,20030105122040),(9,20030105122040),(10,20030105122040),(11,20030105122040),(12,20030105122040),(13,20030105122040),(14,20030105122040),(15,20030105122040),(16,20030105122040),(17,20030105122040),(18,20030105122040),(19,20030105122040),(20,20030105122040),(21,20030105122040),(22,20030105122040),(23,20030105122040),(24,20030105122040),(25,20030105122040),(26,20030105122040),(27,20030105122040),(28,20030105122040),(29,20030105122040),(30,20030105122040),(31,20030105122040),(32,20030105122040),(33,20030105122040),(34,20030105122040),(35,20030105122040),(36,20030105122040),(37,20030105122040),(38,20030105122040);

CREATE TABLE session_data (
  session_id int(10) unsigned NOT NULL default '0',
  var char(20) NOT NULL default '',
  KEY session_id (session_id)
) TYPE=MyISAM;

--
-- Dumping data for table 'session_data'
--

INSERT INTO session_data VALUES 
(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a'),(1,'a');

explain select distinct id from sessions left outer join 
session_data on id=session_id where last_update<now() and 
(var="a" or var is null);

Suggested fix:
Workaround:

add index (session_id,var) to session_data table to change access type to the table to index covered.
[9 Jan 2003 23:23] MySQL Developer
Thank you for your bug report. This issue has already been fixed
in the latest released version of that product, which you can download at 
http://www.mysql.com/downloads/

This particular case (search on NULL) is fixed in 4.0.9
[22 Jan 2017 2:13] ali abani ousmane
exxx

Attachment: 21.xlsx (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, text), 18.62 KiB.