MySQL Bugs: #9523: abuse of index in simple query

Bug #9523	abuse of index in simple query
Submitted:	31 Mar 2005 13:15	Modified:	14 Jan 2010 16:40
Reporter:	Vadim Tkachenko	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	5.0.9-beta, 4.1.10a	OS:	Windows (Windows XP)
Assigned to:		CPU Architecture:	Any

Description:
I have two tables:

CREATE TABLE `t1` (
   `ID` int(11) default NULL,
   `VAL` varchar(5) default NULL,
   KEY `VAL` (`VAL`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|   250001 |
+----------+

CREATE TABLE `t2` (
   `ID` int(11) NOT NULL auto_increment,
   `NAME` varchar(255) default NULL,
   PRIMARY KEY  (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

SELECT COUNT(*) FROM t2;
+----------+
| COUNT(*) |
+----------+
|   50001 |
+----------+

89.9% rows in t1 contain val='val1'

SELECT c2/c1
FROM (SELECT COUNT(*) c1 FROM t1) d1,
 (SELECT COUNT(*) c2 FROM t1 WHERE val='val1') d2;

0.89993

Query:
SELECT COUNT(DISTINCT t2.id)
FROM  t2 JOIN
      t1 USING(id)
WHERE t1.val='val1';
uses index VAL:

EXPLAIN SELECT COUNT(DISTINCT t2.id) FROM t2 JOIN t1 USING(id) WHERE 
t1.val='val1';

*************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: t1
          type: ref
possible_keys: VAL
           key: VAL
       key_len: 6
           ref: const
          rows: 107131
         Extra: Using where
*************************** 2. row ***************************
            id: 1
   select_type: SIMPLE
         table: t2
          type: eq_ref
possible_keys: PRIMARY
           key: PRIMARY
       key_len: 4
           ref: test.t1.ID
          rows: 1
         Extra: Using index
2 rows in set (0.00 sec)

Time of execution:
SELECT COUNT(DISTINCT t2.id) FROM t2 JOIN t1 USING(id) WHERE 
t1.val='val1';
(2453 ms taken)

Time of execution without index:
SELECT COUNT(DISTINCT t2.id) FROM t2 JOIN t1 IGNORE INDEX (VAL) USING(id) WHERE t1.val='val1';
(1250 ms taken)

How to repeat:
CREATE TABLE `t1` (
   `ID` int(11) default NULL,
   `VAL` varchar(5) default NULL,
   KEY `VAL` (`VAL`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `t2` (
   `ID` int(11) NOT NULL auto_increment,
   `NAME` varchar(255) default NULL,
   PRIMARY KEY  (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Scripts for t1 & t2 data:

<?

for ($i=0; $i <= 50000; $i++)
{
 echo "INSERT INTO t2 (name) VALUES ('test".md5(rand(1,5))."');\n";
}

?>

<?

echo "TRUNCATE TABLE t1;\n";

for ($i=0; $i <= 250000; $i++)
{
 echo "INSERT INTO t1 (id,val) VALUES (".rand(1,50000).",'".(rand(1,10)==10?'val2':'val1')."');\n";
}

echo "ANALYZE TABLE t1;\n";
echo "OPTIMIZE TABLE t1;\n";

?>

EXPLAIN SELECT COUNT(DISTINCT t2.id) FROM t2 JOIN t1 USING(id) WHERE 
t1.val='val1';

Employing a primitive cost model the optimizer sometimes selects not the fastest execution plan.

I checked optimizer behavior with extremally simple test case:
CREATE TABLE `t2` (
`ID` int(11) default NULL,
`ID1` int(11) default NULL,
`SUBNAME` varchar(32) default NULL,
KEY `ID1` (`ID1`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

SELECT cnt2 / cnt1 FROM (SELECT count(*) cnt1 FROM t2) d1, (SELECT count(*) cnt2 FROM t2 WHERE ID1=1) d2;
0.9492 = 94.92%

mysql> EXPLAIN SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1;
+----+-------------+-------+------+---------------+------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+-------+-------------+
| 1 | SIMPLE | t2 | ref | ID1 | ID1 | 5 | const | 81371 | Using where |
+----+-------------+-------+------+---------------+------+---------+-------+-------+-------------+

i.e. optimizer wants to use index.

execution time:
SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1 - 1200 ms
SELECT COUNT(SUBNAME) FROM t2 IGNORE INDEX (ID1) WHERE ID1=1 - 260 ms

I think, whatever primitive optimizer we have, it should make right choice in this extremally simple case.

In case with 99% (for table with 1 000 000 rows) selectivity
mysql> select c1/c2 from (select count(*) c1 from test.t2 where ID1=1) t1, (select count(*) c2 from test.t2) t2;
+--------+
| c1/c2  |
+--------+
| 0.9901 |
+--------+
still
mysql> explain select count(SUBNAME) from test.t2 where ID1=1\G                                                 
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: ID1
          key: ID1
      key_len: 5
          ref: const
         rows: 918909
        Extra: Using where

In case with 100% selectivity tablescan is used.

With recent mysql-6.0-codebase we have:

77-52-7-73:6.0-codebase openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 6.0.14-alpha-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE `t2` (
    -> `ID` int(11) default NULL,
    -> `ID1` int(11) default NULL,
    -> `SUBNAME` varchar(32) default NULL,
    -> KEY `ID1` (`ID1`)
    -> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.07 sec)

mysql> insert into t2 values(rand()*10000, 1, '');
Query OK, 1 row affected (0.38 sec)

mysql> insert into t2 select rand()*10000, 1, '' from t2
    -> ;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t2 select rand()*10000, 1, '' from t2;
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into t2 select rand()*10000, 1, '' from t2;
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into t2 select rand()*10000, 1, NULL from t2;
Query OK, 8 rows affected (0.02 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> insert into t2 select rand()*10000, 1, NULL from t2;
Query OK, 16 rows affected (0.01 sec)
Records: 16  Duplicates: 0  Warnings: 0

mysql> insert into t2 select rand()*10000, 1, NULL from t2;
Query OK, 32 rows affected (0.00 sec)
Records: 32  Duplicates: 0  Warnings: 0

mysql> insert into t2 select rand()*10000, 1, NULL from t2;
Query OK, 64 rows affected (0.01 sec)
Records: 64  Duplicates: 0  Warnings: 0

mysql> insert into t2 select rand()*10000, 1, NULL from t2;
Query OK, 128 rows affected (0.01 sec)
Records: 128  Duplicates: 0  Warnings: 0

mysql> insert into t2 select rand()*10000, 2, 'a' from t2;
Query OK, 256 rows affected (0.01 sec)
Records: 256  Duplicates: 0  Warnings: 0

mysql> insert into t2 select rand()*10000, 1, 'a' from t2;
Query OK, 512 rows affected (0.02 sec)
Records: 512  Duplicates: 0  Warnings: 0

mysql> SELECT cnt2 / cnt1 FROM (SELECT count(*) cnt1 FROM t2) d1, (SELECT count(*) cnt2 FROM t2
    -> WHERE ID1=1) d2;
+-------------+
| cnt2 / cnt1 |
+-------------+
|      0.7500 |
+-------------+
1 row in set (0.02 sec)

mysql> EXPLAIN SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1;
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
|  1 | SIMPLE      | t2    | ref  | ID1           | ID1  | 5       | const |  829 |       |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
1 row in set (0.36 sec)

So, with 75% selectivity index is used. And maybe this is bad:

mysql> SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1;
+----------------+
| COUNT(SUBNAME) |
+----------------+
|            520 |
+----------------+
1 row in set (0.01 sec)

mysql> SELECT COUNT(SUBNAME) FROM t2 IGNORE INDEX (ID1) WHERE ID1=1;
+----------------+
| COUNT(SUBNAME) |
+----------------+
|            520 |
+----------------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1;
+----------------+
| COUNT(SUBNAME) |
+----------------+
|            520 |
+----------------+
1 row in set (0.01 sec)

mysql> insert into t2 select rand()*10000, 1, NULL from t2;Query OK, 1024 rows affected (0.05 sec)
Records: 1024  Duplicates: 0  Warnings: 0

mysql> insert into t2 select rand()*10000, 1, NULL from t2;Query OK, 2048 rows affected (0.11 sec)
Records: 2048  Duplicates: 0  Warnings: 0

mysql> SELECT cnt2 / cnt1 FROM (SELECT count(*) cnt1 FROM t2) d1, (SELECT count(*) cnt2 FROM t2 WHERE ID1=1) d2;
+-------------+
| cnt2 / cnt1 |
+-------------+
|      0.9375 |
+-------------+
1 row in set (0.01 sec)

mysql> insert into t2 select rand()*10000, 1, NULL from t2;Query OK, 4096 rows affected (0.24 sec)
Records: 4096  Duplicates: 0  Warnings: 0

mysql> SELECT cnt2 / cnt1 FROM (SELECT count(*) cnt1 FROM t2) d1, (SELECT count(*) cnt2 FROM t2 WHERE ID1=1) d2;
+-------------+
| cnt2 / cnt1 |
+-------------+
|      0.9688 |
+-------------+
1 row in set (0.03 sec)

mysql> EXPLAIN SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t2    | ALL  | ID1           | NULL | NULL    | NULL | 8192 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

but when selectivity is 96.88% index is NOT used. Can we consider the problem solved?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".