Bug #78407 wrong index being used in subqueries?
Submitted: 11 Sep 2015 12:45 Modified: 29 Apr 2022 5:02
Reporter: Jay Jay Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:7.4.6/5.6.24, 5.7, 8.0 OS:Linux
Assigned to: CPU Architecture:Any

[11 Sep 2015 12:45] Jay Jay
Description:

If I used this query:

select customer_id, type, subtype, value from customer_data use index (customer_id) where identity_id='252666652202606' and customer_id in (select c.customer_id from customer_data c use index (identity_id_2) where
 (c.identity_id='252666652202606' and c.value like 'it%')
  union select r.customer_id from customer_data r use index (identity_id_3) where
 (r.identity_id='252666652202606' and r.value_reverse like 'ti%') ) limit 200

it is very slow.
If I take out the subquery and only run that:

select c.customer_id from customer_data c use index (identity_id_2) where
 (c.identity_id='252666652202606' and c.value like 'it%')
  union select r.customer_id from customer_data r use index (identity_id_3) where
 (r.identity_id='252666652202606' and r.value_reverse like 'ti%')

it returns 239 rows and is very fast.

describe for the subquery:

+----+--------------+------------+-------+---------------+---------------+---------+------+------+-----------------------+
| id | select_type  | table      | type  | possible_keys | key           | key_len | ref  | rows | Extra                 |
+----+--------------+------------+-------+---------------+---------------+---------+------+------+-----------------------+
|  1 | PRIMARY      | c          | range | identity_id_2 | identity_id_2 | 391     | NULL |  101 | Using index condition |
|  2 | UNION        | r          | range | identity_id_3 | identity_id_3 | 391     | NULL |   63 | Using index condition |
| NULL | UNION RESULT | <union1,2> | ALL   | NULL          | NULL          | NULL    | NULL | NULL | Using temporary       |
+----+--------------+------------+-------+---------------+---------------+---------+------+------+-----------------------+

describe for the whole query:

+----+--------------------+---------------+-------+---------------+---------------+---------+------+---------+------------------------------------+
| id | select_type        | table         | type  | possible_keys | key           | key_len | ref  | rows    | Extra                              |
+----+--------------------+---------------+-------+---------------+---------------+---------+------+---------+------------------------------------+
|  1 | PRIMARY            | customer_data | ALL   | NULL          | NULL          | NULL    | NULL | 1424044 | Using where                        |
|  2 | DEPENDENT SUBQUERY | c             | range | identity_id_2 | identity_id_2 | 391     | NULL |     101 | Using index condition; Using where |
|  3 | DEPENDENT UNION    | r             | range | identity_id_3 | identity_id_3 | 391     | NULL |      63 | Using index condition; Using where |
| NULL | UNION RESULT       | <union2,3>    | ALL   | NULL          | NULL          | NULL    | NULL |    NULL | Using temporary                    |
+----+--------------------+---------------+-------+---------------+---------------+---------+------+---------+------------------------------------+

It looks like the outer query is not using the index I am telling it to use. This seems a bit strange since it is in fact the most optimal index to use.
If I remove the "use index (customer_id)" then describe says:
+----+--------------------+---------------+-------+-----------------------------+---------------+---------+-------+-------+------------------------------------+
| id | select_type        | table         | type  | possible_keys               | key           | key_len | ref   | rows  | Extra                              |
+----+--------------------+---------------+-------+-----------------------------+---------------+---------+-------+-------+------------------------------------+
|  1 | PRIMARY            | customer_data | ref   | identity_id_2,identity_id_3 | identity_id_2 | 8       | const | 73808 | Using index condition; Using where |
|  2 | DEPENDENT SUBQUERY | c             | range | identity_id_2               | identity_id_2 | 391     | NULL  |   101 | Using index condition; Using where |
|  3 | DEPENDENT UNION    | r             | range | identity_id_3               | identity_id_3 | 391     | NULL  |    63 | Using index condition; Using where |
| NULL | UNION RESULT       | <union2,3>    | ALL   | NULL                        | NULL          | NULL    | NULL  |  NULL | Using temporary                    |
+----+--------------------+---------------+-------+-----------------------------+---------------+---------+-------+-------+------------------------------------+

How to repeat:
Schema:

CREATE TABLE `customer_data` (
  `customer_id` bigint(20) unsigned NOT NULL,
  `identity_id` bigint(20) unsigned NOT NULL,
  `type` tinyint(3) unsigned NOT NULL,
  `subtype` tinyint(3) unsigned NOT NULL,
  `value` varchar(127) NOT NULL,
  `value_reverse` varchar(127) NOT NULL,
  `sequence` tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`customer_id`,`identity_id`,`type`,`subtype`,`sequence`),
  KEY `identity_id_2` (`identity_id`,`value`),
  KEY `identity_id_3` (`identity_id`,`value_reverse`),
  KEY `customer_id` (`customer_id`,`identity_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
[15 Sep 2015 6:54] MySQL Verification Team
Hello Jojiwoij oijweoijroeij,

Thank you for the report.
Could you please provide some subset of data(logical dump taken from mysqldump etc ) to reproduce this issue at our end? If required mark it as private when you upload.

Thanks,
Umesh
[15 Sep 2015 7:48] Jay Jay
The data is confidential, but don't you guys have a script for that or something? :)

#!/usr/bin/perl

for ($i = 0; 8000000 > $i; ++$i)
{
        my $val = "";
        for ($a = int(rand(30) + 1); $a; --$a)
        {
                $val = $val . chr($a + 40);
        }

        for ($ii = int(rand(7) + 1); $ii; --$ii)
        {
                print "insert into customer_data (customer_id, identity_id, type, value, value_reverse, sequence) values (" . $i . ", '9187237261327', " . $ii . ", '" . $val . "', reverse('" . $val . "'), " . $ii . ");\n";
        }

}
[28 Apr 2022 18:45] Stephen Buergler
https://www.db-fiddle.com/f/wQ9yS1Y9WrqV7ruiwBsYu2/3
Here is an example of the right indexes not being used in a subquery.