Bug #72059 Filter rows using index when using LEFT()
Submitted: 17 Mar 2014 12:02 Modified: 17 Mar 2014 12:32
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.6 OS:Any
Assigned to: CPU Architecture:Any
Tags: filter, INDEX, left, Optimize

[17 Mar 2014 12:02] Daniël van Eeden
Description:
The following are identical:
LEFT(c1,4) = 'test' <-- Doesn't use the index
WHERE c1 LIKE 'test%' <-- Uses the index

Also, impossible conditions are not detected:
LEFT(name,3)='test'

How to repeat:
Scans 24 rows, so it doesn't filter:
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT name FROM t1 WHERE LEFT(name,4)='test';
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t1    | index | NULL          | idx_name | 603     | NULL |   24 | Using where; Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

Identical, but only scans 15 rows:
mysql [localhost] {msandbox} (test) > EXPLAIN SELECT name FROM t1 WHERE name LIKE 'test%';
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t1    | range | idx_name      | idx_name | 603     | NULL |   15 | Using where; Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

Why scan the table? 
mysql [localhost] {msandbox} (test) > explain SELECT name FROM t1 WHERE LEFT(name,3)='test';
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t1    | index | NULL          | idx_name | 603     | NULL |   24 | Using where; Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

Suggested fix:
Better optimize queries using the LEFT() function
[17 Mar 2014 12:24] Daniël van Eeden
And when a prefix index is present it gets worse:

mysql [localhost] {msandbox} (test) > show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name_prefix` (`name`(4)),
  KEY `idx_name_prefix2` (`name`(20))
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > EXPLAIN SELECT name FROM t1 WHERE name LIKE 'test%';
+----+-------------+-------+------+----------------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys                    | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+----------------------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t1    | ALL  | idx_name_prefix,idx_name_prefix2 | NULL | NULL    | NULL |   24 | Using where |
+----+-------------+-------+------+----------------------------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > EXPLAIN SELECT name FROM t1 WHERE LEFT(name,4)='test';
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |   24 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
[17 Mar 2014 12:25] Daniël van Eeden
The original table structure:
CREATE TABLE `t1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8
[17 Mar 2014 12:32] Daniël van Eeden
Converting to ASCII (i.e. no multibyte chars) doesn't help. 

The LEFT() funtion and the SUBSTRING function (starting at 1) behave the same.

mysql [localhost] {msandbox} (test) > EXPLAIN SELECT name FROM t1 WHERE name LIKE 'test%';
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t1    | range | idx_name      | idx_name | 203     | NULL |   15 | Using where; Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > EXPLAIN SELECT name FROM t1 WHERE SUBSTRING(name,1,4)='test';
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t1    | index | NULL          | idx_name | 203     | NULL |   24 | Using where; Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > EXPLAIN SELECT name FROM t1 WHERE LEFT(name,4)='test';
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t1    | index | NULL          | idx_name | 203     | NULL |   24 | Using where; Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
[31 Mar 2014 12:05] Hartmut Holzgraefe
Tricky as the optimizer would have to know about the inner workings of a function ... so it would either need to be able to do introspection on the actual implementation of functions, or special "magic" exceptions would need to be embedded into the optimizer code for functions like LEFT(), so spreading the implementation of a function across several places in the code ...