Bug #60873 select query using function on index column does not use index
Submitted: 14 Apr 2011 17:58 Modified: 15 Apr 2011 12:42
Reporter: Vincent Riffier Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.1.49-3 OS:Linux (Debian x64)
Assigned to: CPU Architecture:Any
Tags: FUNCTION, INDEX, Optimizer, SELECT, wrong

[14 Apr 2011 17:58] Vincent Riffier
Description:
This is a bug on the optimizer when using a function in a select query.
I have a table with 10+Million records.
In order to faster the queries based on a date range we store the first id on any given day. Thus, when querying on a datarange we have our boundaries from the primary key id.
Since we moved to Mysql 5.1.49, the optimizer is not using the primary key index  when we use a function for the id range. It only uses it when we specifically use a value for the id.

Here's the explain on both queries.

mysql> desc select camp, count(*) from table1 where id >=date2table1id(20110413) group by camp having count(*) > 20 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: table1
         type: index
possible_keys: NULL
          key: INDEX_CAMP_OFFER
      key_len: 10
          ref: NULL
         rows: 13529455
        Extra: Using where
1 row in set (0.00 sec)

mysql> select date2table1id(20110413);
+-----------------------+
| date2leadid(20110413) |
+-----------------------+
|              93412229 |
+-----------------------+
1 row in set (0.00 sec)

mysql> desc select camp, count(*) from table1 where id >=93412229 group by camp having count(*) > 20 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: table1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 84864
        Extra: Using where; Using temporary; Using filesort

show create function date2leadid \G
*************************** 1. row ***************************
            Function: date2table1id
            sql_mode:

     Create Function: CREATE DEFINER=`mysqluser`@`%` FUNCTION `date2table1id`(d date) RETURNS int(11)
begin
  declare id int;
  select table1_id into id from table1_date_index where index_date=d;
  return id;
end
character_set_client: latin1
collation_connection: latin1_swedish_ci
  Database Collation: latin1_swedish_ci

As you can see the second one is using the proper index parsing 80000+ rows instead of millions.

How to repeat:
CREATE TABLE table1_date_index (index_date date NOT NULL, table1_id int(11) NOT NULL, PRIMARY KEY (`index_date`));

CREATE FUNCTION `date2table1id`(d date) RETURNS int(11)
begin
  declare id int;
  select table1_id into id from table1_date_index where index_date=d;
  return id;
end

create table table1(id int primary key, camp int);
insert into table1_date_index (index_date, table1_id) values ('2011-04-13', 1);
insert into table1 (camp) values (1);
insert into table1 (camp) values (2);
desc select camp, count(*) from table1 where id >=0 group by camp;
desc select camp, count(*) from table1 where id >=date2table1id(20110413);

Suggested fix:
Mysql should operate as in previous version and use the primary index even when the result comes from a function.
[14 Apr 2011 18:38] Valeriy Kravchuk
Please, try to define your function as DETERMINISTIC. This may check optimizer's behavior. See http://dev.mysql.com/doc/refman/5.1/en/create-procedure.html for more details.
[15 Apr 2011 12:42] Vincent Riffier
Thank you it worked!