Bug #115227 prepared statement encounter performance degradation compared with normal select
Submitted: 5 Jun 10:45 Modified: 5 Jun 11:22
Reporter: Qingping Zhu Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[5 Jun 10:45] Qingping Zhu
Description:
Normal select choose "Index range scan", prepared statement bind parameters choose "Index lookup(REF)", which result in  performance degradation.

Since 2 queries use same where conditions, optimizer should choose same execute plan, but optimizer choose different execute plan for PS and select query.

How to repeat:
-- Prepare data
create table tt(
  id int primary key auto_increment, 
  p_id VARCHAR(36) default '36e0371fc4f8fd5ec48b704865184f7c',
  o_id VARCHAR(36) default '054a579e5ea7415ab3afd2eeb14038c1',
  tag int default 0,
  status int default 0,
  update_time datetime,
  KEY idx_1 (o_id,status,update_time),
  KEY idx_2 (o_id,p_id,tag,status));
  
insert into tt(update_time) values
('2024-01-15'),('2024-01-16'),('2024-01-17'),('2024-01-18'),('2024-01-19'),('2024-01-20'),('2024-01-21'),('2024-01-22'),('2024-01-23'),('2024-01-24'),
('2024-01-25'),('2024-01-26'),('2024-01-27'),('2024-01-28'),('2024-01-29'),('2024-01-30'),('2024-02-01'),('2024-02-02'),('2024-02-03'),('2024-02-04'),
('2024-02-05'),('2024-02-06'),('2024-02-07'),('2024-02-08'),('2024-02-09'),('2024-02-10'),('2024-02-11'),('2024-02-12'),('2024-02-13'),('2024-02-14'),
('2024-02-15'),('2024-02-16'),('2024-02-17'),('2024-02-18'),('2024-02-19'),('2024-02-20'),('2024-02-21'),('2024-02-22'),('2024-02-23'),('2024-02-24'),
('2024-02-25'),('2024-02-26'),('2024-02-27'),('2024-02-28'),('2024-03-01'),('2024-03-02'),('2024-03-03'),('2024-03-04'),('2024-03-05'),('2024-03-06'),
('2024-03-07'),('2024-03-08'),('2024-03-09'),('2024-03-10'),('2024-03-11'),('2024-03-12'),('2024-03-13'),('2024-03-14'),('2024-03-15'),('2024-03-16'),
('2024-03-17'),('2024-03-18'),('2024-03-19'),('2024-03-20'),('2024-03-21'),('2024-03-22'),('2024-03-23'),('2024-03-24'),('2024-03-25'),('2024-03-26');

insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
insert into tt(update_time) select update_time from tt;
analyze table tt;

-- Execute normal select query, only need to scan 100 rows, exuecute time less 1ms
set optimizer_switch='index_condition_pushdown=off';
explain analyze select id, p_id, tag, status, update_time from tt where o_id = '054a579e5ea7415ab3afd2eeb14038c1' and status = 0 and update_time between '2024-02-15 00:00:00' and '2024-02-21 23:59:59' order by update_time, id limit 0, 100\G

mysql> explain analyze select id, p_id, tag, status, update_time from tt where o_id = '054a579e5ea7415ab3afd2eeb14038c1' and status = 0 and update_time between '2024-02-15 00:00:00' and '2024-02-21 23:59:59' order by update_time, id limit 0, 100\G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 100 row(s)  (cost=246421 rows=100) (actual time=0.0263..0.345 rows=100 loops=1)
    -> Filter: ((tt.`status` = 0) and (tt.o_id = '054a579e5ea7415ab3afd2eeb14038c1') and (tt.update_time between '2024-02-15 00:00:00' and '2024-02-21 23:59:59'))  (cost=246421 rows=55542) (actual time=0.0255..0.34 rows=100 loops=1)
        -> Index range scan on tt using idx_1 over (o_id = '054a579e5ea7415ab3afd2eeb14038c1' AND status = 0 AND '2024-02-15 00:00:00' <= update_time <= '2024-02-21 23:59:59')  (cost=246421 rows=55542) (actual time=0.0218..0.264 rows=100 loops=1)

-- Execute prepared statement, need to scan 122,980 rows, execute time around 270ms
set optimizer_switch='index_condition_pushdown=off';
PREPARE stmt1 FROM 'explain analyze select id, p_id, tag, status, update_time from tt where o_id = ? and status = ? and update_time between ? and ? order by update_time, id limit 0, 100';
SET @v1 = '054a579e5ea7415ab3afd2eeb14038c1';
SET @v2 = 0;
SET @v3 = '2024-02-15 00:00:00';
SET @v4 = '2024-02-21 23:59:59';
EXECUTE stmt1 USING @v1,@v2,@v3,@v4;

mysql> EXECUTE stmt1 USING @v1,@v2,@v3,@v4\G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 100 row(s)  (cost=1625 rows=100) (actual time=270..270 rows=100 loops=1)
    -> Filter: (tt.update_time between '2024-02-15 00:00:00' and '2024-02-21 23:59:59')  (cost=1625 rows=1583) (actual time=270..270 rows=100 loops=1)
        -> Index lookup on tt using idx_1 (o_id='054a579e5ea7415ab3afd2eeb14038c1', status=0)  (cost=1625 rows=142443) (actual time=0.237..264 rows=122980 loops=1)

Suggested fix:
By analysis optimizer trace, I find out that for prepared statement scenario, optimizer do not "change to range from ref" in function "can_switch_from_ref_to_range", since "Item_param::used_tables" return value is not 0 for placeholders ('?') inside prepared statements.

```
static bool can_switch_from_ref_to_range(THD *thd, JOIN_TAB *tab,
                                         enum_order ordering,
                                         bool recheck_range) {
  if ((tab->range_scan() &&                                            // 1)
       tab->position()->key->key == used_index(tab->range_scan())) ||  // 2)
      recheck_range) {
    uint keyparts = 0, length = 0;
    table_map dep_map = 0;
    bool maybe_null = false;

    calc_length_and_keyparts(tab->position()->key, tab,
                             tab->position()->key->key, tab->prefix_tables(),
                             nullptr, &length, &keyparts, &dep_map,
                             &maybe_null);

    if (thd->is_error()) {
      return true;
    }
    if (!maybe_null &&  // 3)
        !dep_map)       // 4)
```
[5 Jun 11:22] MySQL Verification Team
Hi Mr. Zhu,

Thank you for your bug report.

We have repeated your test successfully. The other two EXPLAINs are with default Optimiser hint:

Table	Op	Msg_type	Msg_text
test.t1	analyze	status	OK
*************************** 1. row ***************************
EXPLAIN: -> Limit: 100 row(s)  (cost=246421 rows=100) (actual time=0.0166..0.282 rows=100 loops=1)
    -> Filter: ((t1.`status` = 0) and (t1.o_id = '054a579e5ea7415ab3afd2eeb14038c1') and (t1.update_time between '2024-02-15 00:00:00' and '2024-02-21 23:59:59'))  (cost=246421 rows=55542) (actual time=0.0161..0.274 rows=100 loops=1)
        -> Index range scan on t1 using idx_1 over (o_id = '054a579e5ea7415ab3afd2eeb14038c1' AND status = 0 AND '2024-02-15 00:00:00' <= update_time <= '2024-02-21 23:59:59')  (cost=246421 rows=55542) (actual time=0.0138..0.209 rows=100 loops=1)

EXPLAIN
-> Limit: 100 row(s)  (cost=1625 rows=100) (actual time=222..222 rows=100 loops=1)\n    -> Filter: (t1.update_time between '2024-02-15 00:00:00' and '2024-02-21 23:59:59')  (cost=1625 rows=1583) (actual time=222..222 rows=100 loops=1)\n        -> Index lookup on t1 using idx_1 (o_id='054a579e5ea7415ab3afd2eeb14038c1', status=0)  (cost=1625 rows=142443) (actual time=0.208..213 rows=122980 loops=1)\n
*************************** 1. row ***************************
EXPLAIN: -> Limit: 100 row(s)  (cost=246421 rows=100) (actual time=0.0136..0.257 rows=100 loops=1)
    -> Index range scan on t1 using idx_1 over (o_id = '054a579e5ea7415ab3afd2eeb14038c1' AND status = 0 AND '2024-02-15 00:00:00' <= update_time <= '2024-02-21 23:59:59'), with index condition: ((t1.`status` = 0) and (t1.o_id = '054a579e5ea7415ab3afd2eeb14038c1') and (t1.update_time between '2024-02-15 00:00:00' and '2024-02-21 23:59:59'))  (cost=246421 rows=55542) (actual time=0.0132..0.249 rows=100 loops=1)

EXPLAIN
-> Limit: 100 row(s)  (cost=1625 rows=100) (actual time=53.4..53.4 rows=100 loops=1)\n    -> Index lookup on t1 using idx_1 (o_id='054a579e5ea7415ab3afd2eeb14038c1', status=0), with index condition: (t1.update_time between '2024-02-15 00:00:00' and '2024-02-21 23:59:59')  (cost=1625 rows=142443) (actual time=53.4..53.4 rows=100 loops=1)\n

This is now a verified performance bug for the version 8.0 and higher.

Thank you, especially for your patch !!