Bug #104000 Range optimizer memory usage increases with more secondary indexes
Submitted: 14 Jun 7:44    Modified: 14 Jun 7:45
Reporter: Shane Bester (Platinum Quality Contributor)
Status: Duplicate
Category: MySQL Server: Optimizer    Severity: S3 (Non-critical)
Version: 8.0.25    OS: Any
CPU Architecture: Any

[14 Jun 7:44] Shane Bester
Description:
------------
It seems unexpected that having more secondary indexes leads to such high
memory requirements in the range optimizer.

The testcase outputs:

+-----------+
| count(id) |
+-----------+
|     34064 |
+-----------+
1 row in set (2 min 27.93 sec)

+---------+------+---------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                         |
+---------+------+---------------------------------------------------------------------------------------------------------------------------------+
| Warning | 3170 | Memory capacity of 201326592 bytes for 'range_optimizer_max_mem_size' exceeded. Range optimization was not done for this query. |
+---------+------+---------------------------------------------------------------------------------------------------------------------------------+
1 row in set (2 min 27.93 sec)

+--------+--------------+
| ranges | range memory |
+--------+--------------+
|  34064 |    206849320 |
+--------+--------------+
1 row in set (2 min 27.94 sec)

+----------------------------+
| msg                        |
+----------------------------+
| aborting due to limit hit! |
+----------------------------+
1 row in set (2 min 27.95 sec)

Query OK, 0 rows affected (2 min 27.95 sec)

mysql> select ranges,memory,query_lengt from r order by a desc limit 10;
+--------+-----------+-------------+
| ranges | memory    | query_lengt |
+--------+-----------+-------------+
|  34064 | 206849320 |      193317 |
|  34063 | 137896888 |      193311 |
|  34062 | 137896888 |      193305 |
|  34061 | 137896888 |      193299 |
|  34060 | 137896888 |      193293 |
+--------+-----------+-------------+
5 rows in set (0.01 sec)

However, if you remove some of the secondary keys, the failure occurs at a
much higher range count.
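As a back-of-the-envelope check (this is a fitted model, not the server's actual accounting): the numbers above are consistent with memory growing roughly linearly in ranges times candidate indexes. The per-range-per-index constant below is derived from this report's own figures, not from the server source:

```python
# Hedged sketch: a linear model of range-optimizer memory, fitted to
# the numbers reported above. The per-range-per-index constant comes
# from this report, not from server internals.

def estimated_range_memory(num_ranges, num_indexes, bytes_per_range_per_index):
    # Assumption: the optimizer analyzes a range tree per candidate
    # index, so memory grows with ranges * indexes.
    return num_ranges * num_indexes * bytes_per_range_per_index

# 29 candidate indexes (PK + 28 secondary keys on `id`), 34064 ranges,
# observed 206849320 bytes of range-optimizer memory:
per_unit = 206849320 / (34064 * 29)
print(round(per_unit))  # roughly 209 bytes per range per index

# Under this model, dropping to PK-only admits ~29x more ranges before
# the 192 MiB range_optimizer_max_mem_size used in the testcase is hit.
limit = 192 * 1024 * 1024
max_ranges_29 = limit / (29 * per_unit)
max_ranges_1 = limit / (1 * per_unit)
print(round(max_ranges_1 / max_ranges_29))  # 29
```

This matches the PK-only run below, where 500000 ranges still fit comfortably under the same limit.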

How to repeat:
Paste this into the mysql client each time to check memory:

\r

set sql_mode='';
set session range_optimizer_max_mem_size=1024*1024*192;

drop table if exists t;
 create table t (
  `id` int unsigned not null auto_increment,
  primary key (`id`),
  key(id),key(id),key(id),key(id),key(id),key(id),key(id),key(id),key(id),
  key(id),key(id),key(id),key(id),key(id),key(id),key(id),key(id),key(id),
  key(id),key(id),key(id),key(id),key(id),key(id),key(id),key(id),key(id),
  key(id)
) engine=innodb default charset=latin1;

truncate table t;
set @id:=0;
replace into t(id) values
(@id:=@id+1),(@id:=@id+1),(@id:=@id+1),(@id:=@id+1),(@id:=@id+1);
replace into t(id) select @id:=@id+1 from t,t a,t b,t c,t d,t e,t f,t g limit 40000;
select count(*) from t;

analyze table t;
drop table if exists r;
create table r(a serial,ranges int, memory int,query_lengt int);

drop procedure if exists p;
delimiter $
create procedure p(min_ranges int,max_ranges int,step int)
begin
  declare v_j int default min_ranges;
  declare v_i int default 0;
  truncate r;
  endme: begin
  repeat
    set v_i:=0;
    set @v_sql='select count(id) from t where id in (';
    repeat
      set @v_sql:=concat(@v_sql,v_i,',');
      set v_i:=v_i+1;
    until v_i > v_j -1 end repeat;
    set @v_sql:=concat(@v_sql,v_i,')');
    -- select @v_sql;
    prepare s from @v_sql;
    execute s;
    show warnings;
    deallocate prepare s;
    
    select HIGH_NUMBER_OF_BYTES_USED as range_memory
      from performance_schema.memory_summary_by_thread_by_event_name a,
           performance_schema.threads b
      where b.processlist_id=connection_id()
        and a.thread_id=b.thread_id
        and a.event_name='memory/sql/test_quick_select'
      into @mem;
    
    select v_j as 'ranges', @mem as 'range memory';
    insert into r(ranges,memory,query_lengt) values(v_j,@mem,length(@v_sql));
    if @mem >= @@range_optimizer_max_mem_size then
       select 'aborting due to limit hit!' as 'msg';
       leave endme;
    end if;
    
    set v_j:=v_j+step;
  until v_j >= max_ranges end repeat;
  end endme;
end $
delimiter ;

truncate table r;
call p(34060/*start*/,100000/*max*/,1 /*step*/);
select ranges,memory,query_lengt from r order by a desc limit 10;

Suggested fix:
If you add many more rows to table t, comment out all the secondary
keys leaving only the PK, and then run the testcase like this:

call p(0/*start*/,10000000/*max*/,100000 /*step*/);
select ranges,memory,query_lengt from r order by a desc limit 10;

The failing point is much higher; I gave up waiting at this point:

+--------+--------------+
| ranges | range memory |
+--------+--------------+
| 500000 |    137896888 |
+--------+--------------+
1 row in set (8 min 46.00 sec)
[30 Jun 7:25] Simon Mudd
To clarify the problem:

The extra memory may or may not be a problem.

The principal problem is unexpectedly using more memory than range_optimizer_max_mem_size.

This causes queries that are intended to be very fast, because they are doing "point lookups", to fall back to a full table scan.

Imagine the type of query where this has been seen:

SELECT whatever FROM some_table WHERE id IN ( 1, 2, 3, ...., 99999 )

It should be fast, but it can be really slow. The documentation tells us how to estimate memory usage, and thus the maximum number of elements in such a list, but that documentation is now wrong. Sudden changes in query performance depending on the number of parameters provided are clearly troublesome.
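To illustrate the gap: the manual's rule of thumb for range_optimizer_max_mem_size is roughly 230 bytes per OR-combined predicate (each IN() list element counts as one). A quick calculation contrasting that documented estimate with the multi-index behavior observed in this report (the 230-byte figure is the manual's approximation; the per-index scaling is inferred from this report):

```python
# The manual's rule of thumb: each OR-combined predicate, including
# each IN() list element, costs roughly 230 bytes during range
# analysis. This report suggests the estimate must also be scaled by
# the number of candidate indexes, or the limit is hit far earlier
# than the documented formula predicts.

BYTES_PER_OR_PREDICATE = 230  # approximation from the MySQL manual

def documented_max_in_list(limit_bytes):
    # IN() elements allowed under the documented (per-query) estimate.
    return limit_bytes // BYTES_PER_OR_PREDICATE

def observed_max_in_list(limit_bytes, num_candidate_indexes):
    # Same estimate scaled by candidate indexes, matching this report.
    return limit_bytes // (BYTES_PER_OR_PREDICATE * num_candidate_indexes)

limit = 192 * 1024 * 1024  # the 192 MiB used in the testcase
print(documented_max_in_list(limit))    # ~875k elements per the manual
print(observed_max_in_list(limit, 29))  # ~30k, near the 34064 failure seen
```

With 29 candidate indexes the effective ceiling is roughly 29 times lower than the documented estimate, which is why the IN() list failed at around 34k elements rather than hundreds of thousands.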