Bug #78752 Provide documentation on how to estimate range_optimizer memory usage
Submitted: 8 Oct 2015 8:49 Modified: 16 Mar 2016 15:01
Reporter: Simon Mudd (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S4 (Feature request)
Version:5.7.9 OS:Any
Assigned to: Paul Dubois CPU Architecture:Any
Tags: documentation, memory usage, range_optimizer_max_mem_size

[8 Oct 2015 8:49] Simon Mudd
Description:
I notice a warning in 5.7.9 of the form:

Memory capacity of 1536000 bytes for 'range_optimizer_max_mem_size' exceeded. Range optimization was not done for this query. at SELECT COUNT(*) FROM some_table WHERE col1 > ? AND col2 IN (1444838,16649,1302857,1482005,...)

http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-9.html makes reference to range_optimizer_max_mem_size and bug#72047.

The context is clear: there are too many elements here in the in clause.

We can also adjust the memory setting and hopefully control the input so that we don't trigger this behaviour again.

However, I see no documentation or hints as to "roughly" how I can estimate how "big" I can make a query such that it will fit in the configured range_optimizer_max_mem_size.

How to repeat:
Applications which auto-generate SQL may have too many terms and therefore be able to split a single query query multiple queries and deal with the results themselves,
but they can only do that if they know roughly how to calculate the optimiser memory usage of the queries they want to generate.

In the past I have seen this done to limit queries to "just under" max_allowed_packet so this type of optimisation is quite frequent.

Suggested fix:
Some sort of documentation which gives pointers in this direction would be useful as this allows the DEVs to adjust their queries to fit the configured setting, or the DBAs to adjust the memory setting based on the queries the DEVs are going to generate
[8 Oct 2015 9:47] Shane Bester
With memory instrumentation enabled, this query should show how much the range optimizer did use:

select * from performance_schema.memory_summary_by_thread_by_event_name 
where thread_id=X and event_name='memory/sql/test_quick_select';
[9 Oct 2015 9:51] Simon Mudd
So this change in 5.7.9 comes back to bug#70247.

It seems the previously hard-coded limit select args list was 16000 according to Yoshinori.

The change to use the new default value for range_optimizer_max_mem_size of 1536000 is a different type of setting in the sense that we're limiting based on memory usage rather than argument count.

Some initial testing shows that using 4500 or 8000 entries in a statement of the form WHERE id IN ( 123, 456, 789, .... ) is triggering the memory limit 
and thus a warning in some code I'm using , when previously 5.7.8 did not do that and behaviour did not seem to indicate a change to doing table scans.

In many ways an argument limit of X, whatever X is, is easier to understand and also if devs generate queries they can more easily limit queries if needed
to this configured value.  That's easier then "guestimating" the amount of memory the query will make the range optimiser use, especially as that information is lacking now.

If 4500 items in an IN clause can trigger 1.5MB to be used that seems to indicate that memory consumption by the range optimiser stage is not very optimal. ~300 bytes per item?  So it seems that maybe supporting the old 16,000 value setting would require me to ~5MB of memory for this setting.
If that's per thread that seems like a lot and if we have a large thread count connected to the server any sudden heavy increase in busy threads could
trigger quite a large memory footprint change.  A pile-up might even take the server down if enough threads become active at the same time.

I think therefore that the current value may not be appropriately chosen, but more importantly more information about the range optimizer and it's memory consumption, and the number of terms it will consider prior to another strategy being chosen really needs to be provided, and a comparison made against previous versions.
[22 Oct 2015 18:51] Simon Mudd
I am still a little bit surprised as some initial quick checks seem to indicate that the memory consumption of the range optimizer far exceeds the size of the raw SQL query. I can expect some overhead but this seems to be quite significant. Perhaps it depends on the exact query itself.

This is an example I have been using. A real table, column names have been adjusted for obvious reasons.

CREATE TABLE `mytable` (
  `c1` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `c2` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `c3` int(10) unsigned NOT NULL DEFAULT '0',
  `c4` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `c5` int(10) unsigned NOT NULL DEFAULT '0',
  `c6` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `c7` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `c8` tinyint(4) NOT NULL DEFAULT '0',
  `c9` int(10) unsigned NOT NULL DEFAULT '0',
  `c10` int(10) unsigned NOT NULL DEFAULT '0',
  `c11` tinyint(4) NOT NULL DEFAULT '0',
  `c12` tinyint(4) unsigned NOT NULL DEFAULT '0',
  `c13` tinyint(4) NOT NULL DEFAULT '0',
  `c14` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `c15` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `c16` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `c17` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `c18` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `c19` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `c20` int(10) unsigned NOT NULL DEFAULT '0',
  `c21` int(10) unsigned NOT NULL DEFAULT '0',
  `c22` int(10) unsigned NOT NULL DEFAULT '0',
  `c23` int(10) unsigned NOT NULL DEFAULT '0',
  `c24` int(10) unsigned NOT NULL DEFAULT '0',
  `c25` tinyint(4) NOT NULL DEFAULT '0',
  `c26` int(10) unsigned NOT NULL DEFAULT '0',
  `c27` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`c1`,`c2`,`c3`,`c4`,`c5`,`c6`,`c7`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The real query which showed up in the slow log was ~850 "entries" and 70kB.

The query was of of the form:

DELETE FROM
   `mytable`
WHERE
    ( c1=1 AND c2=87653578 AND c3=9239201 AND c4=1 AND c5=470951 AND c6=1 AND c7=0 )
 OR ( c1=1 AND c2=87653578 AND c3=9239201 AND c4=1 AND c5=470951 AND c6=2 AND c7=0 )
 OR ( c1=1 AND c2=87653578 AND c3=9239201 AND c4=2 AND c5=470951 AND c6=1 AND c7=0 )
 OR ( c1=1 AND c2=87653578 AND c3=9239201 AND c4=2 AND c5=470951 AND c6=2 AND c7=0 )
 OR ( c1=1 AND c2=87653578 AND c3=9239201 AND c4=3 AND c5=470951 AND c6=1 AND c7=0 )
....

and this seemed to trigger a table scan with range_optimizer_max_mem_size set to 4M which seems surprising.

# Query_time: 3841.251569  Lock_time: 0.036377 Rows_sent: 0  Rows_examined: 20636221

These queries should run in a lot less than 1s.

So understanding how this memory usage can be so high would be very important especially if there are a large number of concurrent connections doing this same sort of query.
[23 Oct 2015 9:27] Shane Bester
made a test to measure memory. see next file upload.

Attachment: range_optimizer_memory_vs_number_of_ranges.png (image/png, text), 30.68 KiB.

[23 Oct 2015 9:29] Shane Bester
testcase for printing out memory usage

Attachment: range_memory_usage.sql (application/octet-stream, text), 1.78 KiB.

[16 Mar 2016 15:01] Paul Dubois
Posted by developer:
 
https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html
now has a new subsection
https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#range-optimization-memory-...
8.2.1.3.4 Limiting Memory Use for Range Optimization