Bug #109595 records_in_range does too many disk reads
Submitted: 11 Jan 21:56  Modified: 13 Jan 5:26
Reporter: Mark Callaghan  Email Updates:
Status: Verified  Impact on me: None
Category: MySQL Server: Optimizer  Severity: S5 (Performance)
Version: 8.0.31  OS: Linux
Assigned to:  CPU Architecture: Any

[11 Jan 21:56] Mark Callaghan
Description:
While running the insert benchmark, the QPS I can get in one setup is ~1000/second for InnoDB vs ~2500/second for MyRocks and ~7500/second for Postgres. One such result is:
http://smalldatum.blogspot.com/2023/01/insert-benchmark-on-8-core-amd-server.html

I finally had time to debug the root cause. The problem is that InnoDB does between 4 and 5 disk reads per query while MyRocks and Postgres do <= 1. I attached a debugger to MySQL running InnoDB, and whenever the IO occurred, records_in_range was on the call stack.

This reproduces on 8.0.31, 5.7.39, and 5.6.51.

Setting optimizer_switch='mrr=off' does not fix the problem.

The benchmark uses 3 types of queries. Each is covered by one of the 3 secondary indexes and forces the use of that index via FORCE INDEX, and with the command lines I have below for the reproduction, all use "LIMIT 10".

An example query is:
SELECT cashregisterid,price,customerid
FROM pi1 FORCE INDEX (pi1_registersegment)
WHERE (cashregisterid > 52) ORDER BY cashregisterid, price, customerid LIMIT 10

How to repeat:
The benchmark client is here:
https://github.com/mdcallag/mytools/blob/master/bench/ibench/iibench.py

The following 4 commands reproduce this on a server with 16G of RAM. If you use a server with more RAM, increase the --max_rows value in the first step from 500M to something larger. The goal is to make the workload IO-bound so it is easier to spot the overhead from records_in_range.

The performance problem shows up in step 4). That step runs for X seconds where:
* X = MR / IPS, with MR = --max_rows and IPS = --inserts_per_second
* as written below, MR = 360000 and IPS = 100, so X = 3600 seconds

1) Create & load the table with only a PK index
python3 iibench.py --dbms=mysql --db_name=ib --secs_per_report=1 --db_host=127.0.0.1 --db_user=root --db_password=pw --engine=innodb --engine_options="" --unique_checks=1 --bulk_load=0 --max_rows=500000000 --table_name=pi1 --setup --num_secondary_indexes=0 --data_length_min=10 --data_length_max=20 --rows_per_commit=100 --inserts_per_second=0 --query_threads=0 --seed=1673453781 --dbopt=none

2) Create 3 secondary indexes
python3 iibench.py --dbms=mysql --db_name=ib --secs_per_report=1 --db_host=127.0.0.1 --db_user=root --db_password=pw --engine=innodb --engine_options="" --unique_checks=1 --bulk_load=0 --secondary_at_end --max_rows=100000 --table_name=pi1 --num_secondary_indexes=3 --data_length_min=10 --data_length_max=20 --rows_per_commit=100 --inserts_per_second=0 --query_threads=0 --seed=1673462156 --dbopt=none

3) Insert more data with secondary indexes in place
python3 iibench.py --dbms=mysql --db_name=ib --secs_per_report=1 --db_host=127.0.0.1 --db_user=root --db_password=pw --engine=innodb --engine_options="" --unique_checks=1 --bulk_load=0 --max_rows=10000000 --table_name=pi1 --num_secondary_indexes=3 --data_length_min=10 --data_length_max=20 --rows_per_commit=50 --inserts_per_second=0 --query_threads=0 --seed=1673467146 --dbopt=none

4) Test for QPS with writes done in the background.
python3 iibench.py --dbms=mysql --db_name=ib --secs_per_report=1 --db_host=127.0.0.1 --db_user=root --db_password=pw --engine=innodb --engine_options="" --unique_checks=1 --bulk_load=0 --max_rows=360000 --table_name=pi1 --num_secondary_indexes=3 --data_length_min=10 --data_length_max=20 --rows_per_commit=50 --inserts_per_second=100 --query_threads=1 --seed=1673470055 --dbopt=none
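The four steps above share most of their flags, so they can be driven from a small wrapper script. This is a hedged sketch, not part of the benchmark: the flag values are copied from the command lines above, and the script only prints each step unless the dry-run guard is removed.

```python
# Sketch of a driver for the 4 reproduction steps above (not part of
# iibench.py). Flag values are copied from the command lines in this
# report; adjust --max_rows in step 1 for servers with more than 16G RAM.
import shlex
import subprocess

COMMON = (
    "python3 iibench.py --dbms=mysql --db_name=ib --secs_per_report=1 "
    "--db_host=127.0.0.1 --db_user=root --db_password=pw --engine=innodb "
    "--engine_options= --unique_checks=1 --bulk_load=0 --table_name=pi1 "
    "--data_length_min=10 --data_length_max=20 --dbopt=none"
)

# Per-step overrides: (description, extra flags)
STEPS = [
    ("load with only a PK index",
     "--max_rows=500000000 --setup --num_secondary_indexes=0 "
     "--rows_per_commit=100 --inserts_per_second=0 --query_threads=0 "
     "--seed=1673453781"),
    ("create 3 secondary indexes",
     "--secondary_at_end --max_rows=100000 --num_secondary_indexes=3 "
     "--rows_per_commit=100 --inserts_per_second=0 --query_threads=0 "
     "--seed=1673462156"),
    ("insert with secondary indexes in place",
     "--max_rows=10000000 --num_secondary_indexes=3 --rows_per_commit=50 "
     "--inserts_per_second=0 --query_threads=0 --seed=1673467146"),
    ("measure QPS with background writes",
     "--max_rows=360000 --num_secondary_indexes=3 --rows_per_commit=50 "
     "--inserts_per_second=100 --query_threads=1 --seed=1673470055"),
]

DRY_RUN = True  # flip to False to actually run the benchmark

for i, (desc, extra) in enumerate(STEPS, 1):
    cmd = shlex.split(COMMON + " " + extra)
    print(f"step {i}: {desc}")
    if not DRY_RUN:
        subprocess.run(cmd, check=True)
```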

Suggested fix:
Not a fix, but a hack that prevents the extra IO and increases InnoDB QPS from ~1000/second to ~4000/second is below. The hack just returns early from handler::multi_range_read_info_const.

diff mysql-8.0.31.orig/sql/handler.cc mysql-8.0.31/sql/handler.cc
6134a6135
>     // if (1) return 10;

The early return for my hack above was added here:
https://github.com/mysql/mysql-server/blob/8.0/sql/handler.cc#L6134

And full call stacks for when the IO occurs are here. I just put a breakpoint on pread with gdb, and this is the only time it was reached during the SELECT statements:
5.6.51 -> https://gist.github.com/mdcallag/246d4652b5094aa902d669ef251a00f8
8.0.31 -> https://gist.github.com/mdcallag/c320b776ebb5fcd74f7e9ad6b0646933

A subset of the call stack from when the IO occurs is:
check_quick_select ->
 ha_innobase::multi_range_read_info_const ->
  DsMrr_impl::dsmrr_info_const ->
   handler::multi_range_read_info_const ->
    ha_innobase::records_in_range ->
     btr_estimate_n_rows_in_range ->
      btr_estimate_n_rows_in_range_low ->
       btr_estimate_n_rows_in_range_on_level ->
        buf_page_get_gen
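For background, btr_estimate_n_rows_in_range descends the B-tree along the two range boundaries and, at the level where the two paths diverge, samples a bounded number of pages to extrapolate a row count; each sampled page goes through buf_page_get_gen and can be a disk read when it is not in the buffer pool. A simplified, hypothetical model of that extrapolation (not the actual InnoDB code) is:

```python
# Simplified model of per-level B-tree range estimation (hypothetical
# sketch, not the InnoDB implementation). At the divergence level, walk
# the pages between the two boundary slots, but read at most
# N_PAGES_READ_LIMIT pages; beyond the cap and below that level,
# extrapolate instead of reading more pages.

N_PAGES_READ_LIMIT = 10  # cap on pages sampled per level

def estimate_rows(pages_between, avg_recs_per_page, levels_below):
    """pages_between: index pages between the two boundary slots at the
    divergence level; avg_recs_per_page: average records per page seen
    while sampling; levels_below: tree levels between the divergence
    level and the leaves. Returns (estimate, exact, pages_sampled)."""
    # Each sampled page is a page fetch, i.e. a potential disk read
    # when the page is not cached in the buffer pool.
    sampled = min(pages_between, N_PAGES_READ_LIMIT)
    exact = sampled == pages_between
    n_rows = pages_between
    for _ in range(levels_below + 1):
        # every level down multiplies by the fan-out (records per page)
        n_rows *= avg_recs_per_page
    return n_rows, exact, sampled
```

The key point for this bug is that the sampling itself costs up to N_PAGES_READ_LIMIT page fetches per level, which on a cold, IO-bound working set turns into the extra disk reads per query.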

Optimizer trace output from one such query:
5.6.51 -> https://gist.github.com/mdcallag/fe5656bc94d58f0f3b653015fa5d9f25
8.0.31 -> https://gist.github.com/mdcallag/c02f1201df77ef37752cd66c7f326531

The my.cnf:
[mysqld]
sql_mode=no_engine_substitution,no_unsigned_subtraction
default_tmp_storage_engine=MyISAM
tmpdir=/data/m/my
#query_cache_size=0
#query_cache_type=0
# slow_query_log_file=/data/m/my/slow.log
long_query_time=5
# skip_log_bin
log_bin=/data/m/my/binlogs/bl
max_binlog_size=128M
server_id=77
binlog_format=ROW
datadir=/data/m/my/data

innodb_file_per_table=1

innodb_data_home_dir=/data/m/my/data
innodb_data_file_path=ibdata1:512M:autoextend
innodb_log_group_home_dir=/data/m/my/txlogs
slow_query_log=1

performance_schema=1
ssl=0
default_authentication_plugin=mysql_native_password
character_set_server=latin1
collation_server=latin1_swedish_ci
eq_range_index_dive_limit=10

innodb_fast_shutdown=1
innodb_doublewrite=1
innodb_buffer_pool_load_at_startup=OFF
innodb_buffer_pool_dump_at_shutdown=OFF
innodb_purge_threads=1

innodb_flush_log_at_trx_commit=2
sync_binlog=0

loose_innodb_log_writer_threads=OFF

# x1
innodb_dedicated_server=ON
innodb_buffer_pool_size=10G

# x2b
innodb_log_files_in_group=40

# x4
innodb_max_dirty_pages_pct_lwm=80
innodb_max_dirty_pages_pct=90

# x6d
innodb_idle_flush_pct=1
innodb_io_capacity=1000
innodb_io_capacity_max=2000

# x8
innodb_adaptive_hash_index=OFF

# optimizer_switch='mrr=off'

[mysqld_safe]
malloc-lib=/usr/lib/x86_64-linux-gnu/libjemalloc.so

---

Finally, I added some debug printfs and here is example output from them. The diff for the debug printfs is at https://gist.github.com/mdcallag/dbeeb17949952623ead44cb1fe12b68b

From the printf output I am curious whether the problem occurs for the lines with "n_pages_read=10".

ZZZ btr_estimate_n_rows_in_range_on_level exact: divergence_level=0, loop=1, n_pages_read=4, n_rows_prev=3
ZZZ btr_estimate_n_rows_in_range_on_level inexact: divergence_level=0, loop=2, n_pages_read=10, n_rows_prev=2225
ZZZ btr_estimate_n_rows_in_range_on_level inexact: divergence_level=0, loop=3, n_pages_read=10, n_rows_prev=771852
ZZZ btr_estimate_n_rows_in_range_low return:1 224783316
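As a back-of-envelope consistency check on the numbers above (assuming the estimate is simply multiplied by the average per-page fan-out at each level, which is my reading, not something confirmed by the code), the ratios between successive n_rows values should approximate that fan-out:

```python
# Back-of-envelope check of the debug output above: if the estimate is
# multiplied by the average per-page fan-out at each level, the ratios
# between successive n_rows values approximate that fan-out.
n_rows = [2225, 771852, 224783316]  # values from the printf output
fanouts = [round(b / a) for a, b in zip(n_rows, n_rows[1:])]
print(fanouts)  # → [347, 291]
```

The two implied fan-outs are in the same ballpark, which is consistent with plain per-level extrapolation from a small sample of pages.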
[11 Jan 22:09] Mark Callaghan
Fixing a typo in the hack diff above; the early return should not be commented out:
>     // if (1) return 10;

-->

>     if (1) return 10;
[13 Jan 5:26] MySQL Verification Team
Hello Mark,

Thank you for the detailed report and performance improvement feedback.

Sincerely,
Umesh