Bug #70354 left-sibling page read during descending index scan by btr_cur_latch_leaves
Submitted: 16 Sep 2013 15:37 Modified: 14 Jan 2015 0:24
Reporter: Mark Callaghan Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version: OS:Any
Assigned to: CPU Architecture:Any

[16 Sep 2013 15:37] Mark Callaghan
Description:
In btr_cur_latch_leaves for BTR_SEARCH_PREV the left-sibling page is read & locked. This can be an extra disk read. Our most frequent query is a short descending scan so we will be doing extra IO for that. 

I think this is similar to http://bugs.mysql.com/bug.php?id=61736

How to repeat:
Read the source. I will try to provide benchmark results to show how much extra IO this causes for short range scans.

313 case BTR_SEARCH_PREV:
314 case BTR_MODIFY_PREV:
315 mode = latch_mode == BTR_SEARCH_PREV ? RW_S_LATCH : RW_X_LATCH;
316 /* latch also left brother */
317 left_page_no = btr_page_get_prev(page, mtr);
318 
319 if (left_page_no != FIL_NULL) {
320 get_block = btr_block_get(
321 space, zip_size,
322 left_page_no, mode, cursor->index, mtr);
[24 Sep 2013 13:01] MySQL Verification Team
Hello Mark,

Thank you for the bug report.

Thanks,
Umesh
[24 Sep 2013 13:05] MySQL Verification Team
// 5.6

storage/innobase/btr/btr0cur.cc:
  311  		return;
  312  
  313: 	case BTR_SEARCH_PREV:
  314  	case BTR_MODIFY_PREV:
  315  		mode = latch_mode == BTR_SEARCH_PREV ? RW_S_LATCH : RW_X_LATCH;
  316		/* latch also left brother */
  317		left_page_no = btr_page_get_prev(page, mtr);
  318 
  319           if (left_page_no != FIL_NULL) {
  320 		    get_block = btr_block_get(
  321                     space, zip_size,
  322			  left_page_no, mode, cursor->index, mtr);
  323		    cursor->left_block = get_block;
[30 Sep 2013 9:30] MySQL Verification Team
Hello Mark,

Could you please also provide benchmark results of how much
extra IO this causes for short range scans.

Regards,
Umesh
[2 Oct 2013 17:22] Mark Callaghan
I ran a hacked version of sysbench:
* query was one of:
SELECT c from sbtest1 where id between 1249936 and 1249939 order by id desc
SELECT c from sbtest1 where id between 1249936 and 1249939 order by id asc

* table had 8M rows --> about 2G of data in my case
* innodb buffer pool is 256MB
* innodb_flush_method=O_DIRECT
* restarted mysqld before each test
* ran sysbench for 3 minutes
* sysbench used 1 client connection, client & mysqld were on the same host
* storage is fast PCI-based flash
* range scan read 4 rows

For both asc and desc index scan I get ~4000 pages read/second and ~4000 iostat r/s. However with desc index scan there are more pages read per query so QPS is higher for asc index scan. The pages read per query rates are about 1.0 for the asc index case and 1.15 for the desc index case.

The tables below have 3 columns which are the rates measured per 10-second interval during the test:
* column 1 -> Com_select (selects/second)
* column 2 -> innodb pages_read / second
* column 3 -> iostat r/s

Results for ascending index scan

3881.5 4009.4 3959.50
3952.0 4011.0 3964.10
3927.4 3997.8 3944.90
3771.2 3831.1 3781.20
3939.5 4008.9 3961.20
3922.1 3993.4 3943.00
3925.7 3986.3 3955.70
3822.0 3888.0 3820.70
3933.2 4007.5 3952.40
3653.8 3724.4 3674.30
3902.8 3968.0 3954.90
3891.1 3952.1 3864.60
3938.5 4002.0 3956.90
3927.9 4010.9 3963.10
3925.2 4013.3 3962.70
3540.0 3622.6 3650.30
3582.2 3666.7 3629.80

Results for descending index scan

3425.5 3916.0 3878.20
3429.3 3917.4 3867.70
3370.3 3851.2 3810.20
3366.7 3861.6 3805.60
3392.2 3882.0 3841.80
3433.9 3957.7 3915.70
3412.1 3893.9 3839.80
3537.4 4057.0 3977.80
3691.1 4228.1 4171.20
3701.8 4223.1 4176.90
3686.8 4209.4 4171.80
3706.5 4223.7 4162.60
3691.0 4239.3 4184.90
3565.7 4068.3 4122.50
3614.7 4156.1 3999.30
3685.6 4225.5 4167.10
3654.9 4169.7 4173.20
[2 Oct 2013 17:24] Mark Callaghan
This was my sysbench command line. It uses some options from my sysbench branch. I also hacked sysbench so the --oltp-simple-ranges query used either "order by id asc" or "order by id desc"

s=180; ./sysbench.desc --batch --batch-delay=10 --test=oltp --oltp-table-size=8000000 --max-time=$s --max-requests=0 --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes --oltp-table-name=sbtest1 --oltp-test-mode=complex --oltp-point-selects=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-non-index-updates=0 --oltp-skip-trx --oltp-dist-type=uniform --oltp-range-size=4 --oltp-connect-delay=0 --percentile=99 --num-threads=1 --oltp-read-only --seed-rng=1 run
[3 Oct 2013 3:22] MySQL Verification Team
Thank you Mark for providing the requested details.

Regards,
Umesh
[14 Jan 2015 0:24] Mark Callaghan
Any chance you will elaborate on this?
[14 Jan 2015 4:59] MySQL Verification Team
Hello Mark,

This seems to be result of some auto update, let me check with the dev's and update you further on this.

Thanks,
Umesh
[14 Jan 2015 5:38] Yasufumi Kinoshita
My previous comment for orabug# seems not replicated to this bug#.

============================================================
[8 Oct 2013 3:58] Yasufumi Kinoshita (YAKINOSH)

I confirmed this behavior was already optimized enough.

BTR_SEARCH_PREV is only used by btr_pcur_move_backward_from_page().
BTR_MODIFY_PREV is used by btr_pcur_move_backward_from_page() and
ibuf_insert().

btr_pcur_move_backward_from_page() is called by btr_pcur_move_to_prev() only
when needed (when the cursor is on infimum record)
ibuf_insert() is based on the previous and next pages are latched also.
============================================================

And I think your point might be wrong.
If the reason is from cursor restoration, I already fixed at
"Bug#17666170 : BTR_PCUR_RESTORE_POSITION CAN TRY OPTIMISTIC RESTORATION FOR BACKWARD CURSOR"
It should be fixed at 5.7.4.

So this bug# seems "Not a Bug" or "Duplicated"