Bug #83470 Reverse scan on a partitioned table does ICP check incorrectly, causing slowdown
Submitted: 20 Oct 2016 18:24 Modified: 16 May 2017 2:53
Reporter: Sergey Petrunya Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Partitions Severity:S3 (Non-critical)
Version:5.7.16-debug-log OS:Any
Assigned to: CPU Architecture:Any
Tags: ICP

[20 Oct 2016 18:24] Sergey Petrunya
Description:
When a query
- reads a partitioned InnoDB table
- does a reverse-ordered range scan (to satisfy ORDER BY DESC)
- uses Index Condition Pushdown

it may take an unreasonably long time to finish.

The reason is that ICP check fails to check for the range bound. The scan walks out of the range being scanned and continues to read index tuples until it reaches the first value in the index.

How to repeat:
create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table t1 (
  date_col datetime not null,
  col1 int,
  filler varchar(255),
  key(date_col, col1)
)
partition by range columns(date_col)
(
  partition p0 values less than ('2014-01-01 00:00:00'),
  partition p1 values less than ('2015-01-01 00:00:00'),
  partition p2 values less than ('2016-01-01 00:00:00')
);

insert into t1 select
  date_add('2013-01-01', interval (A.a+10*B.a) day),
  123456,
  repeat('filler-data', 5)
from ten A, ten B, ten C;

insert into t1 select
  date_add('2014-01-01', interval (A.a+10*B.a) day),
  123456,
  repeat('filler-data', 5)
from ten A, ten B, ten C;

## This will return 10 rows
select * from t1 force index(date_col) 
where 
   date_col >= date('2013-04-06') AND date_col < date('2013-04-06' + interval 1 day);

##
## Now, add an index condition that  counts how many times it was checked
##
set @cnt=1;
select * from t1 force index(date_col) 
where 
   col1 + 10000000 > (@cnt:=@cnt+1) and
   date_col >= date('2013-04-06') AND date_col < date('2013-04-06' + interval 1 day)
ORDER BY date_col DESC;

select @cnt;
+------+
| @cnt |
+------+
|  961 |
+------+

And we see that index condition was checked 961 times, even if we just read 10 rows.

EXPLAIN:

MySQL [j15]> explain select * from t1 force index(date_col)  where     col1 + 10000000 > (@cnt:=@cnt+1) and    date_col >= date('2013-04-06') AND date_col < date('2013-04-06' + interval 1 day) ORDER BY date_col DESC;
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | t1    | p0         | range | date_col      | date_col | 5       | NULL |   10 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
[20 Oct 2016 18:46] Sergey Petrunya
Reproducible on 5.7.16, too.
[21 Oct 2016 5:47] Umesh Shastry
Hello Sergey,

Thank you for the report and test case.

Thanks,
Umesh
[21 Oct 2016 17:25] Sergey Petrunya
The failure scenario is:

Put a breakpoint in QUICK_SELECT_DESC::get_next 
Observe execution to reach this call:

       file->set_end_range(&min_range, handler::RANGE_SCAN_DESC);

Put a hardware watchpoint on file->end_range 
and see:
  Hardware watchpoint 4: *$a
  Old value = (key_range *) 0x7fff9401b4e8
  New value = (key_range *) 0x0

(gdb) wher
  #0  Partition_helper::ph_index_read_map (this=0x7fff9401b6a0, buf=0x7fff9401b820 "\377\200", key=0x7fff9401f678 "\231\217N", keypart_map=1, find_flag=HA_READ_BEFORE_KEY) at /home/psergey/dev-git/mysql-5.7/sql/partitioning/partition_handler.cc:2499
  #1  0x0000000001ab3c8f in ha_innopart::index_read_map (this=0x7fff9401b3b0, buf=0x7fff9401b820 "\377\200", key=0x7fff9401f678 "\231\217N", keypart_map=1, find_flag=HA_READ_BEFORE_KEY) at /home/psergey/dev-git/mysql-5.7/storage/innobase/handler/ha_innopart.h:1299
  #2  0x0000000000f7a13f in handler::ha_index_read_map (this=0x7fff9401b3b0, buf=0x7fff9401b820 "\377\200", key=0x7fff9401f678 "\231\217N", keypart_map=1, find_flag=HA_READ_BEFORE_KEY) at /home/psergey/dev-git/mysql-5.7/sql/handler.cc:2998
  #3  0x00000000017e50c8 in QUICK_SELECT_DESC::get_next (this=0x7fff94031da0) at /home/psergey/dev-git/mysql-5.7/sql/opt_range.cc:11495

So, Partition_helper that clears end_range, and then ICP check walks off the
end of the range.
[21 Oct 2016 17:49] Sergey Petrunya
Looking at how handler::end_range is set or cleared.

It is set when reading ranges:
- read_range_first() sets it
- then read_range_first() and read_range_next() use it.
- ICP code also uses it

It is set by QUICK_SELECT_DESC: 
- QUICK_SELECT_DESC::get_next sets it
- then it calls index_read_map() or index_last(), followed by index_prev()
- ICP code inside these calls use it.

It is set by Partition_helper::ph_read_range_first
- it sets m_handler->end_key
- Then the code in Partition_helper::handle_unordered_scan_next_partition
  uses it.

Where is handler::end_range cleared? 
- it is done in handler::ha_index_end(), handler::ha_index_init() and 
  other [de]initialization calls.
- within an index scan, it is not cleared:
  - if the scan uses read_range_XXX functions, each read_range_first
    call will set end_range accordingly.
  - if the scan is a reverse scan done by QUICK_SELECT_DESC, then
    QUICK_SELECT_DESC code will call set_end_range() for each new range.
[21 Oct 2016 17:50] Sergey Petrunya
Considering the above, there seems to be absolutely no reason to have

  m_handler->end_range= NULL;

assignment inside these functions:
- Partition_helper::ph_index_read_last_map
- Partition_helper::ph_index_first
- Partition_helper::ph_index_read_map
[25 Oct 2016 14:27] Sergey Petrunya
Suggested patch

Attachment: bug83470.diff (text/x-patch), 4.83 KiB.

[25 Oct 2016 14:29] Sergey Petrunya
Please find the patch attached. I'm offering it under BSD licence.
[21 Dec 2016 11:00] Shane Bester
see also http://bugs.mysql.com/bug.php?id=84107
[16 May 2017 2:53] Jon Stephens
Documented fix in the 5.7.19 changelog, as follows:

    When a query performed a reverse-ordered range scan (to satisfy
    ORDER BY ... DESC) using index condition pushdown on a
    partitioned InnoDB table, it could take an unreasonably long
    time to finish. This was due to the fact that the condition
    pushdown check failed to get the bounds of the range, so that
    the scan continued to read index tuples until it reached the
    first value in the index.

Closed.