Bug #64040 SELECT WHERE pk<N scans full table
Submitted: 16 Jan 2012 9:32    Modified: 4 May 2012 12:30
Reporter: xiaobin lin (OCA)
Status: Not a Bug
Category: MySQL Server: Optimizer    Severity: S3 (Non-critical)
Version: 5.1, 5.5    OS: Any
CPU Architecture: Any
Tags: consistent read, for update, lock in share mode

[16 Jan 2012 9:32] xiaobin lin
Description:
It seems that a SELECT statement with FOR UPDATE issued after BEGIN does not create a CONSISTENT view.

How to repeat:
-- session 1:
create table tb(a int key) engine=innodb;
insert tb values(10),(11),(13),(20);
begin;
select * from tb where a<15 and a>13 for update;

-- session 2 (other thread):
insert tb values(200);

-- session 1:
select * from tb; -- returns 200 too

Suggested fix:
The CONSISTENT view should start at the first "select" statement.
[16 Jan 2012 9:53] Marko Mäkelä
If you replace the FOR UPDATE with LOCK IN SHARE MODE, it will behave in the same way. If you omit the FOR UPDATE, the record (200) will not be displayed by the subsequent SELECT. Also, if you replace the BEGIN with START TRANSACTION WITH CONSISTENT SNAPSHOT, the read view will be created.
[17 Jan 2012 8:16] Marko Mäkelä
This bug was filed after initial discussion. The initial test case had a second non-indexed column, b INT. For some reason, the INSERT would hang if the second column is missing.
[17 Jan 2012 9:40] Marko Mäkelä
Note that even if we fixed this case, we would still get into a similar situation if you replace the SELECT...FOR UPDATE with UPDATE. In that case, the read view would not be created, and a subsequent non-locking SELECT would create the read view and see changes made by a later-committed transaction.

Because read view creation is a somewhat expensive operation, I do not think that we should do it for every transaction.
[18 Jan 2012 12:13] Marko Mäkelä
The INSERT does not hang if the table contains 1000 rows with a>200. This looks like an optimizer bug. The SELECT should not go beyond the record 20 when the WHERE condition is a<15.
[18 Jan 2012 12:26] Marko Mäkelä
I would say that the originally reported bug (locking read does not create a consistent read view) is not a bug. But, the simplified test case revealed what could be considered an optimizer bug: for a small table, it may issue a full table scan even when there is a range condition on the primary key. This makes sense for MyISAM (heap-organized table), but not for InnoDB (index-organized table).
[5 Feb 2012 22:23] Olav Sandstå
I will try to answer Marko's question about whether this shows a bug
in the optimizer that causes a full table scan when there are few
records in the table. I am using the following simplified test case:

create table t1 (
  a int key)
engine=innodb;

insert into t1 values (13);
#insert into t1 values (20);

select * from t1 where a<15 for update;

drop table t1;

1. Some background:
===================

The optimizer can choose from the following three execution strategies
for executing this query:

1. Table scan ("ALL" in explain output): reads all records from the
table without using an index.

2. Index scan ("index" in explain output): scans the entire index.

3. Range scan ("range" in explain output): reads a subset of the
table using an index, as specified by the range condition.

2. Execution plan:
=================

Using explain for this query I get:

a. With only 1 record in the table:

  explain select * from t1 where a<15 for update;
  id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
  1	SIMPLE	t1	index	PRIMARY	PRIMARY	4	NULL	1	Using where; Using index

The optimizer has chosen to run this query as an "index scan" and will read through the entire index. This causes all records to be locked (including the "supremum" record) for update, preventing any new records from being inserted into the table.

b. With two records in the table (by inserting the value 20 into the table):

  explain select * from t1 where a<15 for update;
  id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
  1	SIMPLE	t1	range	PRIMARY	PRIMARY	4	NULL	1	Using where; Using index

The optimizer has decided to use a range scan (a < 15), so it will only scan the index until it finds a record larger than 15. In this case the record with value 20 is therefore the last one that will be locked, and it remains possible for other transactions to insert records with values larger than 20 into the table.
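The locking difference between the two plans can be sketched with a toy model (purely illustrative Python, not InnoDB code; the helper name and the simplified "next-key lock on every record the scan touches" rule are assumptions):

```python
# Toy model of which records a locking scan touches.
# InnoDB's real next-key locking is far more involved; this only
# illustrates why the index scan blocks the concurrent INSERT.

SUPREMUM = "supremum"  # stands in for InnoDB's supremum pseudo-record

def locked_records(rows, upper_bound, strategy):
    """Records locked by SELECT ... WHERE a < upper_bound FOR UPDATE."""
    rows = sorted(rows)
    if strategy == "index":
        # Full index scan: every record plus the supremum gets locked.
        return rows + [SUPREMUM]
    locked = []
    for r in rows:
        locked.append(r)          # lock the record we examined
        if r >= upper_bound:      # range scan stops at first record >= bound
            break
    else:
        locked.append(SUPREMUM)   # ran off the end of the index
    return locked

print(locked_records([13], 15, "index"))      # [13, 'supremum']: blocks INSERT of 200
print(locked_records([13, 20], 15, "range"))  # [13, 20]: INSERT of 200 succeeds
```

With only the record 13 in the table, even the range scan reaches the supremum, which matches the observation that the INSERT hangs on the small table.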

3. Why does the optimizer choose this way?
==========================================

The optimizer estimates/computes the cost for each of these strategies. Here are the cost numbers:

                       Cost (1 record)        Cost (2 records)

a. Table scan:              3.3                   3.5
b. Index scan:              1.2                   1.401
c. Range scan:              1.21                  1.21

Thus, with only one record in the table, the cost of an index scan (1.2) is lower than the cost of a range scan (1.21), so the optimizer chooses to do a full index scan. With two records in the table, the index scan (1.401) is more costly than the range scan (1.21), and the optimizer chooses a range scan.
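The plan choice is just the minimum of these estimates; a minimal sketch of that decision, with the costs hardcoded from the table above:

```python
# Plan choice as a straight argmin over the estimated costs above.
costs = {
    1: {"table": 3.3, "index": 1.2,   "range": 1.21},   # 1 record in t1
    2: {"table": 3.5, "index": 1.401, "range": 1.21},   # 2 records in t1
}

for nrows, c in costs.items():
    plan = min(c, key=c.get)
    print(f"{nrows} record(s): {plan} scan wins at cost {c[plan]}")
# 1 record(s): index scan wins at cost 1.2
# 2 record(s): range scan wins at cost 1.21
```

Note how close the two candidates sit with one record: the index scan wins by exactly the 0.01 constant discussed below.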

Some details about how cost numbers are estimated:
==================================================

Cost for "index scan" is computed in SQL_SELECT::test_quick_select() (in opt_range.cc):

     double key_read_time= 
        param.table->file->index_only_read_time(key_for_use, 
                                                rows2double(records)) +
        records * ROW_EVALUATE_COST;

Roughly this estimates cost as the sum of:

 - IO cost: the number of blocks in the index (see index_only_read_time())
 - CPU cost: number of records to process * 0.2

Cost for "range scan" is computed in handler::multi_range_read_info_const() (in handler.cc). First ha_innodb::records_in_range() is called to get an estimate of the number of records in the range (in both examples above this correctly returns 1 record). The cost for the range scan is then computed like this:

    cost->add_io(read_time(keyno, n_ranges, total_rows) *
                 Cost_estimate::IO_BLOCK_READ_COST());
    cost->add_cpu(total_rows * ROW_EVALUATE_COST + 0.01);

This gives:

 - IO cost: the function read_time() is called, which returns 1 block
 - CPU cost: number of records to process * 0.2 + 0.01

With the exception of why we add the 0.01 constant to the range estimate, this looks as expected.
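Plugging the numbers from this test case into the two formulas reproduces the cost table above. A sketch, where ROW_EVALUATE_COST = 0.2 and an IO cost of one block come from the text, and the slightly larger index IO estimate for two records (~1.001 blocks) is an assumption to match the observed 1.401:

```python
ROW_EVALUATE_COST = 0.2  # per-record CPU cost constant from the source

def index_scan_cost(io_blocks, records):
    # key_read_time = index_only_read_time(...) + records * ROW_EVALUATE_COST
    return io_blocks + records * ROW_EVALUATE_COST

def range_scan_cost(io_blocks, rows_in_range):
    # add_io(read_time(...) * IO_BLOCK_READ_COST) + add_cpu(rows * 0.2 + 0.01)
    return io_blocks + rows_in_range * ROW_EVALUATE_COST + 0.01

# 1 record: records_in_range() estimates 1 row, index IO is ~1 block.
print(round(index_scan_cost(1.0, 1), 3))    # 1.2  -> index scan wins by 0.01
print(round(range_scan_cost(1.0, 1), 3))    # 1.21

# 2 records: the index-scan IO estimate grows slightly (assumed ~1.001),
# while the range estimate is unchanged, so the range scan now wins.
print(round(index_scan_cost(1.001, 2), 3))  # 1.401
print(round(range_scan_cost(1.0, 1), 3))    # 1.21
```

So with one record the 0.01 constant is exactly what tips the choice toward the index scan.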

Conclusions:
============

With the exception of the case where there is only one record in the
table, the optimizer selects a range scan (which is correct). With
only one record in the table, the optimizer executes the query as an
index scan. Performance-wise I do not see much of an issue with
running an index scan versus a range scan in this case, so the choice
of an index scan seems OK to me (setting up the index scan might even
be slightly less costly than setting up the range scan). The optimizer
does not take locking cost or locking consequences into account, so I
would not classify this as an optimizer bug. Still, it might be
possible to add a rule to the range optimizer to prefer a range scan
over an index scan on the same index, or to adjust the cost model so
that it chooses the range scan in this case as well.
[4 May 2012 12:27] Olav Sandstå
The addition of 0.01 to the cost for doing a range scan was added to
handler::multi_range_read_info_const() in the following change set:

  sp1r-sergefp@mysql.com-20070309210824-13110

This change set contains the implementation of MRR.

This change set moved the addition of the 0.01 cost from
get_key_scans_params() to handler::multi_range_read_info_const(). The
addition of 0.01 seems to have been introduced as a fix for Bug#10037
in the following change set:

  sp1r-sergefp@mysql.com-20050425215610-36479

This change set has the following comment on why we add 0.01 to the
cost for a range scan:

  /*
    We can resolve this by only reading through this key.
    0.01 is added to avoid races between range and 'index' scan.
  */
  found_read_time= get_index_only_read_time(param,found_records,keynr) +
                   cpu_cost + 0.01;

The commit message contains:

BUG#10037: A proper fix: Add 0.01 to cost of 'range' scans, don't add 0.01 to
cost of the 'index' scan.