MySQL Bugs: #43011: inconsistent content for 'rows' column in EXPLAIN for index

Bug #43011	inconsistent content for 'rows' column in EXPLAIN for index_merge
Submitted:	19 Feb 2009 10:01	Modified:	15 Sep 2009 8:29
Reporter:	Andrii Nikitin	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	5.0.75, 5.0., 5.1.	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When usually 'rows' column in result of EXPLAIN command shows 'rows examined', 'index_merge' reports 'columns found'.

Manual states:
"
The rows column indicates the number of rows MySQL believes it must examine to execute the query.
"

See 'How to repeat:' for details.

I believe this is not just 'cosmetic' problem, because data from "rows" column is included into set of criteria used to choose Best execution plan. 
So that may be the root cause of bugs #32254, bug #41660 and bug #42759.

How to repeat:
Run script:

-- --------------------------------------------
-- generate data
create table t(a int, b int, c int, d int) engine=innodb;

insert into t select rand()*1000, rand()*1000, rand()*1000, rand()*1000 
from mysql.help_topic a, mysql.help_topic b, mysql.user c limit 100000;

update t set a=10 where rand() < 0.05;
update t set b=10 where rand() < 0.05;

alter table t add index ia(a), add index ib(b);

explain select count(*) from t where a=10 and b=10\G
flush status;
select sql_no_cache count(*) from t where a=10 and b=10;
show status like "h%";
-- --------------------------------------------

Results of EXPLAIN:
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: index_merge
possible_keys: ia,ib
          key: ia,ib
      key_len: 5,5
          ref: NULL
         rows: 360
        Extra: Using intersect(ia,ib); Using where; Using index

Results of 'show status':
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 4     |
| Handler_read_next          | 10130 |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 14    |
+----------------------------+-------+

Note that "Handler_read_next" shows that 10130 rows are examined while "rows" column reports 360. 

Below is script which demonstrates that other join types always report 'Examined rows' in 'rows' column of EXPLAIN according to manual:

-- -------------------------------------------
-- generate data
create table t(a int, b int, c int, d int) engine=innodb;

insert into t select rand()*1000, rand()*1000, rand()*1000, rand()*1000 
from mysql.help_topic a, mysql.help_topic b, mysql.user c limit 100000;

update t set a=10 where rand() < 0.05;
update t set b=10 where rand() < 0.05;
update t set c=10 where rand() < 0.01;
update t set d=10 where rand() < 0.01;

update t set a=20 where rand() < 0.005;
update t set b=NULL where rand() < 0.005;
update t set c=20 where rand() < 0.005;
update t set d=20 where rand() < 0.005;

alter table t add index ia(a), add index ib(b), add index ic(c), add 
index iab(a,b);

-- disable query cache
set query_cache_type=0;

-- demonstrate relation between 'rows' column, returned rows (found_rows()) 
-- and rows examined (Handler_read) 
-- for different join types.

-- ALL --

explain select d from t where d<20;
flush status;
select SQL_CALC_FOUND_ROWS d from t where d<20 LIMIT 1;
select found_rows();
show status like 'handler_read%';

-- INDEX --

explain select a from t where a>50*b;
flush status;
select SQL_CALC_FOUND_ROWS a from t where a>50*b LIMIT 1;
select found_rows();
show status like 'handler_read%';

-- RANGE --

explain select a from t where a between 19 and 20 and d<10;
flush status;
select SQL_CALC_FOUND_ROWS a from t where a between 19 and 20 and d<10 LIMIT 1;
select found_rows();
show status like 'handler_read%';

-- INDEX_MERGE --

explain select a from t ignore index(iab) where a=10 and b=10 and d<10;
flush status;
select SQL_CALC_FOUND_ROWS a from t ignore index(iab) where a=10 and b=10 LIMIT 1;
select found_rows();
show status like 'handler_read%';

-- REF_OR_NULL --

explain select b from t where (b=10 or b is NULL) and d<10;
flush status;
select SQL_CALC_FOUND_ROWS b from t where (b=10 or b is NULL) and d<10 LIMIT 1;
select found_rows();
show status like 'handler_read%';

-- REF --

explain select b from t where b=10 and d<10;
flush status;
select SQL_CALC_FOUND_ROWS b from t where b=10 and d<10 LIMIT 1;
select found_rows();
show status like 'handler_read%';

Suggested fix:
not sure

Inconsistency is obvious:

mysql> explain select count(*) from t where a=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ref
possible_keys: ia
          key: ia
      key_len: 5
          ref: const
         rows: 4919
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> explain select count(*) from t where b=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ref
possible_keys: ib
          key: ib
      key_len: 5
          ref: const
         rows: 4758
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> explain select count(*) from t where a=10 and b=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: index_merge
possible_keys: ia,ib
          key: ib,ia
      key_len: 5,5
          ref: NULL
         rows: 234
        Extra: Using intersect(ib,ia); Using where; Using index
1 row in set (0.02 sec)

Index merge has to read rows from both indexes and merge them, so in "rows" we should not see, in any case, value smaller than estimation for each individual index.

As other bugs referenced shows, this estimation is not only presented in EXPLAIN results, but is also used when comparing index merge with other possible access pathes.

There is no way this could be related to BUG#41660. This bug is about execution and estimates, while that bug is about malloc discipline to be used at execution phase.

I think the number show in EXPLAIN is correct.  There is place for only one number there, and given the choice, we would put in "The number of records which will be produced by the used access method",  or "The number of records that will remain after the records that were produced by the chosen access method were joined with the records of preceding tables".   Putting the number of enumerated index tuples would be greatly misleading for all join cases.

Here's a patch that introduces switches to disable index_merge or its individual strategies: http://lists.mysql.com/commits/67226 .

It may be used as a makeshift measure before we fix cost model.

Valeriy,

I do not think that this is a bug. Our optimizer calculates the approximate number of rows that will be contained in the intersect. That is very nice, and hence 'rows' column is correct. You can have two indices , each having one million rows, but which intersect in one row only.