| Bug #54398 | Explain output for estimated number of rows varies between runs on InnoDB | | |
|---|---|---|---|
| Submitted: | 10 Jun 2010 11:38 | Modified: | 25 Jun 2010 10:00 |
| Reporter: | Olav Sandstå | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | Tests: Server | Severity: | S3 (Non-critical) |
| Version: | 5.5.5, 5.6.99 | OS: | Any |
| Assigned to: | Jimmy Yang | CPU Architecture: | Any |
| Tags: | pb2 | | |
[10 Jun 2010 12:13]
Olav Sandstå
The test failure is still present if I disable both MRR and ICP when running the test: set session optimizer_switch='mrr=off,index_condition_pushdown=off';
[10 Jun 2010 13:01]
Olav Sandstå
Running the innodb_mrr test from the 6.0 codebase against the mysql-next-mr-bugfixing tree also produces the same variation, in the rows field of the explain output:
explain select * from t2 force index (d) where d < 10;
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE t2 range d d 5 NULL 53 Using where
+1 SIMPLE t2 range d d 5 NULL 47 Using where
Updating the MySQL version to include mysql-next-mr.
[10 Jun 2010 13:03]
Olav Sandstå
Simplified test case for reproducing this against mysql-next-mr:
-- source include/have_innodb.inc
create table t1(a int);
insert into t1 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
# Try a very big rowid
create table t2 (a char(100), b char(100), c char(100), d int,
filler char(10), key(d), primary key (a,b,c)) engine= innodb;
insert into t2 select A.a, B.a, B.a, A.a, 'filler' from t1 A, t1 B;
explain select * from t2 force index (d) where d < 10;
drop table t2;
drop table t1;
[10 Jun 2010 13:22]
Olav Sandstå
This bug is also present in 5.5.5 (using source from mysql-trunk-bugfixing) when running the simplified test case.
[10 Jun 2010 13:59]
Olav Sandstå
I am not able to reproduce this behavior when using 5.1.48 (based on the latest source from mysql-5.1-bugteam).
[18 Jun 2010 9:29]
Olav Sandstå
A similar difference in explain output is also reported in Bug#45727.
[18 Jun 2010 13:04]
Øystein Grøvlen
There is a test case in subselect_sj2.test (in mysql-next-mr-opt-backporting, soon to be renamed subquery_sj_innodb.inc) that fails in a similar way if I set all new optimizer switches to off:
set optimizer_switch='semijoin=off,materialization=off,firstmatch=off,loosescan=off,index_condition_pushdown=off,mrr=off';
create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
# Test overflow to MyISAM:
set max_heap_table_size=16384;
set join_buffer_size= 8192;
create table t3 (
  a int, b int, key(b),
  pk1 char(200), pk2 char(200),
  primary key(pk1, pk2)
) engine=innodb;
insert into t3 select A.a + 10*B.a, A.a + 10*B.a, A.a + 10*B.a, A.a + 10*B.a
  from t0 A, t0 B where B.a <5;
# -- [DISABLED Bug#54398]
#explain select * from t3 where b in (select a from t0);
select * from t3 where b in (select A.a+B.a from t0 A, t0 B where B.a<5);
[18 Jun 2010 15:04]
Calvin Sun
The parameter is called innodb_stats_sample_pages. See http://www.innodb.com/doc/innodb_plugin-1.0/innodb-other-changes.html#innodb-other-changes... for details.
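To illustrate why the sample size matters, here is a toy, self-contained simulation (hypothetical page counts, not InnoDB code): a few randomly chosen pages are extrapolated to the whole index, so a small sample lets the estimate swing from round to round, while a larger innodb_stats_sample_pages tightens it:
/* Toy simulation: estimate a table's row count by sampling a few
   random leaf pages and extrapolating, in the spirit of InnoDB's
   statistics sampling. All numbers are made up for illustration. */
#include <stdio.h>
#include <stdlib.h>

#define N_PAGES      50 /* leaf pages in the imaginary index */
#define SAMPLE_PAGES  8 /* plays the role of innodb_stats_sample_pages */

int main(void)
{
        int recs[N_PAGES];
        long total = 0;
        int i, run;

        /* pages rarely hold identical record counts; use 40..60 */
        for (i = 0; i < N_PAGES; i++) {
                recs[i] = 40 + rand() % 21;
                total += recs[i];
        }

        for (run = 0; run < 5; run++) {
                long sampled = 0;
                for (i = 0; i < SAMPLE_PAGES; i++) {
                        sampled += recs[rand() % N_PAGES]; /* random pick */
                }
                /* extrapolate the sample to the whole index */
                printf("round %d: estimate=%ld (actual %ld)\n",
                       run, sampled * N_PAGES / SAMPLE_PAGES, total);
        }
        return 0;
}
Each sampling round lands on different pages, so each round reports a slightly different estimate; raising SAMPLE_PAGES narrows the spread.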
[25 Jun 2010 10:00]
Jimmy Yang
This is not a bug. As mentioned previously, it happens because the #rows value comes from statistics.
The number of rows displayed in the explain result comes from a ha_innobase::info() call that fetches InnoDB's table->stat_n_rows value.
table->stat_n_rows is normally incremented in row_insert_for_mysql() for each insert, but every once in a while it is replaced by a freshly calculated statistics value: dict_update_statistics() is called during inserts whenever enough of the table has changed:
row_update_statistics_if_needed()
{
        ...
        /* Calculate new statistics if 1 / 16 of table has been modified
        since the last time a statistics batch was run, or if
        stat_modified_counter > 2 000 000 000 (to avoid wrap-around).
        We calculate statistics at most every 16th round, since we may have
        a counter table which is very small and updated very often. */

        if (counter > 2000000000
            || ((ib_int64_t)counter > 16 + table->stat_n_rows / 16)) {

                dict_update_statistics(table);
        }
}
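As a worked example, here is a standalone sketch (not InnoDB code; it only replays the threshold arithmetic from the comment above) of when that condition fires for the 100-row insert in the simplified test case:
/* Sketch: replay the "recalculate stats once ~1/16 of the table
   has changed" heuristic for a 100-row bulk insert and print the
   insert counts at which dict_update_statistics() would be called.
   Not InnoDB code. */
#include <stdio.h>

int main(void)
{
        long stat_n_rows = 0;   /* rows according to statistics */
        long counter = 0;       /* modifications since last stats run */
        int i;

        for (i = 1; i <= 100; i++) {
                stat_n_rows++;  /* row_insert_for_mysql() increments it */
                counter++;
                if (counter > 2000000000
                    || counter > 16 + stat_n_rows / 16) {
                        printf("insert #%d: recalculate statistics\n", i);
                        counter = 0;    /* a stats batch resets the counter */
                }
        }
        return 0;
}
It fires at inserts 18, 37, 57 and 78, which lines up with the stat_n_rows = 79 seen at the dict_update_statistics_low() breakpoint in the gdb session below.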
In btr_estimate_number_of_different_key_vals(), it calls btr_cur_open_at_rnd_pos_func() and page_cur_open_on_rnd_user_rec() to position the cursor on a randomly chosen record:
page_cur_open_on_rnd_user_rec()
{
        ...
        rnd = (ulint) (page_cur_lcg_prng() % n_recs);
}
This is where the randomness comes from: the estimate can end up a few records more or fewer than the actual number of rows inserted, and it becomes less accurate as the number of pages grows.
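A minimal sketch of that random record pick (I am assuming the classic rand()-style LCG constants and a time-based seed, which is my reading of page_cur_lcg_prng(); treat the details as illustrative, not as a copy of the InnoDB source):
/* Toy LCG in the style of page_cur_lcg_prng(): seeded from the
   clock once per process, so two server runs walk different random
   sequences and sample different records. Assumed constants. */
#include <stdio.h>
#include <time.h>

static unsigned long long lcg_state;

static unsigned long long lcg_prng(void)
{
        /* a = 1103515245, c = 12345; wraps mod 2^64 implicitly */
        lcg_state = 1103515245ULL * lcg_state + 12345ULL;
        return lcg_state;
}

int main(void)
{
        unsigned int n_recs = 100;      /* records on the sampled page */
        int i;

        lcg_state = (unsigned long long) time(NULL);    /* per-run seed */

        for (i = 0; i < 5; i++) {
                /* pick a random user record, as
                   page_cur_open_on_rnd_user_rec() does */
                printf("picked record %llu\n", lcg_prng() % n_recs);
        }
        return 0;
}
Because each run samples different records, the per-page distinct-key counts differ slightly, and so does the extrapolated table estimate.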
Breakpoint 2, dict_update_statistics_low (table=0x9f4db18, has_dict_mutex=0)
at /home/jy/work/mysql5.5_6/mysql-trunk-innodb/storage/innobase/dict/dict0dict.c:4212
4212 if (table->ibd_file_missing) {
2: table->stat_n_rows = 79
4268 table->stat_n_rows = index->stat_n_diff_key_vals[
2: table->stat_n_rows = 79
(gdb) p index->stat_n_diff_key_vals[2]
$26 = 77
So you can see that the inserted-row count (table->stat_n_rows) gets reset to 77 by the dict_update_statistics_low() call.
The test cases should stop checking the estimated #rows value.
Thanks
Jimmy

Description:
When running the MTR test innodb_mrr against the mysql-next-mr-opt-backporting tree it fails randomly (about 2 out of 3 runs) with the following error:
explain select * from t2 force index (d) where d < 10;
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE t2 range d d 5 NULL 53 Using index condition; Using MRR
+1 SIMPLE t2 range d d 5 NULL 47 Using index condition; Using MRR
The difference is in the rows field of the explain output, which varies between 47 and 53.

How to repeat:
1. Check out the source code from mysql-next-mr-opt-backporting
2. Compile, e.g.:
cmake . -DWITH_DEBUG=1 -DWITH_INNOBASE_STORAGE_ENGINE=1
make
3. Run the test: ./mtr innodb_mrr