Bug #54398 | Explain output for estimated number of rows varies between runs on InnoDB | |
---|---|---|---
Submitted: | 10 Jun 2010 11:38 | Modified: | 25 Jun 2010 10:00
Reporter: | Olav Sandstå | Email Updates: |
Status: | Not a Bug | Impact on me: |
Category: | Tests: Server | Severity: | S3 (Non-critical)
Version: | 5.5.5, 5.6.99 | OS: | Any
Assigned to: | Jimmy Yang | CPU Architecture: | Any
Tags: | pb2 | |
[10 Jun 2010 11:38]
Olav Sandstå
[10 Jun 2010 12:13]
Olav Sandstå
The test failure is still present if I disable both MRR and ICP when running the test:

```
set session optimizer_switch='mrr=off,index_condition_pushdown=off';
```
[10 Jun 2010 13:01]
Olav Sandstå
Running the innodb_mrr test from the 6.0 codebase against the mysql-next-mr-bugfixing tree also produces the same variation, in the rows field of the explain output:

```
explain select * from t2 force index (d) where d < 10;
 id  select_type  table  type   possible_keys  key  key_len  ref   rows  Extra
-1   SIMPLE       t2     range  d              d    5        NULL  53    Using where
+1   SIMPLE       t2     range  d              d    5        NULL  47    Using where
```

Updating the MySQL version to include mysql-next-mr.
[10 Jun 2010 13:03]
Olav Sandstå
Simplified test case for reproducing this against mysql-next-mr:

```
-- source include/have_innodb.inc

create table t1(a int);
insert into t1 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

# Try a very big rowid
create table t2 (a char(100), b char(100), c char(100), d int,
                 filler char(10), key(d), primary key (a,b,c))
engine= innodb;
insert into t2 select A.a, B.a, B.a, A.a, 'filler' from t1 A, t1 B;

explain select * from t2 force index (d) where d < 10;

drop table t2;
drop table t1;
```
[10 Jun 2010 13:22]
Olav Sandstå
This bug is also present in 5.5.5 (using source from mysql-trunk-bugfixing) when running the simplified test case.
[10 Jun 2010 13:59]
Olav Sandstå
I am not able to reproduce this behavior when using 5.1.48 (based on the latest source from mysql-5.1-bugteam).
[18 Jun 2010 9:29]
Olav Sandstå
A similar difference in explain output is also reported in Bug#45727.
[18 Jun 2010 13:04]
Øystein Grøvlen
There is a test case in subselect_sj2.test in mysql-next-mr-opt-backporting (soon to be renamed subquery_sj_innodb.inc) that fails in a similar way if I set all new optimizer switches to off:

```
set optimizer_switch='semijoin=off,materialization=off,firstmatch=off,loosescan=off,index_condition_pushdown=off,mrr=off';

create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

# Test overflow to MyISAM:
set max_heap_table_size=16384;
set join_buffer_size= 8192;

create table t3 (
  a int,
  b int,
  key(b),
  pk1 char(200), pk2 char(200),
  primary key(pk1, pk2)
) engine=innodb;

insert into t3 select
  A.a + 10*B.a, A.a + 10*B.a, A.a + 10*B.a, A.a + 10*B.a
from t0 A, t0 B where B.a <5;

# -- [DISABLED Bug#54398]
#explain select * from t3 where b in (select a from t0);
select * from t3 where b in (select A.a+B.a from t0 A, t0 B where B.a<5);
```
[18 Jun 2010 15:04]
Calvin Sun
The parameter is called innodb_stats_sample_pages. See http://www.innodb.com/doc/innodb_plugin-1.0/innodb-other-changes.html#innodb-other-changes... for details.
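For illustration of how this parameter interacts with the test (not part of the comment above; the value 64 is only an example): innodb_stats_sample_pages is a dynamic global variable in the InnoDB plugin, so a session can raise the sample size and force a fresh statistics calculation before re-running the EXPLAIN:

```
-- Sample more index pages per statistics calculation (plugin default is 8);
-- a larger sample gives more stable, but slower, estimates.
SET GLOBAL innodb_stats_sample_pages = 64;

-- ANALYZE TABLE makes InnoDB recalculate the statistics immediately,
-- so the next EXPLAIN reflects the larger sample.
ANALYZE TABLE t2;
```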
[25 Jun 2010 10:00]
Jimmy Yang
This is not a bug. As mentioned previously, it happens because the #rows value comes from statistics. The number of rows displayed in the explain result comes from a ha_innobase::info call that fetches InnoDB's table->stat_n_rows value.

table->stat_n_rows is normally incremented in row_insert_for_mysql() for each insert, but every once in a while it is overwritten with an estimated value: row_update_statistics_if_needed() calls dict_update_statistics() periodically during inserts:

```
row_update_statistics_if_needed()

	/* Calculate new statistics if 1 / 16 of table has been modified
	since the last time a statistics batch was run, or if
	stat_modified_counter > 2 000 000 000 (to avoid wrap-around).
	We calculate statistics at most every 16th round, since we may have
	a counter table which is very small and updated very often. */

	if (counter > 2000000000
	    || ((ib_int64_t) counter > 16 + table->stat_n_rows / 16)) {

		dict_update_statistics(table);
	}
}
```

To produce that estimate, btr_estimate_number_of_different_key_vals() calls btr_cur_open_at_rnd_pos_func() and page_cur_open_on_rnd_user_rec() to position on a randomly chosen record:

```
page_cur_open_on_rnd_user_rec()
{
	...
	rnd = (ulint) (page_cur_lcg_prng() % n_recs);
}
```

This is where the randomness comes from: the estimate can end up a few records more or fewer than the number of rows actually inserted, and it becomes increasingly inaccurate as the number of pages grows. For example:

```
Breakpoint 2, dict_update_statistics_low (table=0x9f4db18, has_dict_mutex=0)
    at /home/jy/work/mysql5.5_6/mysql-trunk-innodb/storage/innobase/dict/dict0dict.c:4212
4212		if (table->ibd_file_missing) {
2: table->stat_n_rows = 79

4268			table->stat_n_rows = index->stat_n_diff_key_vals[
2: table->stat_n_rows = 79

(gdb) p index->stat_n_diff_key_vals[2]
$26 = 77
```

So you can see that the row count (table->stat_n_rows), which was 79 after the inserts, gets reset to the estimate 77 by the dict_update_statistics_low() call.

The test cases should therefore not depend on the estimated #rows value.

Thanks
Jimmy
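As a sketch of that last recommendation (this is the usual mysqltest idiom, not the actual fix committed for this bug): a non-deterministic column of EXPLAIN output is typically masked with --replace_column, using the query from the simplified test case above:

```
# rows is the 9th column of EXPLAIN output; mask it because the
# InnoDB estimate varies from run to run.
--replace_column 9 #
explain select * from t2 force index (d) where d < 10;
```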