Bug #53761 RANGE estimation for matched rows may be 200 times different
Submitted: 18 May 2010 19:12 Modified: 1 Nov 2010 21:28
Reporter: Andrii Nikitin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S3 (Non-critical)
Version:5.5.2, 5.5.4-m3, 5.1 OS:Any
Assigned to: Vasil Dimov
Triage: Triaged: D2 (Serious) / R4 (High) / E5 (Major)

[18 May 2010 19:12] Andrii Nikitin
Description:
In 5.5.4 InnoDB Plugin 1.1.0 gives very wrong estimate for RANGE query for compound index from attached table.

In example testcase EXPLAIN shows estimation for 301036 rows, but actually only 14428 rows are returned. (2000% error). 

The problem is not repeatable in 5.1.46, so in real life wrong statistics affect execution plans for much more complex queries when testing 5.5 releases.

mysql > explain select count(*) FROM test WHERE b_left BETWEEN 2580259 AND 2609114;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | test  | range | borders       | borders | 4       | NULL | 301036 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

mysql > select count(*) FROM test WHERE b_left BETWEEN 2580259 AND 2609114;
+----------+
| count(*) |
+----------+
|    14428 |
+----------+
1 row in set (0.22 sec)

How to repeat:
1. load attached table test.sql.gz

2. execute following script:

# just in the case
analyze table test;

# following query produces result 14428 , 
# which shows how many rows match condition
select count(*) FROM test WHERE b_left BETWEEN 2580259 AND 2609114;

# explain for the same query gives estimation for 301036 rows
explain select count(*) FROM test WHERE b_left BETWEEN 2580259 AND 2609114;

# create single column index and check if estimate is better
alter table test add index(b_left);

# now estimate produces 28872 rows, which is acceptable accuracy (single column index)
explain select count(*) FROM test WHERE b_left BETWEEN 2580259 AND 2609114;

# force compound index and make sure that estimation is still very wrong 301036
explain select count(*) FROM test use index(borders) WHERE b_left BETWEEN 2580259 AND 2609114;

Suggested fix:
estimation error may be 200% but not 2000%
[18 May 2010 19:32] Andrii Nikitin
data for test table uploaded to
ftp://ftp.mysql.com/pub/mysql/download/test-bug-53761.sql.gz
[8 Jul 2010 13:36] Vasil Dimov
I am able to repeat the exact same problem with 5.1.49:

16:26:43 mysql> select count(*) FROM test WHERE b_left BETWEEN 2580259 AND 2609114;
+----------+
| count(*) |
+----------+
|    14428 |
+----------+
1 row in set (0.09 sec)

16:35:08 mysql> explain select count(*) FROM test WHERE b_left BETWEEN 2580259 AND 2609114;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | test  | range | borders       | borders | 4       | NULL | 301036 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

16:35:11 mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.49    |
+-----------+
1 row in set (0.00 sec)

with both builtin innodb and innodb plugin.
[8 Jul 2010 13:42] Vasil Dimov
Breakpoint 1, btr_estimate_n_rows_in_range (index=0x80242d8a8, tuple1=0x8024328a8, 
    mode1=2, tuple2=0x802432ca8, mode2=1) at btr/btr0cur.c:2714
2714		mtr_start(&mtr);
Current language:  auto; currently c
(gdb) n
2716		cursor.path_arr = path1;
(gdb) 
2718		if (dtuple_get_n_fields(tuple1) > 0) {
(gdb) 
2720			btr_cur_search_to_nth_level(index, 0, tuple1, mode1,
(gdb) 
2729		mtr_commit(&mtr);
(gdb) 
2731		mtr_start(&mtr);
(gdb) 
2733		cursor.path_arr = path2;
(gdb) 
2735		if (dtuple_get_n_fields(tuple2) > 0) {
(gdb) 
2737			btr_cur_search_to_nth_level(index, 0, tuple2, mode2,
(gdb) 
2746		mtr_commit(&mtr);
(gdb) 
2750		n_rows = 1;
(gdb) 
2751		diverged = FALSE;	    /* This becomes true when the path is not
(gdb) 
2753		diverged_lot = FALSE;	    /* This becomes true when the paths are
(gdb) 
2755		divergence_level = 1000000; /* This is the level where paths diverged
(gdb) 
2757		for (i = 0; ; i++) {
(gdb) 
2760			slot1 = path1 + i;
(gdb) 
2761			slot2 = path2 + i;
(gdb) 
2764			    || slot2->nth_rec == ULINT_UNDEFINED) {
(gdb) 
2763			if (slot1->nth_rec == ULINT_UNDEFINED
(gdb) 
2764			    || slot2->nth_rec == ULINT_UNDEFINED) {
(gdb) 
2763			if (slot1->nth_rec == ULINT_UNDEFINED
(gdb) 
2792			if (!diverged && slot1->nth_rec != slot2->nth_rec) {
(gdb) 
2794				diverged = TRUE;
(gdb) 
2796				if (slot1->nth_rec < slot2->nth_rec) {
(gdb) 
2797					n_rows = slot2->nth_rec - slot1->nth_rec;
(gdb) 
2799					if (n_rows > 1) {
(gdb) ins n_rows
$1 = 1
(gdb) n
2792			if (!diverged && slot1->nth_rec != slot2->nth_rec) {
(gdb) 
2757		for (i = 0; ; i++) {
(gdb) 
2834		}
(gdb) 
2760			slot1 = path1 + i;
(gdb) 
2761			slot2 = path2 + i;
(gdb) 
2764			    || slot2->nth_rec == ULINT_UNDEFINED) {
(gdb) 
2763			if (slot1->nth_rec == ULINT_UNDEFINED
(gdb) 
2764			    || slot2->nth_rec == ULINT_UNDEFINED) {
(gdb) 
2763			if (slot1->nth_rec == ULINT_UNDEFINED
(gdb) 
2792			if (!diverged && slot1->nth_rec != slot2->nth_rec) {
(gdb) 
2810			} else if (diverged && !diverged_lot) {
(gdb) 
2813				    || slot2->nth_rec > 1) {
(gdb) 
2812				if (slot1->nth_rec < slot1->n_recs
(gdb) 
2815					diverged_lot = TRUE;
(gdb) 
2816					divergence_level = i;
(gdb) 
2818					n_rows = 0;
(gdb) 
2820					if (slot1->nth_rec < slot1->n_recs) {
(gdb) ins n_rows
$2 = 0
(gdb) n
2821						n_rows += slot1->n_recs
(gdb) 
2825					if (slot2->nth_rec > 1) {
(gdb) ins n_rows
$3 = 7
(gdb) n
2826						n_rows += slot2->nth_rec - 1;
(gdb) 
2810			} else if (diverged && !diverged_lot) {
(gdb) ins n_rows
$4 = 323
(gdb) ins slot2->nth_rec
$5 = 317
(gdb) n
2757		for (i = 0; ; i++) {
(gdb) 
2834		}
(gdb) 
2760			slot1 = path1 + i;
(gdb) 
2761			slot2 = path2 + i;
(gdb) 
2764			    || slot2->nth_rec == ULINT_UNDEFINED) {
(gdb) 
2763			if (slot1->nth_rec == ULINT_UNDEFINED
(gdb) 
2764			    || slot2->nth_rec == ULINT_UNDEFINED) {
(gdb) 
2763			if (slot1->nth_rec == ULINT_UNDEFINED
(gdb) 
2792			if (!diverged && slot1->nth_rec != slot2->nth_rec) {
(gdb) 
2810			} else if (diverged && !diverged_lot) {
(gdb) 
2829			} else if (diverged_lot) {
(gdb) 
2831				n_rows = (n_rows * (slot1->n_recs + slot2->n_recs))
(gdb) 
2757		for (i = 0; ; i++) {
(gdb) ins n_rows
$6 = 150518
(gdb) ins slot1->n_recs + slot2->n_recs
$7 = 932
(gdb) n
2834		}
(gdb) 
2760			slot1 = path1 + i;
(gdb) 
2761			slot2 = path2 + i;
(gdb) 
2764			    || slot2->nth_rec == ULINT_UNDEFINED) {
(gdb) 
2763			if (slot1->nth_rec == ULINT_UNDEFINED
(gdb) 
2766				if (i > divergence_level + 1) {
(gdb) 
2771					n_rows = n_rows * 2;
(gdb) 
2778				if (n_rows > index->table->stat_n_rows / 2) {
(gdb) 
2789				return(n_rows);
(gdb) ins n_rows
$8 = 301036
[8 Jul 2010 16:07] Vasil Dimov
Fix the number in the bug title, 301036/14428 = 20.86
[8 Jul 2010 16:12] Vasil Dimov
Also happens with 5.1.49 innodb builtin
[8 Jul 2010 17:36] Andrii Nikitin
Shows 900 times estimation error in 5.5.4 and very close estimation in 5.1.48

Attachment: estim.log (application/octet-stream, text), 5.12 KiB.

[8 Jul 2010 17:38] Andrii Nikitin
Some important notes: 

1. estimation error may be worse than 200 times - initial testcase doesn't demonstrate it (see logs in estim.log above and table below).

2. Similar problems were never reproducible for me with 5.1.48 for provided test table (I haven't tested 5.1.49 because it is not released yet)

3. Problem 100% repeatable in 5.5.4,  so most probably not related to fact that InnoDB estimation is "approximate".

4. Big chance that many users will get different execution plans for complex queries after upgrade.

Testcases results for query:
select count(*) FROM test WHERE b_left BETWEEN :p1 AND :p2;

p1      p2      rows    est5.5.4    est5.1.48	err5.5.4     err5.1.48
2591685 2591689    4    2700        3           675          1.33
2591685 2591759	  38    34200       38          900          1
2591659 2609114 8727    434576      23528       49,79        2.69

script:
explain select count(*) FROM test WHERE b_left BETWEEN 2591685 AND 2591689;
select count(*) FROM test WHERE b_left BETWEEN 2591685 AND 2591689;

explain select count(*) FROM test WHERE b_left BETWEEN 2591685 AND 2591759;
select count(*) FROM test WHERE b_left BETWEEN 2591685 AND 2591759;

explain select count(*) FROM test WHERE b_left BETWEEN 2591659 AND 2609114;
select count(*) FROM test WHERE b_left BETWEEN 2591659 AND 2609114;
[9 Jul 2010 16:31] Vasil Dimov
Interesting observation:

1. start 5.1/builtin, load the test table as test_b
2. start 5.1/plugin, load the test table as test_p

3. start 5.1/buitlin, shows correct results for test_b and wrong for test_p
4. start 5.1/plugin, shows correct results for test_b and wrong for test_p

So it is about the version which created the tree, not about the version which is running the EXPLAIN SELECT ... command.

I also tried with and old InnoDB Plugin 1.0.6 and it behaves the same way as the latest InnoDB Plugin.
[9 Jul 2010 16:37] Vasil Dimov
-rw-rw----  1 vd-dev  vd-dev  603979776 Jul  9 19:24 var/test/test_b.ibd
-rw-rw----  1 vd-dev  vd-dev  608174080 Jul  9 19:02 var/test/test_p.ibd
[9 Jul 2010 17:55] Vasil Dimov
analyze_index_level(table=test/test_b, index=borders, level=2)
analyze_index_level(): total recs: 8, total pages: 1, n_diff[1]: 8
analyze_index_level(): total recs: 8, total pages: 1, n_diff[2]: 8
analyze_index_level(): total recs: 8, total pages: 1, n_diff[3]: 8

analyze_index_level(table=test/test_b, index=borders, level=1)
analyze_index_level(): total recs: 3822, total pages: 8, n_diff[1]: 3822
analyze_index_level(): total recs: 3822, total pages: 8, n_diff[2]: 3822
analyze_index_level(): total recs: 3822, total pages: 8, n_diff[3]: 3822

analyze_index_level(table=test/test_b, index=borders, level=0)
analyze_index_level(): total recs: 2088663, total pages: 3822, n_diff[1]: 2088663
analyze_index_level(): total recs: 2088663, total pages: 3822, n_diff[2]: 2088663
analyze_index_level(): total recs: 2088663, total pages: 3822, n_diff[3]: 2088663

-----

analyze_index_level(table=test/test_p, index=borders, level=2)
analyze_index_level(): total recs: 9, total pages: 1, n_diff[1]: 9
analyze_index_level(): total recs: 9, total pages: 1, n_diff[2]: 9
analyze_index_level(): total recs: 9, total pages: 1, n_diff[3]: 9

analyze_index_level(table=test/test_p, index=borders, level=1)
analyze_index_level(): total recs: 4111, total pages: 9, n_diff[1]: 4111
analyze_index_level(): total recs: 4111, total pages: 9, n_diff[2]: 4111
analyze_index_level(): total recs: 4111, total pages: 9, n_diff[3]: 4111

analyze_index_level(table=test/test_p, index=borders, level=0)
analyze_index_level(): total recs: 2088663, total pages: 4111, n_diff[1]: 2088663
analyze_index_level(): total recs: 2088663, total pages: 4111, n_diff[2]: 2088663
analyze_index_level(): total recs: 2088663, total pages: 4111, n_diff[3]: 2088663
[16 Aug 2010 14:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115810

3156 Vasil Dimov	2010-08-16
      Fix Bug#53761 RANGE estimation for matched rows may be 200 times different
      
      Improve the range estimation algorithm.
      
      Previously:
      For a given level the algo knows the number of pages in the requested range
      and the number of records on the leftmost and the rightmost page. Then it
      assumes all pages in between contain the average between the two border pages
      and multiplies this average number by the number of intermediate pages.
      
      With this change:
      Same idea, but peek a few (10) of the intermediate pages to get a better
      estimate of the average number of records per page. If there are less than 10
      intermediate pages then all of them will be scanned and the result will be
      precise, not an estimation.
      
      In the bug report one of the examples has a btree with a snippet of the leaf
      level like this:
      page1(899 records), page2(1 record), page3(1 record), page4(1 record)
      so when trying to estimate, the previous algo, assumed there are average
      (899+1)/2=450 records per page which went terribly wrong. With this change
      page2 and page3 will be read and the exact number of records will be returned.
[16 Aug 2010 14:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115812

3156 Vasil Dimov	2010-08-16
      Fix Bug#53761 RANGE estimation for matched rows may be 200 times different
      
      Improve the range estimation algorithm.
      
      Previously:
      For a given level the algo knows the number of pages in the requested range and the n
      
      With this change:
      Same idea, but peek a few (10) of the intermediate pages to get a better estimate of 
      
      In the bug report one of the examples has a btree with a snippet of the leaf level li
      page1(899 records), page2(1 record), page3(1 record), page4(1 record)
      so when trying to estimate, the previous algo, assumed there are average (899+1)/2=45
      Fix Bug#53761 RANGE estimation for matched rows may be 200 times different
      
      Improve the range estimation algorithm.
      
      Previously:
      For a given level the algo knows the number of pages in the requested range
      and the number of records on the leftmost and the rightmost page. Then it
      assumes all pages in between contain the average between the two border pages
      and multiplies this average number by the number of intermediate pages.
      
      With this change:
      Same idea, but peek a few (10) of the intermediate pages to get a better
      estimate of the average number of records per page. If there are less than 10
      intermediate pages then all of them will be scanned and the result will be
      precise, not an estimation.
      
      In the bug report one of the examples has a btree with a snippet of the leaf
      level like this:
      page1(899 records), page2(1 record), page3(1 record), page4(1 record)
      so when trying to estimate, the previous algo, assumed there are average
      (899+1)/2=450 records per page which went terribly wrong. With this change
      page2 and page3 will be read and the exact number of records will be returned.
      
      Approved by:	Sunny (rb://401)
[25 Aug 2010 9:22] Bugs System
Pushed into mysql-5.5 5.5.6-m3 (revid:alik@ibmvm-20100825092002-2yvkb3iwu43ycpnm) (version source revid:alik@ibmvm-20100825092002-2yvkb3iwu43ycpnm) (merge vers: 5.5.6-m3) (pib:20)
[30 Aug 2010 8:29] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@sun.com-20100830082732-n2eyijnv86exc5ci) (version source revid:alik@sun.com-20100830082732-n2eyijnv86exc5ci) (merge vers: 5.6.1-m4) (pib:21)
[30 Aug 2010 8:33] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100830082745-n6sh01wlwh3itasv) (version source revid:alik@sun.com-20100830082745-n6sh01wlwh3itasv) (pib:21)
[1 Nov 2010 21:28] John Russell
Added to change log:

An EXPLAIN plan for an InnoDB table could vary greatly in the
estimated cost for a BETWEEN clause.