Bug #53761 | RANGE estimation for matched rows may be 200 times different | ||
---|---|---|---|
Submitted: | 18 May 2010 19:12 | Modified: | 1 Nov 2010 21:28 |
Reporter: | Andrii Nikitin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB Plugin storage engine | Severity: | S3 (Non-critical) |
Version: | 5.5.2, 5.5.4-m3, 5.1 | OS: | Any |
Assigned to: | Vasil Dimov | CPU Architecture: | Any |
[18 May 2010 19:12]
Andrii Nikitin
[18 May 2010 19:32]
Andrii Nikitin
data for test table uploaded to ftp://ftp.mysql.com/pub/mysql/download/test-bug-53761.sql.gz
[8 Jul 2010 13:36]
Vasil Dimov
I am able to repeat the exact same problem with 5.1.49: 16:26:43 mysql> select count(*) FROM test WHERE b_left BETWEEN 2580259 AND 2609114; +----------+ | count(*) | +----------+ | 14428 | +----------+ 1 row in set (0.09 sec) 16:35:08 mysql> explain select count(*) FROM test WHERE b_left BETWEEN 2580259 AND 2609114; +----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+ | 1 | SIMPLE | test | range | borders | borders | 4 | NULL | 301036 | Using where; Using index | +----+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+ 1 row in set (0.00 sec) 16:35:11 mysql> select version(); +-----------+ | version() | +-----------+ | 5.1.49 | +-----------+ 1 row in set (0.00 sec) with both builtin innodb and innodb plugin.
[8 Jul 2010 13:42]
Vasil Dimov
Breakpoint 1, btr_estimate_n_rows_in_range (index=0x80242d8a8, tuple1=0x8024328a8, mode1=2, tuple2=0x802432ca8, mode2=1) at btr/btr0cur.c:2714 2714 mtr_start(&mtr); Current language: auto; currently c (gdb) n 2716 cursor.path_arr = path1; (gdb) 2718 if (dtuple_get_n_fields(tuple1) > 0) { (gdb) 2720 btr_cur_search_to_nth_level(index, 0, tuple1, mode1, (gdb) 2729 mtr_commit(&mtr); (gdb) 2731 mtr_start(&mtr); (gdb) 2733 cursor.path_arr = path2; (gdb) 2735 if (dtuple_get_n_fields(tuple2) > 0) { (gdb) 2737 btr_cur_search_to_nth_level(index, 0, tuple2, mode2, (gdb) 2746 mtr_commit(&mtr); (gdb) 2750 n_rows = 1; (gdb) 2751 diverged = FALSE; /* This becomes true when the path is not (gdb) 2753 diverged_lot = FALSE; /* This becomes true when the paths are (gdb) 2755 divergence_level = 1000000; /* This is the level where paths diverged (gdb) 2757 for (i = 0; ; i++) { (gdb) 2760 slot1 = path1 + i; (gdb) 2761 slot2 = path2 + i; (gdb) 2764 || slot2->nth_rec == ULINT_UNDEFINED) { (gdb) 2763 if (slot1->nth_rec == ULINT_UNDEFINED (gdb) 2764 || slot2->nth_rec == ULINT_UNDEFINED) { (gdb) 2763 if (slot1->nth_rec == ULINT_UNDEFINED (gdb) 2792 if (!diverged && slot1->nth_rec != slot2->nth_rec) { (gdb) 2794 diverged = TRUE; (gdb) 2796 if (slot1->nth_rec < slot2->nth_rec) { (gdb) 2797 n_rows = slot2->nth_rec - slot1->nth_rec; (gdb) 2799 if (n_rows > 1) { (gdb) ins n_rows $1 = 1 (gdb) n 2792 if (!diverged && slot1->nth_rec != slot2->nth_rec) { (gdb) 2757 for (i = 0; ; i++) { (gdb) 2834 } (gdb) 2760 slot1 = path1 + i; (gdb) 2761 slot2 = path2 + i; (gdb) 2764 || slot2->nth_rec == ULINT_UNDEFINED) { (gdb) 2763 if (slot1->nth_rec == ULINT_UNDEFINED (gdb) 2764 || slot2->nth_rec == ULINT_UNDEFINED) { (gdb) 2763 if (slot1->nth_rec == ULINT_UNDEFINED (gdb) 2792 if (!diverged && slot1->nth_rec != slot2->nth_rec) { (gdb) 2810 } else if (diverged && !diverged_lot) { (gdb) 2813 || slot2->nth_rec > 1) { (gdb) 2812 if (slot1->nth_rec < slot1->n_recs (gdb) 2815 diverged_lot = TRUE; (gdb) 2816 divergence_level = i; (gdb) 2818 n_rows = 0; (gdb) 2820 if (slot1->nth_rec < slot1->n_recs) { (gdb) ins n_rows $2 = 0 (gdb) n 2821 n_rows += slot1->n_recs (gdb) 2825 if (slot2->nth_rec > 1) { (gdb) ins n_rows $3 = 7 (gdb) n 2826 n_rows += slot2->nth_rec - 1; (gdb) 2810 } else if (diverged && !diverged_lot) { (gdb) ins n_rows $4 = 323 (gdb) ins slot2->nth_rec $5 = 317 (gdb) n 2757 for (i = 0; ; i++) { (gdb) 2834 } (gdb) 2760 slot1 = path1 + i; (gdb) 2761 slot2 = path2 + i; (gdb) 2764 || slot2->nth_rec == ULINT_UNDEFINED) { (gdb) 2763 if (slot1->nth_rec == ULINT_UNDEFINED (gdb) 2764 || slot2->nth_rec == ULINT_UNDEFINED) { (gdb) 2763 if (slot1->nth_rec == ULINT_UNDEFINED (gdb) 2792 if (!diverged && slot1->nth_rec != slot2->nth_rec) { (gdb) 2810 } else if (diverged && !diverged_lot) { (gdb) 2829 } else if (diverged_lot) { (gdb) 2831 n_rows = (n_rows * (slot1->n_recs + slot2->n_recs)) (gdb) 2757 for (i = 0; ; i++) { (gdb) ins n_rows $6 = 150518 (gdb) ins slot1->n_recs + slot2->n_recs $7 = 932 (gdb) n 2834 } (gdb) 2760 slot1 = path1 + i; (gdb) 2761 slot2 = path2 + i; (gdb) 2764 || slot2->nth_rec == ULINT_UNDEFINED) { (gdb) 2763 if (slot1->nth_rec == ULINT_UNDEFINED (gdb) 2766 if (i > divergence_level + 1) { (gdb) 2771 n_rows = n_rows * 2; (gdb) 2778 if (n_rows > index->table->stat_n_rows / 2) { (gdb) 2789 return(n_rows); (gdb) ins n_rows $8 = 301036
[8 Jul 2010 16:07]
Vasil Dimov
Fix the number in the bug title, 301036/14428 = 20.86
[8 Jul 2010 16:12]
Vasil Dimov
Also happens with 5.1.49 innodb builtin
[8 Jul 2010 17:36]
Andrii Nikitin
Shows 900 times estimation error in 5.5.4 and very close estimation in 5.1.48
Attachment: estim.log (application/octet-stream, text), 5.12 KiB.
[8 Jul 2010 17:38]
Andrii Nikitin
Some important notes: 1. estimation error may be worse than 200 times - initial testcase doesn't demonstrate it (see logs in estim.log above and table below). 2. Similar problems were never reproducible for me with 5.1.48 for provided test table (I haven't tested 5.1.49 because it is not released yet) 3. Problem 100% repeatable in 5.5.4, so most probably not related to fact that InnoDB estimation is "approximate". 4. Big chance that many users will get different execution plans for complex queries after upgrade. Testcases results for query: select count(*) FROM test WHERE b_left BETWEEN :p1 AND :p2; p1 p2 rows est5.5.4 est5.1.48 err5.5.4 err5.1.48 2591685 2591689 4 2700 3 675 1.33 2591685 2591759 38 34200 38 900 1 2591659 2609114 8727 434576 23528 49,79 2.69 script: explain select count(*) FROM test WHERE b_left BETWEEN 2591685 AND 2591689; select count(*) FROM test WHERE b_left BETWEEN 2591685 AND 2591689; explain select count(*) FROM test WHERE b_left BETWEEN 2591685 AND 2591759; select count(*) FROM test WHERE b_left BETWEEN 2591685 AND 2591759; explain select count(*) FROM test WHERE b_left BETWEEN 2591659 AND 2609114; select count(*) FROM test WHERE b_left BETWEEN 2591659 AND 2609114;
[9 Jul 2010 16:31]
Vasil Dimov
Interesting observation: 1. start 5.1/builtin, load the test table as test_b 2. start 5.1/plugin, load the test table as test_p 3. start 5.1/buitlin, shows correct results for test_b and wrong for test_p 4. start 5.1/plugin, shows correct results for test_b and wrong for test_p So it is about the version which created the tree, not about the version which is running the EXPLAIN SELECT ... command. I also tried with and old InnoDB Plugin 1.0.6 and it behaves the same way as the latest InnoDB Plugin.
[9 Jul 2010 16:37]
Vasil Dimov
-rw-rw---- 1 vd-dev vd-dev 603979776 Jul 9 19:24 var/test/test_b.ibd -rw-rw---- 1 vd-dev vd-dev 608174080 Jul 9 19:02 var/test/test_p.ibd
[9 Jul 2010 17:55]
Vasil Dimov
analyze_index_level(table=test/test_b, index=borders, level=2) analyze_index_level(): total recs: 8, total pages: 1, n_diff[1]: 8 analyze_index_level(): total recs: 8, total pages: 1, n_diff[2]: 8 analyze_index_level(): total recs: 8, total pages: 1, n_diff[3]: 8 analyze_index_level(table=test/test_b, index=borders, level=1) analyze_index_level(): total recs: 3822, total pages: 8, n_diff[1]: 3822 analyze_index_level(): total recs: 3822, total pages: 8, n_diff[2]: 3822 analyze_index_level(): total recs: 3822, total pages: 8, n_diff[3]: 3822 analyze_index_level(table=test/test_b, index=borders, level=0) analyze_index_level(): total recs: 2088663, total pages: 3822, n_diff[1]: 2088663 analyze_index_level(): total recs: 2088663, total pages: 3822, n_diff[2]: 2088663 analyze_index_level(): total recs: 2088663, total pages: 3822, n_diff[3]: 2088663 ----- analyze_index_level(table=test/test_p, index=borders, level=2) analyze_index_level(): total recs: 9, total pages: 1, n_diff[1]: 9 analyze_index_level(): total recs: 9, total pages: 1, n_diff[2]: 9 analyze_index_level(): total recs: 9, total pages: 1, n_diff[3]: 9 analyze_index_level(table=test/test_p, index=borders, level=1) analyze_index_level(): total recs: 4111, total pages: 9, n_diff[1]: 4111 analyze_index_level(): total recs: 4111, total pages: 9, n_diff[2]: 4111 analyze_index_level(): total recs: 4111, total pages: 9, n_diff[3]: 4111 analyze_index_level(table=test/test_p, index=borders, level=0) analyze_index_level(): total recs: 2088663, total pages: 4111, n_diff[1]: 2088663 analyze_index_level(): total recs: 2088663, total pages: 4111, n_diff[2]: 2088663 analyze_index_level(): total recs: 2088663, total pages: 4111, n_diff[3]: 2088663
[16 Aug 2010 14:19]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/115810 3156 Vasil Dimov 2010-08-16 Fix Bug#53761 RANGE estimation for matched rows may be 200 times different Improve the range estimation algorithm. Previously: For a given level the algo knows the number of pages in the requested range and the number of records on the leftmost and the rightmost page. Then it assumes all pages in between contain the average between the two border pages and multiplies this average number by the number of intermediate pages. With this change: Same idea, but peek a few (10) of the intermediate pages to get a better estimate of the average number of records per page. If there are less than 10 intermediate pages then all of them will be scanned and the result will be precise, not an estimation. In the bug report one of the examples has a btree with a snippet of the leaf level like this: page1(899 records), page2(1 record), page3(1 record), page4(1 record) so when trying to estimate, the previous algo, assumed there are average (899+1)/2=450 records per page which went terribly wrong. With this change page2 and page3 will be read and the exact number of records will be returned.
[16 Aug 2010 14:24]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/115812 3156 Vasil Dimov 2010-08-16 Fix Bug#53761 RANGE estimation for matched rows may be 200 times different Improve the range estimation algorithm. Previously: For a given level the algo knows the number of pages in the requested range and the n With this change: Same idea, but peek a few (10) of the intermediate pages to get a better estimate of In the bug report one of the examples has a btree with a snippet of the leaf level li page1(899 records), page2(1 record), page3(1 record), page4(1 record) so when trying to estimate, the previous algo, assumed there are average (899+1)/2=45 Fix Bug#53761 RANGE estimation for matched rows may be 200 times different Improve the range estimation algorithm. Previously: For a given level the algo knows the number of pages in the requested range and the number of records on the leftmost and the rightmost page. Then it assumes all pages in between contain the average between the two border pages and multiplies this average number by the number of intermediate pages. With this change: Same idea, but peek a few (10) of the intermediate pages to get a better estimate of the average number of records per page. If there are less than 10 intermediate pages then all of them will be scanned and the result will be precise, not an estimation. In the bug report one of the examples has a btree with a snippet of the leaf level like this: page1(899 records), page2(1 record), page3(1 record), page4(1 record) so when trying to estimate, the previous algo, assumed there are average (899+1)/2=450 records per page which went terribly wrong. With this change page2 and page3 will be read and the exact number of records will be returned. Approved by: Sunny (rb://401)
[25 Aug 2010 9:22]
Bugs System
Pushed into mysql-5.5 5.5.6-m3 (revid:alik@ibmvm-20100825092002-2yvkb3iwu43ycpnm) (version source revid:alik@ibmvm-20100825092002-2yvkb3iwu43ycpnm) (merge vers: 5.5.6-m3) (pib:20)
[30 Aug 2010 8:29]
Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@sun.com-20100830082732-n2eyijnv86exc5ci) (version source revid:alik@sun.com-20100830082732-n2eyijnv86exc5ci) (merge vers: 5.6.1-m4) (pib:21)
[30 Aug 2010 8:33]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100830082745-n6sh01wlwh3itasv) (version source revid:alik@sun.com-20100830082745-n6sh01wlwh3itasv) (pib:21)
[1 Nov 2010 21:28]
John Russell
Added to change log: An EXPLAIN plan for an InnoDB table could vary greatly in the estimated cost for a BETWEEN clause.