| Bug #84366 | InnoDB index dives do not detect concurrent tree changes, return bogus estimates | ||
|---|---|---|---|
| Submitted: | 29 Dec 2016 6:39 | Modified: | 1 Feb 2018 10:51 |
| Reporter: | Laurynas Biveinis (OCA) | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
| Version: | 5.6+ | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | index dive, innodb, range optimizer | ||
[29 Dec 2016 7:38]
MySQL Verification Team
Hello Laurynas, Thank you for the report and test case. Verified as described with 5.6.35 source build. Thanks, Umesh
[17 Jan 2017 4:17]
Laurynas Biveinis
Bug 84366 fix for 5.6 (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug84366-5.6.patch (application/octet-stream, text), 14.06 KiB.
[17 Jan 2017 4:17]
Laurynas Biveinis
Bug 84366 fix for 5.7 (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug84366-5.7.patch (application/octet-stream, text), 11.12 KiB.
[17 Jan 2017 4:17]
Laurynas Biveinis
Bug 84366 fix for 8.0.0 (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug84366-8.0.patch (application/octet-stream, text), 11.12 KiB.
[17 Jan 2017 4:22]
Laurynas Biveinis
The contributed fix for 5.6 backports the btr_estimate_n_rows_in_range restart support ("Fix Bug#20618309 ASSERT SLOT1->PAGE_LEVEL == SLOT2->PAGE_LEVEL, BTR_ESTIMATE_N_ROWS_IN_RANGE()") and extends it to restart when the two paths have 1) crossed and 2) point to the same page and 3) that same page apparently has different number of records in the two paths. 5.7 and 8.0.0 patches are smaller as they don't need the backport.
If nothing else, please apply the mysql-test/include/* part, which stabilises the testsuite (spurious index_merge_innodb failures). While the code part makes InnoDB more robust for queries in parallel with purge, the testsuite patch removes the concurrent purge altogether.
[17 Jan 2017 4:24]
Laurynas Biveinis
The patch also adds an early zero return from estimate when paths cross instead of merely resetting n_rows to zero. This looks correct to me but I might be missing something.
[3 Mar 2017 4:57]
Laurynas Biveinis
There is one more index_merge_instability:
main.index_merge_innodb w2 [ fail ]
Test ended at 2017-01-11 08:54:34
CURRENT_TEST: main.index_merge_innodb
--- /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/debug/Host/ubuntu-xenial-32bit/mysql-test/r/index_merge_innodb.result 2017-01-03 15:06:26.917177432 +0300
+++ /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/debug/Host/ubuntu-xenial-32bit/build/mysql-test/var/2/log/index_merge_innodb.reject 2017-01-11 16:54:33.476707245 +0300
@@ -698,7 +698,7 @@
key1 key2 filler1
explain select key1,key2,key3,key4,filler1 from t1 where key1=100 and key2=100 or key3=100 and key4=100;
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE t1 index_merge key1,key2,key3,key4 key1,key2,key3,key4 5,5,5,5 NULL 4 Using union(intersect(key1,key2),intersect(key3,key4)); Using where
+1 SIMPLE t1 index_merge key1,key2,key3,key4 key1,key2,key4,key3 5,5,5,5 NULL 4 Using union(intersect(key1,key2),intersect(key4,key3)); Using where
select key1,key2,key3,key4,filler1 from t1 where key1=100 and key2=100 or key3=100 and key4=100;
key1 key2 key3 key4 filler1
-1 -1 100 100 key4-key3
mysqltest: Result content mismatch
It is also preceded by a delete from a secondary index, thus purge may be running concurrently with the query. This could be solved with a slow shutdown server restart, but here key3 and key4 are interchangeable, thus I'd just mask them both to something common. Re-uploading the patches with the following snippet added.
diff --git a/mysql-test/include/index_merge_ror.inc b/mysql-test/include/index_merge_ror.inc
index bd2a781a748..f48d47bc67d 100644
--- a/mysql-test/include/index_merge_ror.inc
+++ b/mysql-test/include/index_merge_ror.inc
@@ -178,6 +178,7 @@ select key1,key2,filler1 from t1 where key2=100 and key2=200;
# ROR-union(ROR-intersection) with one of ROR-intersection giving empty
# results
+--replace_result key3 key34 key4 key34
explain select key1,key2,key3,key4,filler1 from t1 where key1=100 and key2=100 or key3=100 and key4=100;
select key1,key2,key3,key4,filler1 from t1 where key1=100 and key2=100 or key3=100 and key4=100;
diff --git a/mysql-test/r/index_merge_innodb.result b/mysql-test/r/index_merge_innodb.result
index 85d15f77489..5494d86ad5c 100644
--- a/mysql-test/r/index_merge_innodb.result
+++ b/mysql-test/r/index_merge_innodb.result
@@ -696,9 +696,9 @@ update t1 set key1=200,key2=200 where key1=100 and key2=100;
delete from t1 where key1=200 and key2=200;
select key1,key2,filler1 from t1 where key2=100 and key2=200;
key1 key2 filler1
-explain select key1,key2,key3,key4,filler1 from t1 where key1=100 and key2=100 or key3=100 and key4=100;
+explain select key1,key2,key34,key34,filler1 from t1 where key1=100 and key2=100 or key34=100 and key34=100;
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE t1 index_merge key1,key2,key3,key4 key1,key2,key3,key4 5,5,5,5 NULL 4 Using union(intersect(key1,key2),intersect(key3,key4)); Using where
+1 SIMPLE t1 index_merge key1,key2,key34,key34 key1,key2,key34,key34 5,5,5,5 NULL 4 Using union(intersect(key1,key2),intersect(key34,key34)); Using where
select key1,key2,key3,key4,filler1 from t1 where key1=100 and key2=100 or key3=100 and key4=100;
key1 key2 key3 key4 filler1
-1 -1 100 100 key4-key3
diff --git a/mysql-test/r/index_merge_myisam.result b/mysql-test/r/index_merge_myisam.result
index 505b59a43dc..e193d5d6511 100644
--- a/mysql-test/r/index_merge_myisam.result
+++ b/mysql-test/r/index_merge_myisam.result
@@ -726,9 +726,9 @@ update t1 set key1=200,key2=200 where key1=100 and key2=100;
delete from t1 where key1=200 and key2=200;
select key1,key2,filler1 from t1 where key2=100 and key2=200;
key1 key2 filler1
-explain select key1,key2,key3,key4,filler1 from t1 where key1=100 and key2=100 or key3=100 and key4=100;
+explain select key1,key2,key34,key34,filler1 from t1 where key1=100 and key2=100 or key34=100 and key34=100;
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE t1 index_merge key1,key2,key3,key4 key1,key2,key3,key4 5,5,5,5 NULL 152 Using union(intersect(key1,key2),intersect(key3,key4)); Using where
+1 SIMPLE t1 index_merge key1,key2,key34,key34 key1,key2,key34,key34 5,5,5,5 NULL 152 Using union(intersect(key1,key2),intersect(key34,key34)); Using where
select key1,key2,key3,key4,filler1 from t1 where key1=100 and key2=100 or key3=100 and key4=100;
key1 key2 key3 key4 filler1
-1 -1 100 100 key4-key3
[3 Mar 2017 5:01]
Laurynas Biveinis
I cannot upload the refreshed contribution patches, because the existing contributions are Accepted (thanks!). In that case the above comment should contain enough information
[14 Apr 2017 9:37]
Laurynas Biveinis
Bug 84366 fix for 8.0.1
Attachment: bug84366-8.0.1.patch (application/octet-stream, text), 14.42 KiB.
[4 Aug 2017 19:40]
Laurynas Biveinis
Bug 84366 fix for 8.0.2
Attachment: bug84366-8.0.2.patch (application/octet-stream, text), 14.42 KiB.
[1 Feb 2018 10:51]
Laurynas Biveinis
The latest contributed fix applies cleanly and works on 8.0.4 too.
[13 Jun 2018 12:19]
Laurynas Biveinis
Bug 84366 fix for 8.0.11
Attachment: bug84366-8.0.11.patch (application/octet-stream, text), 14.57 KiB.

Description: If something changes the index tree between the two dives in btr_estimate_n_rows_in_range, this goes undetected, making it return bogus row count for the range, resulting in bogus query plans. How to repeat: Apply --- storage/innobase/btr/btr0cur.cc.orig 2016-12-29 07:16:06.000000000 +0200 +++ storage/innobase/btr/btr0cur.cc 2016-12-29 07:16:08.000000000 +0200 @@ -3708,6 +3708,9 @@ mtr_commit(&mtr); + if (!strcmp(index->name, "iC")) + DEBUG_SYNC_C("btr_estimate_n_rows_in_range_between_dives"); + mtr_start(&mtr); cursor.path_arr = path2; Then use the following MTR testcase. It uses purge as the concurrent index tree writer. --source include/have_debug.inc --source include/have_debug_sync.inc --source include/have_innodb.inc create table t3 ( key1 int not null, key2 int not null, INDEX i1(key1), INDEX i2(key2) ) engine=InnoDB; --disable_query_log insert into t3 values (1,1),(2,2); let $1=9; set @d=2; while ($1) { eval insert into t3 select key1+@d, key2+@d from t3; eval set @d=@d*2; dec $1; } --enable_query_log analyze table t3; SET @@GLOBAL.innodb_purge_stop_now=TRUE; alter table t3 add keyC int not null, add index iC(keyC); update t3 set keyC=key1; analyze table t3; --connect(con1,localhost,root,,) explain select * from t3 where key1=1 or key2=2 or keyC=12; SET DEBUG_SYNC = "btr_estimate_n_rows_in_range_between_dives SIGNAL estimate_ready WAIT_FOR estimate_finish"; send explain select * from t3 where key1=1 or key2=2 or keyC=12; connection default; SET DEBUG_SYNC = "now WAIT_FOR estimate_ready"; SET @@GLOBAL.innodb_purge_run_now=TRUE; --source include/wait_innodb_all_purged.inc SET DEBUG_SYNC = "now SIGNAL estimate_finish"; connection con1; reap; explain select * from t3 where key1=1 or key2=2 or keyC=12; disconnect con1; connection default; drop table t3; The testcase above executes the same EXPLAIN SELECT three times: with the iC index tree not purged at all, with it being purged between the two dives, and with the fully clean tree. The first and the last execution use index_merge, and the middle one is a table scan instead. This is reduced from an intermittent main.index_merge_innodb failure: main.index_merge_innodb [ fail ] Test ended at 2016-12-09 07:08:51 CURRENT_TEST: main.index_merge_innodb --- /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/release/Host/centos6-64/mysql-test/r/index_merge_innodb.result 2016-12-09 14:08:06.818000030 +0300 +++ /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/release/Host/centos6-64/build/mysql-test/var/log/index_merge_innodb.reject 2016-12-09 15:08:51.296000042 +0300 @@ -299,7 +299,7 @@ key5=5 or key6=6 or key7=7 or key8=8 or key9=9 or keyA=10 or keyB=11 or keyC=12; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t3 index_merge i1,i2,i3,i4,i5,i6,i7,i8,i9,iA,iB,iC i1,i2,i3,i4,i5,i6,i7,i8,i9,iA,iB,iC 4,4,4,4,4,4,4,4,4,4,4,4 NULL 12 Using union(i1,i2,i3,i4,i5,i6,i7,i8,i9,iA,iB,iC); Using where +1 SIMPLE t3 ALL i1,i2,i3,i4,i5,i6,i7,i8,i9,iA,iB,iC NULL NULL NULL 1024 Using where select * from t3 where key1=1 or key2=2 or key3=3 or key4=4 or key5=5 or key6=6 or key7=7 or key8=8 or mysqltest: Result content mismatch Suggested fix: The bug results in index dive paths with path1[0].nth_rec > path2[0].nth_rec, returning hundreds of rows in the range instead of expected one. Even without concurrent modifications aside, such path crossing is detected in btr_estimate_n_rows_in_range: /* It is possible that slot1->nth_rec >= slot2->nth_rec if, for example, we have a single page tree which contains (inf, 5, 6, supr) and we select where x > 20 and x < 30; in this case slot1->nth_rec will point to the supr record and slot2->nth_rec will point to 6 */ n_rows = 0; But instead of returning zero right there, path tracing continues. Shouldn't it stop and return zero here instead? Then, 5.7 actually catches some of the tree modifications in this function by [1]. It does not catch this particular tree change because both paths still point to the same page. A possible fix is extending [1] with a condition that the same page id node in the two paths should contain the same number of records too - but this might be too aggressive for the trees with heavy write volume, resulting in needless dive restarts and eventual returning arbitrary 10 rows in range fallback constant. Perhaps adding a check that the same page in the two paths should have the same number of records - but only if the paths have crossed - would be reasonable. In any case such fix would be partial only as it's possible to modify the tree and still have the same number of records before and after. [1]: commit 9ac1425053312b98fc9e6a999087efb09e60daec Author: Vasil Dimov <vasil.dimov@oracle.com> Date: Fri Apr 17 15:04:58 2015 +0300 Fix Bug#20618309 ASSERT SLOT1->PAGE_LEVEL == SLOT2->PAGE_LEVEL, BTR_ESTIMATE_N_ROWS_IN_RANGE() Relax a too strict assert. If the tree is changed between both dives for the left boundary and the right boundary, then our markers (slot1 and slot2) could end up on different levels in the tree. If we detect that this has happened - then retry a few times and if still unsuccessful then return an arbitrary number for an estimate. Reviewed-by: Marko M<C3><A4>kel<C3><A4> <marko.makela@oracle.com> Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com> RB: 8570