Bug #103997 | Mis-calculate SJM_POS's prefix_rowcount/prefix_costs | ||
---|---|---|---|
Submitted: | 14 Jun 2021 3:24 | Modified: | 15 Jun 2021 12:58 |
Reporter: | henry liang | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[14 Jun 2021 3:24]
henry liang
[14 Jun 2021 12:56]
MySQL Verification Team
Hi Mr. liang, Thank you for your interesting bug report and for your patch. We need to be able to repeat the behaviour, so we would like to have table and data. Can you please, let us know how can we obtain all we need. Is there an URL for that particular test case and how big is it ??? Also, you are using the ancient release of our 8.0 version, so please, let us know if 8.0.25 has a solution for this problem. Those two lines have not changed, but that is not enough for us to conclude that everything remained the same. Thank you, in advance.
[15 Jun 2021 6:45]
henry liang
I found this issue on my local TPC-H schema, but it is a common problem, I verified using the newest 8.0.24 version and it is still there. For example, I used this simple case: create table t1 (id bigint(11), c1 bigint(11), c2 bigint(11)); create table t2 (id bigint(11), c1 bigint(11), c2 bigint(11)); create table t3 (id bigint(11), c1 bigint(11), c2 bigint(11)); insert into t1 values(1,2,3),(1,2,3),(1,2,3),(1,2,3); insert into t2 select * from t1; insert into t3 select * from t1; set optimizer_switch="materialization=on,semijoin=on,loosescan=off,firstmatch=off,duplicateweedout=off"; explain select * from t1 where t1.c1 in (select t2.c2 from t2,t3 where t2.c2 = t2.c2); In the above example, the plan is like this: | -> Inner hash join (t1.c1 = `<subquery2>`.c2) (cost=31.56 rows=25) -> Table scan on t1 (cost=0.35 rows=9) -> Hash -> Table scan on <subquery2> (cost=0.11..2.81 rows=25) -> Materialize with deduplication (cost=6.11..8.81 rows=25) -> Filter: (t2.c2 is not null) (cost=3.50 rows=25) -> Inner hash join (no condition) (cost=3.50 rows=25) -> Table scan on t3 (cost=0.15 rows=5) -> Hash -> Table scan on t2 (cost=0.75 rows=5) So SJM is the first join table, but in JOIN::setup_semijoin_materialized_table(), sjm's index was calculated using "(tab - join_tab)", which is not correct since tab is allocated from a temporary JOIN_TAB array, in the above example, (tab - join_tab) == 24, but the corrent idx should be 0.
[15 Jun 2021 12:58]
MySQL Verification Team
Hi Mr. liang, Thank you for your contribution. Verified as reported.