Description:
A suboptimal query plan is chosen for executing query 8 in TPC-H benchmark.
$ mysql tpch < 8.sql
EXPLAIN
-> Sort: all_nations.o_year (actual time=49028..49028 rows=2 loops=1)
-> Table scan on <temporary> (actual time=49027..49027 rows=2 loops=1)
-> Aggregate using temporary table (actual time=49027..49027 rows=2 loops=1)
-> Nested loop inner join (cost=797653 rows=2808) (actual time=16.5..49003 rows=4838 loops=1)
-> Nested loop inner join (cost=699638 rows=28080) (actual time=1.53..29514 rows=731894 loops=1)
-> Nested loop inner join (cost=633644 rows=28080) (actual time=1.53..28480 rows=731894 loops=1)
-> Nested loop inner join (cost=567649 rows=28080) (actual time=1.52..26756 rows=731894 loops=1)
-> Nested loop inner join (cost=500635 rows=7134) (actual time=1.09..15988 rows=182721 loops=1)
-> Nested loop inner join (cost=416799 rows=35672) (actual time=1.08..14547 rows=911408 loops=1)
-> Inner hash join (no condition) (cost=313494 rows=35672) (actual time=1.06..2834 rows=911408 loops=1)
-> Filter: (ORDERS.O_ORDERDATE between DATE'1995-01-01' and DATE'1996-12-31') (cost=313492 rows=321078) (actual time=0.621..2631 rows=911408 loops=1)
-> Table scan on ORDERS (cost=313492 rows=2.89e+6) (actual time=0.619..2388 rows=3e+6 loops=1)
-> Hash
-> Filter: (REGION.R_NAME = 'ASIA') (cost=1.5 rows=1) (actual time=0.43..0.433 rows=1 loops=1)
-> Table scan on REGION (cost=1.5 rows=5) (actual time=0.426..0.43 rows=5 loops=1)
-> Single-row index lookup on CUSTOMER using PRIMARY (C_CUSTKEY=ORDERS.O_CUSTKEY) (cost=0.311 rows=1) (actual time=0.0126..0.0126 rows=1 loops=911408)
-> Filter: (n1.N_REGIONKEY = REGION.R_REGIONKEY) (cost=0.25 rows=0.2) (actual time=0.00141..0.00143 rows=0.2 loops=911408)
-> Single-row index lookup on n1 using PRIMARY (N_NATIONKEY=CUSTOMER.C_NATIONKEY) (cost=0.25 rows=1) (actual time=0.00114..0.00117 rows=1 loops=911408)
-> Index lookup on LINEITEM using PRIMARY (L_ORDERKEY=ORDERS.O_ORDERKEY) (cost=1 rows=3.94) (actual time=0.0561..0.0585 rows=4.01 loops=182721)
-> Single-row index lookup on SUPPLIER using PRIMARY (S_SUPPKEY=LINEITEM.L_SUPPKEY) (cost=0.25 rows=1) (actual time=0.00211..0.00215 rows=1 loops=731894)
-> Single-row index lookup on n2 using PRIMARY (N_NATIONKEY=SUPPLIER.S_NATIONKEY) (cost=0.25 rows=1) (actual time=0.00118..0.00121 rows=1 loops=731894)
-> Filter: (PART.P_TYPE = 'SMALL PLATED COPPER') (cost=0.377 rows=0.1) (actual time=0.0265..0.0265 rows=0.00661 loops=731894)
-> Single-row index lookup on PART using PRIMARY (P_PARTKEY=LINEITEM.L_PARTKEY) (cost=0.377 rows=1) (actual time=0.0261..0.0261 rows=1 loops=731894)
0.04s user 0.02s system 0% cpu 49.285 total
If we remove the IF in https://github.com/mysql/mysql-server/blob/trunk/sql/sql_planner.cc#L1870, we can see a performance improvement:
$ mysql tpch < 8.sql
EXPLAIN
-> Sort: all_nations.o_year (actual time=44241..44241 rows=2 loops=1)
-> Table scan on <temporary> (actual time=44241..44241 rows=2 loops=1)
-> Aggregate using temporary table (actual time=44241..44241 rows=2 loops=1)
-> Nested loop inner join (cost=687806 rows=2808) (actual time=16.3..44218 rows=4838 loops=1)
-> Nested loop inner join (cost=681207 rows=2808) (actual time=16.3..44208 rows=4838 loops=1)
-> Nested loop inner join (cost=658854 rows=2808) (actual time=16.3..43572 rows=4838 loops=1)
-> Nested loop inner join (cost=563211 rows=28080) (actual time=1.68..25522 rows=731894 loops=1)
-> Nested loop inner join (cost=496226 rows=7134) (actual time=1.19..14679 rows=182721 loops=1)
-> Nested loop inner join (cost=412389 rows=35672) (actual time=1.17..13153 rows=911408 loops=1)
-> Inner hash join (no condition) (cost=313494 rows=35672) (actual time=1.15..2908 rows=911408 loops=1)
-> Filter: (ORDERS.O_ORDERDATE between DATE'1995-01-01' and DATE'1996-12-31') (cost=313492 rows=321078) (actual time=0.719..2732 rows=911408 loops=1)
-> Table scan on ORDERS (cost=313492 rows=2.89e+6) (actual time=0.717..2489 rows=3e+6 loops=1)
-> Hash
-> Filter: (REGION.R_NAME = 'ASIA') (cost=1.5 rows=1) (actual time=0.418..0.421 rows=1 loops=1)
-> Table scan on REGION (cost=1.5 rows=5) (actual time=0.413..0.417 rows=5 loops=1)
-> Single-row index lookup on CUSTOMER using PRIMARY (C_CUSTKEY=ORDERS.O_CUSTKEY) (cost=0.297 rows=1) (actual time=0.011..0.011 rows=1 loops=911408)
-> Filter: (n1.N_REGIONKEY = REGION.R_REGIONKEY) (cost=0.25 rows=0.2) (actual time=0.00149..0.00151 rows=0.2 loops=911408)
-> Single-row index lookup on n1 using PRIMARY (N_NATIONKEY=CUSTOMER.C_NATIONKEY) (cost=0.25 rows=1) (actual time=0.00119..0.00122 rows=1 loops=911408)
-> Index lookup on LINEITEM using PRIMARY (L_ORDERKEY=ORDERS.O_ORDERKEY) (cost=0.999 rows=3.94) (actual time=0.0565..0.0588 rows=4.01 loops=182721)
-> Filter: (PART.P_TYPE = 'SMALL PLATED COPPER') (cost=0.367 rows=0.1) (actual time=0.0245..0.0245 rows=0.00661 loops=731894)
-> Single-row index lookup on PART using PRIMARY (P_PARTKEY=LINEITEM.L_PARTKEY) (cost=0.367 rows=1) (actual time=0.0241..0.0242 rows=1 loops=731894)
-> Single-row index lookup on SUPPLIER using PRIMARY (S_SUPPKEY=LINEITEM.L_SUPPKEY) (cost=0.873 rows=1) (actual time=0.131..0.131 rows=1 loops=4838)
-> Single-row index lookup on n2 using PRIMARY (N_NATIONKEY=SUPPLIER.S_NATIONKEY) (cost=0.25 rows=1) (actual time=0.00175..0.00179 rows=1 loops=4838)
0.03s user 0.02s system 0% cpu 44.423 total
Both execution time and cost number are reduced by around 10%.
The only difference between two query plans is the joining order of table SUPPLIER and PART. I am wondering whether we can optimize the code to enable the more efficient joining order in default.
Query 8:
EXPLAIN ANALYZE
select
o_year,
sum(
case
when nation = 'INDIA' then volume
else 0
end
) / sum(volume) as mkt_share
from
(
select
extract(
year
from
o_orderdate
) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
PART,
SUPPLIER,
LINEITEM,
ORDERS,
CUSTOMER,
NATION n1,
NATION n2,
REGION
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'ASIA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01'
and date '1996-12-31'
and p_type = 'SMALL PLATED COPPER'
) as all_nations
group by
o_year
order by
o_year;
How to repeat:
TPC-H data scale: 2GB
The data is exactly the same as the one used in https://bugs.mysql.com/bug.php?id=116309
Compile MySQL two versions, one is original and the other is with the changed IF condition. I attach the patch file to remove the code for your reference.
Patch:
diff --git a/sql/sql_planner.cc b/sql/sql_planner.cc
index ad0c639abca..c415ad48b71 100644
--- a/sql/sql_planner.cc
+++ b/sql/sql_planner.cc
@@ -1867,7 +1867,6 @@ bool Join_tab_compare_default::operator()(const JOIN_TAB *jt1,
if (jt1_keydep_jt2 && !jt2_keydep_jt1) return false;
if (jt2_keydep_jt1 && !jt1_keydep_jt2) return true;
- if (jt1->found_records > jt2->found_records) return false;
if (jt1->found_records < jt2->found_records) return true;
return jt1 < jt2;
I observed that the number of cost and execution time fluctuate across executions, so I typically run the query multiple times to get a stable result.