Bug #105315 Switch to hash join when appropriate
Submitted: 25 Oct 2021 14:52 Modified: 26 Oct 2021 17:00
Reporter: Justin Swanhart Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:8.0.27 OS:Any
Assigned to: CPU Architecture:Any

[25 Oct 2021 14:52] Justin Swanhart
Description:
This is related to: https://bugs.mysql.com/bug.php?id=105308

If the query is executed using b-tree indexes on all of the filter/join columns, using the default nested loops plan, this query does not scale as the SSB data set grows.  InnoDB is extremely slow (due to random IO) when the dataset is larger than ram.  If the innodb_adaptive_hash_index is enabled (even partitioned) it slows the query down dramatically if the data set is larger than memory.  Using hash join the query may take less than one minute, where it takes over 14 minutes with indexes/nested loops, and the adaptive hash index. 

Thus recognizing a star schema, and switching to hash join is important. I think MySQL should recognize when an index is larger than memory and will be probed a lot of times, switch to hash join.

How to repeat:
Test the star schema benchmark at data set sizes greater than memory. Set the innodb_buffer_pool_size to 1GB for example, and load the SSB at scale factor 30.  Profile the query time and IO utilization and SHOW INNODB STATUS during the query using indexes and nested loops, then do the same using hash join.
[25 Oct 2021 15:03] Justin Swanhart
Also note that in such a schema with b-tree indexes on the dimension and fact tables, using STRAIGHT_JOIN improves performance a lot, because the fact table is scanned and the dimension tables will fit into memory. If the lookup is done on non-unique index on lineorder, random IO will dominate the query. 

Still hash join is still generally faster.  But switching to scanning the fact table then index lookups into the dimension tables is the best execution plan in a star schema when nested loops are employed.  Hashing a dimension, scanning the fact, then hashing the remaining dimensions is the most efficient mechanism for hash join.
[25 Oct 2021 15:13] Justin Swanhart
For reference, here are the queries for the star schema benchmark.  The queries are the same regardless of the scale factor.

-- Q1.1
select sum(lo_extendedprice*lo_discount) as
revenue
from lineorder join  dim_date on lo_orderdatekey = d_datekey
where
d_year = 1993
and lo_discount between 1 and 3
and lo_quantity < 25;

-- Q1.2
select sum(lo_extendedprice*lo_discount) as
revenue
from lineorder join  dim_date on lo_orderdatekey = d_datekey
where 
d_yearmonthnum = 199401
and lo_discount between 4 and 6
and lo_quantity between 26 and 35;

-- Q1.3
select sum(lo_extendedprice*lo_discount) as
revenue
from lineorder join  dim_date on lo_orderdatekey = d_datekey
where 
d_weeknuminyear = 6
and d_year = 1994
and lo_discount between 5 and 7
and lo_quantity between 26 and 35;

-- Q2.1
select sum(lo_revenue), d_year, p_brand
from lineorder
join dim_date
  on lo_orderdatekey = d_datekey
join part
  on lo_partkey = p_partkey
join supplier
  on lo_suppkey = s_suppkey
where 
p_category = 'MFGR#12'
and s_region = 'AMERICA'
group by d_year, p_brand
order by d_year, p_brand;

-- Q2.2
select sum(lo_revenue), d_year, p_brand
from lineorder
join dim_date
  on lo_orderdatekey = d_datekey
join part
  on lo_partkey = p_partkey
join supplier
  on lo_suppkey = s_suppkey
where 
p_brand between 'MFGR#2221' and 'MFGR#2228'
and s_region = 'ASIA'
group by d_year, p_brand
order by d_year, p_brand;

-- Q2.3
select sum(lo_revenue), d_year, p_brand
from lineorder
join dim_date
  on lo_orderdatekey = d_datekey
join part
  on lo_partkey = p_partkey
join supplier
  on lo_suppkey = s_suppkey
where 
p_brand= 'MFGR#2239'
and s_region = 'EUROPE'
group by d_year, p_brand
order by d_year, p_brand;

-- Q3.1
select c_nation, s_nation, d_year,
sum(lo_revenue) as revenue
from customer
join lineorder
  on lo_custkey = c_customerkey
join supplier
  on lo_suppkey = s_suppkey
join dim_date
  on lo_orderdatekey = d_datekey
where
c_region = 'ASIA'
and s_region = 'ASIA'
and d_year >= 1992 and d_year <= 1997
group by c_nation, s_nation, d_year
order by d_year asc, revenue desc;

-- Q3.2
select c_city, s_city, d_year, sum(lo_revenue)
as revenue
from customer
join lineorder
  on lo_custkey = c_customerkey
join supplier
  on lo_suppkey = s_suppkey
join dim_date
  on lo_orderdatekey = d_datekey
where
c_nation = 'UNITED STATES'
and s_nation = 'UNITED STATES'
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, revenue desc;

-- Q3.3
select c_city, s_city, d_year, sum(lo_revenue)
as revenue
from customer
join lineorder
  on lo_custkey = c_customerkey
join supplier
  on lo_suppkey = s_suppkey
join dim_date
  on lo_orderdatekey = d_datekey
where
(c_city='UNITED KI1' or c_city='UNITED KI5')
and (s_city='UNITED KI1' or s_city='UNITED KI5')
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, revenue desc;

-- Q3.4
select c_city, s_city, d_year, sum(lo_revenue)
as revenue
from customer
join lineorder
  on lo_custkey = c_customerkey
join supplier
  on lo_suppkey = s_suppkey
join dim_date
  on lo_orderdatekey = d_datekey
where
(c_city='UNITED KI1' or c_city='UNITED KI5')
and (s_city='UNITED KI1' or s_city='UNITED KI5')
and d_yearmonth = 'Dec1997'
group by c_city, s_city, d_year
order by d_year asc, revenue desc;

-- Q4.1
select d_year, c_nation,
sum(lo_revenue - lo_supplycost) as profit
from lineorder
join dim_date 
  on lo_orderdatekey = d_datekey
join customer
  on lo_custkey = c_customerkey
join supplier
  on lo_suppkey = s_suppkey
join part
  on lo_partkey = p_partkey
where
c_region = 'AMERICA'
and s_region = 'AMERICA'
and (p_mfgr = 'MFGR#1'
or p_mfgr = 'MFGR#2')
group by d_year, c_nation
order by d_year, c_nation;

--Q4.2
select d_year, s_nation, p_category,
sum(lo_revenue - lo_supplycost) as profit
from lineorder
join dim_date 
  on lo_orderdatekey = d_datekey
join customer
  on lo_custkey = c_customerkey
join supplier
  on lo_suppkey = s_suppkey
join part
  on lo_partkey = p_partkey
where
c_region = 'AMERICA'
and s_region = 'AMERICA'
and (d_year = 1997 or d_year = 1998)
and (p_mfgr = 'MFGR#1'
or p_mfgr = 'MFGR#2')
group by d_year, s_nation, p_category
order by d_year, s_nation, p_category;

-- Q4.3
select d_year, s_city, p_brand,
sum(lo_revenue - lo_supplycost) as profit
from lineorder
join dim_date 
  on lo_orderdatekey = d_datekey
join customer
  on lo_custkey = c_customerkey
join supplier
  on lo_suppkey = s_suppkey
join part
  on lo_partkey = p_partkey
where
s_nation = 'UNITED STATES'
and (d_year = 1997 or d_year = 1998)
and p_category = 'MFGR#14'
group by d_year, s_city, p_brand
order by d_year, s_city, p_brand;
[25 Oct 2021 16:20] Justin Swanhart
For your convenience, I will EXPLAIN ANALYZE Q4.1 using different hints once the data set finishes loading.
[25 Oct 2021 16:27] Justin Swanhart
Before I do, the 30% index lookup heuristic should probably apply to joins.  If a join needs to look up more than 30% of a table through an index, it should switch to a scan and do a hash join, I think.
[25 Oct 2021 18:48] Justin Swanhart
Q4.4 SF=30 ENGINE=INNODB SYSTEM_MEMORY=64G INNODB_DEDICATED_SERVER=TRUE STORAGE=SAMSUNG 860EVO 1TB formatted to 512GB CPU=i7-9750H(6 core, 12 thread) 
[all table data fits in buffer pool]

with primary keys on each dimension table, and a non-unique key on LO_OrderDateKey :
explain analyze select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit from lineorder join dim_date    on lo_orderdatekey = d_datekey join customer   on lo_custkey = c_customerkey join supplier   on lo_suppkey = s_suppkey join part   on lo_partkey = p_partkey where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation;
 -> Sort: dim_date.D_Year, customer.C_Nation  (actual time=761056.706..761056.708 rows=35 loops=1)
    -> Table scan on <temporary>  (actual time=0.001..0.004 rows=35 loops=1)
        -> Aggregate using temporary table  (actual time=761056.682..761056.687 rows=35 loops=1)
            -> Nested loop inner join  (cost=3010237.35 rows=4971) (actual time=0.183..757065.623 rows=2882137 loops=1)
                -> Nested loop inner join  (cost=3001080.03 rows=26164) (actual time=0.126..746441.646 rows=7211980 loops=1)
                    -> Nested loop inner join  (cost=2909506.80 rows=261638) (actual time=0.118..713317.676 rows=36205824 loops=1)
                        -> Nested loop inner join  (cost=1993774.55 rows=2616378) (actual time=0.049..526743.946 rows=179998368 loops=1)
                            -> Table scan on dim_date  (cost=260.35 rows=2556) (actual time=0.027..2.409 rows=2556 loops=1)
                            -> Index lookup on lineorder using LO_OrderDateKey (LO_OrderDateKey=dim_date.D_DateKey)  (cost=677.61 rows=1024) (actual time=0.017..203.427 rows=70422 loops=2556)
                        -> Filter: (supplier.S_Region = 'AMERICA')  (cost=0.25 rows=0) (actual time=0.001..0.001 rows=0 loops=179998372)
                            -> Single-row index lookup on supplier using PRIMARY (S_SuppKey=lineorder.LO_SuppKey)  (cost=0.25 rows=1) (actual time=0.001..0.001 rows=1 loops=179998372)
                    -> Filter: (customer.C_Region = 'AMERICA')  (cost=0.25 rows=0) (actual time=0.001..0.001 rows=0 loops=36205823)
                        -> Single-row index lookup on customer using PRIMARY (C_CustomerKey=lineorder.LO_CustKey)  (cost=0.25 rows=1) (actual time=0.001..0.001 rows=1 loops=36205823)
                -> Filter: ((part.P_MFGR = 'MFGR#1') or (part.P_MFGR = 'MFGR#2'))  (cost=0.25 rows=0) (actual time=0.001..0.001 rows=0 loops=7211980)
                    -> Single-row index lookup on part using PRIMARY (P_PartKey=lineorder.LO_PartKey)  (cost=0.25 rows=1) (actual time=0.001..0.001 rows=1 loops=7211980)
1 row in set (12 min 41.0588 sec)
[25 Oct 2021 18:48] Justin Swanhart
with additional indexes on the filter columns the dimension tables (ie, C_Region, S_region, and P_Mfgr) as well as primary keys in the dimension tables:
explain analyze select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit from lineorder join dim_date    on lo_orderdatekey = d_datekey join customer   on lo_custkey = c_customerkey join supplier   on lo_suppkey = s_suppkey join part   on lo_partkey = p_partkey where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation\G
*************************** 1. row ***************************
EXPLAIN: -> Sort: dim_date.D_Year, customer.C_Nation  (actual time=621327.867..621327.869 rows=35 loops=1)
    -> Table scan on <temporary>  (actual time=0.001..0.004 rows=35 loops=1)
        -> Aggregate using temporary table  (actual time=621327.843..621327.847 rows=35 loops=1)
            -> Nested loop inner join  (cost=3558024.71 rows=328093) (actual time=0.077..617636.411 rows=2882137 loops=1)
                -> Nested loop inner join  (cost=3265132.17 rows=836836) (actual time=0.073..601690.648 rows=14471701 loops=1)
                    -> Nested loop inner join  (cost=2909506.80 rows=1016072) (actual time=0.053..555593.993 rows=36205824 loops=1)
                        -> Nested loop inner join  (cost=1993774.55 rows=2616378) (actual time=0.040..357548.884 rows=179998368 loops=1)
                            -> Table scan on dim_date  (cost=260.35 rows=2556) (actual time=0.022..2.428 rows=2556 loops=1)
                            -> Index lookup on lineorder using LO_OrderDateKey (LO_OrderDateKey=dim_date.D_DateKey)  (cost=677.61 rows=1024) (actual time=0.007..137.274 rows=70422 loops=2556)
                        -> Filter: (supplier.S_Region = 'AMERICA')  (cost=0.25 rows=0) (actual time=0.001..0.001 rows=0 loops=179998372)
                            -> Single-row index lookup on supplier using PRIMARY (S_SuppKey=lineorder.LO_SuppKey)  (cost=0.25 rows=1) (actual time=0.001..0.001 rows=1 loops=179998372)
                    -> Filter: ((part.P_MFGR = 'MFGR#1') or (part.P_MFGR = 'MFGR#2'))  (cost=0.25 rows=1) (actual time=0.001..0.001 rows=0 loops=36205823)
                        -> Single-row index lookup on part using PRIMARY (P_PartKey=lineorder.LO_PartKey)  (cost=0.25 rows=1) (actual time=0.001..0.001 rows=1 loops=36205823)
                -> Filter: (customer.C_Region = 'AMERICA')  (cost=0.25 rows=0) (actual time=0.001..0.001 rows=0 loops=14471701)
                    -> Single-row index lookup on customer using PRIMARY (C_CustomerKey=lineorder.LO_CustKey)  (cost=0.25 rows=1) (actual time=0.001..0.001 rows=1 loops=14471701)

1 row in set (10 min 21.3300 sec)
[25 Oct 2021 18:49] Justin Swanhart
With indexes on the filter columns in the dimensions but ignoring the index on the lineorder table:
explain analyze select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit from lineorder IGNORE INDEX(LO_OrderDateKey) join dim_date    on lo_orderdatekey = d_datekey join customer   on lo_custkey = c_customerkey join supplier   on lo_suppkey = s_suppkey join part   on lo_partkey = p_partkey where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation\G
*************************** 1. row ***************************
EXPLAIN: -> Sort: dim_date.D_Year, customer.C_Nation  (actual time=171884.670..171884.672 rows=35 loops=1)
    -> Table scan on <temporary>  (actual time=0.001..0.003 rows=35 loops=1)
        -> Aggregate using temporary table  (actual time=171884.647..171884.651 rows=35 loops=1)
            -> Inner hash join (part.P_PartKey = lineorder.LO_PartKey)  (cost=1510078583354178.20 rows=1107073605608377) (actual time=168558.148..170064.842 rows=2882137 loops=1)
                -> Filter: ((part.P_MFGR = 'MFGR#1') or (part.P_MFGR = 'MFGR#2'))  (cost=0.01 rows=72977) (actual time=0.028..681.777 rows=399597 loops=1)
                    -> Table scan on part  (cost=0.01 rows=993809) (actual time=0.023..462.241 rows=1000000 loops=1)
                -> Hash
                    -> Inner hash join (customer.C_CustomerKey = lineorder.LO_CustKey)  (cost=660127470179.20 rows=206586820177) (actual time=159742.938..166366.335 rows=7211980 loops=1)
                        -> Filter: (customer.C_Region = 'AMERICA')  (cost=0.01 rows=29918) (actual time=0.026..456.962 rows=179791 loops=1)
                            -> Table scan on customer  (cost=0.01 rows=893261) (actual time=0.019..372.847 rows=900000 loops=1)
                        -> Hash
                            -> Inner hash join (supplier.S_SuppKey = lineorder.LO_SuppKey)  (cost=43265369050.34 rows=206165466) (actual time=126375.215..153306.163 rows=36205824 loops=1)
                                -> Filter: (supplier.S_Region = 'AMERICA')  (cost=0.00 rows=2169) (actual time=0.018..32.872 rows=12068 loops=1)
                                    -> Table scan on supplier  (cost=0.00 rows=59682) (actual time=0.014..26.079 rows=60000 loops=1)
                                -> Hash
                                    -> Inner hash join (lineorder.LO_OrderDateKey = dim_date.D_DateKey)  (cost=42697976837.57 rows=2616378) (actual time=1.225..105904.224 rows=179998368 loops=1)
                                        -> Table scan on lineorder  (cost=510.10 rows=167044886) (actual time=0.005..83218.183 rows=179998368 loops=1)
                                        -> Hash
                                            -> Table scan on dim_date  (cost=260.60 rows=2556) (actual time=0.012..0.887 rows=2556 loops=1)

1 row in set (2 min 52.3806 sec)

-- Using nested loops (no IGNORE INDEX on lineorder):
explain analyze select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit from lineorder join dim_date    on lo_orderdatekey = d_datekey join customer   on lo_custkey = c_customerkey join supplier   on lo_suppkey = s_suppkey join part   on lo_partkey = p_partkey where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation\G
*************************** 1. row ***************************
EXPLAIN: -> Sort: dim_date.D_Year, customer.C_Nation  (actual time=411485.787..411485.789 rows=35 loops=1)
    -> Table scan on <temporary>  (actual time=0.001..0.003 rows=35 loops=1)
        -> Aggregate using temporary table  (actual time=411485.762..411485.766 rows=35 loops=1)
            -> Inner hash join (part.P_PartKey = lineorder.LO_PartKey)  (cost=1510036006161205.80 rows=1107073692698608) (actual time=408309.679..409738.631 rows=2882137 loops=1)
                -> Filter: ((part.P_MFGR = 'MFGR#1') or (part.P_MFGR = 'MFGR#2'))  (cost=0.01 rows=72977) (actual time=0.023..657.214 rows=399597 loops=1)
                    -> Table scan on part  (cost=0.01 rows=993809) (actual time=0.019..444.329 rows=1000000 loops=1)
                -> Hash
                    -> Inner hash join (customer.C_CustomerKey = lineorder.LO_CustKey)  (cost=617431535687.80 rows=206586836429) (actual time=399891.160..406247.318 rows=7211980 loops=1)
                        -> Filter: (customer.C_Region = 'AMERICA')  (cost=0.01 rows=29918) (actual time=0.028..441.847 rows=179791 loops=1)
                            -> Table scan on customer  (cost=0.01 rows=893261) (actual time=0.021..360.250 rows=900000 loops=1)
                        -> Hash
                            -> Inner hash join (supplier.S_SuppKey = lineorder.LO_SuppKey)  (cost=569386032.21 rows=206165482) (actual time=367683.461..393811.077 rows=36205824 loops=1)
                                -> Filter: (supplier.S_Region = 'AMERICA')  (cost=0.00 rows=2169) (actual time=0.011..31.691 rows=12068 loops=1)
                                    -> Table scan on supplier  (cost=0.00 rows=59682) (actual time=0.007..25.285 rows=60000 loops=1)
                                -> Hash
                                    -> Nested loop inner join  (cost=1993774.80 rows=2616378) (actual time=0.029..335066.083 rows=179998368 loops=1)
                                        -> Table scan on dim_date  (cost=260.60 rows=2556) (actual time=0.010..5.503 rows=2556 loops=1)
                                        -> Index lookup on lineorder using LO_OrderDateKey (LO_OrderDateKey=dim_date.D_DateKey)  (cost=677.61 rows=1024) (actual time=0.008..128.563 rows=70422 loops=2556)

1 row in set (6 min 51.9636 sec)

These results show that switching from nested loops to hash join is significantly faster.
[25 Oct 2021 18:50] Justin Swanhart
Of course, when LO_OrderDateKey index does not fit into memory, nested loops completely falls off the cliff.  I stopped the query after 30 minutes.
That was with innodb_buffer_pool_size=8GB
[25 Oct 2021 18:55] Justin Swanhart
Note, I dropped the primary keys on the dimension table for the last two queries, only the filter columns remain. If I remove all indexes then #105308 happens.
[25 Oct 2021 20:20] Justin Swanhart
Also note that it doesn't look like the filters gets pushed down into the dimension table scan via ICP, and they could be.
[25 Oct 2021 20:37] Justin Swanhart
If ICP pushed down the s_region='AMERICA' filter, then only 12068 tuples would be produced for the supplier scan, for example:
select count(*) from supplier where s_region='AMERICA';
+----------+
| count(*) |
+----------+
|    12068 |
+----------+
1 row in set (0.0056 sec)

WARP pushes down these expressions via ECP, since it is a scan of the PRIMARY KEY all constant filters should be able to be pushed down into the scan.
[26 Oct 2021 8:31] MySQL Verification Team
Hello Justin,

Thank you for the feature request.

regards,
Umesh
[26 Oct 2021 17:00] Justin Swanhart
INNODB_BUFFER_POOL_SIZE=8GB

One more bad plan scenario:
A filter column exists in the fact table, but it is not the join filter.  In this case, the only index visible to the optimizer is an index that I added on LO_Quantity, and I added a filter to Q4.1 to demonstrate the problem.

-- I killed this query after 21 minutes
explain analyze select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit from lineorder IGNORE INDEX(LO_OrderDateKey) join dim_date    on lo_orderdatekey = d_datekey join customer   on lo_custkey = c_customerkey join supplier   on lo_suppkey = s_suppkey join part   on lo_partkey = p_partkey where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') and lo_quantity = 1 group by d_year, c_nation order by d_year, c_nation\G
^C-- query aborted
*************************** 1. row ***************************
EXPLAIN: -> Sort: dim_date.D_Year, customer.C_Nation  (actual time=1261688.549..1261688.549 rows=0 loops=1)
    -> Table scan on <temporary>  (never executed)
        -> Aggregate using temporary table  (actual time=1261688.547..1261688.547 rows=0 loops=1)
            -> Inner hash join (part.P_PartKey = lineorder.LO_PartKey)  (cost=69996894200862.80 rows=51308713584918) (actual time=1261688.547..1261688.547 rows=0 loops=1)
                -> Filter: ((part.P_MFGR = 'MFGR#1') or (part.P_MFGR = 'MFGR#2'))  (cost=0.01 rows=72977) (never executed)
                    -> Table scan on part  (cost=0.01 rows=993809) (never executed)
                -> Hash
                    -> Inner hash join (customer.C_CustomerKey = lineorder.LO_CustKey)  (cost=40590975265.66 rows=9574525066) (actual time=1261688.546..1261688.546 rows=0 loops=1)
                        -> Filter: (customer.C_Region = 'AMERICA')  (cost=0.01 rows=29918) (never executed)
                            -> Table scan on customer  (cost=0.01 rows=893261) (never executed)
                        -> Hash
                            -> Inner hash join (supplier.S_SuppKey = lineorder.LO_SuppKey)  (cost=12001639880.60 rows=9554997) (actual time=1261688.546..1261688.546 rows=0 loops=1)
                                -> Filter: (supplier.S_Region = 'AMERICA')  (cost=0.03 rows=2169) (never executed)
                                    -> Table scan on supplier  (cost=0.03 rows=59682) (never executed)
                                -> Hash
                                    -> Nested loop inner join  (cost=11975339623.15 rows=121259) (actual time=4.844..1261479.008 rows=323308 loops=1)
                                        -> Table scan on dim_date  (cost=260.60 rows=2556) (actual time=0.013..0.445 rows=216 loops=1)
                                        -> Filter: (lineorder.LO_OrderDateKey = dim_date.D_DateKey)  (cost=3910997.15 rows=47) (actual time=4.062..5840.096 rows=1497 loops=216)
                                            -> Index lookup on lineorder using LO_Quantity (LO_Quantity=1)  (cost=3910997.15 rows=7741904) (actual time=0.008..5700.603 rows=3598469 loops=216)

1 row in set, 1 warning (21 min 1.6894 sec)
Warning (code 1317): Query execution was interrupted

-- This query uses no indexes and completes in 1.5 minutes.
explain analyze select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit from lineorder IGNORE INDEX(LO_OrderDateKey,LO_Quantity) join dim_date    on lo_orderdatekey = d_datekey join customer   on lo_custkey = c_customerkey join supplier   on lo_suppkey = s_suppkey join part   on lo_partkey = p_partkey where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') and lo_quantity = 1 group by d_year, c_nation order by d_year, c_nation\G
*************************** 1. row ***************************
EXPLAIN: -> Sort: dim_date.D_Year, customer.C_Nation  (actual time=90611.141..90611.143 rows=35 loops=1)
    -> Table scan on <temporary>  (actual time=0.001..0.004 rows=35 loops=1)
        -> Aggregate using temporary table  (actual time=90611.117..90611.121 rows=35 loops=1)
            -> Inner hash join (part.P_PartKey = lineorder.LO_PartKey)  (cost=1786795106.09 rows=6474) (actual time=89925.113..90570.462 rows=57277 loops=1)
                -> Filter: ((part.P_MFGR = 'MFGR#1') or (part.P_MFGR = 'MFGR#2'))  (cost=0.01 rows=72977) (actual time=0.013..594.472 rows=399597 loops=1)
                    -> Table scan on part  (cost=0.01 rows=993809) (actual time=0.011..400.129 rows=1000000 loops=1)
                -> Hash
                    -> Inner hash join (customer.C_CustomerKey = lineorder.LO_CustKey)  (cost=18129655.19 rows=1) (actual time=89387.662..89882.419 rows=143965 loops=1)
                        -> Filter: (customer.C_Region = 'AMERICA')  (cost=0.02 rows=29918) (actual time=0.016..441.047 rows=179791 loops=1)
                            -> Table scan on customer  (cost=0.02 rows=893261) (actual time=0.014..363.845 rows=900000 loops=1)
                        -> Hash
                            -> Inner hash join (supplier.S_SuppKey = lineorder.LO_SuppKey)  (cost=18012919.64 rows=0) (actual time=89055.407..89198.506 rows=723798 loops=1)
                                -> Filter: (supplier.S_Region = 'AMERICA')  (cost=1.62 rows=2169) (actual time=0.014..29.972 rows=12068 loops=1)
                                    -> Table scan on supplier  (cost=1.62 rows=59682) (actual time=0.010..23.876 rows=60000 loops=1)
                                -> Hash
                                    -> Inner hash join (lineorder.LO_OrderDateKey = dim_date.D_DateKey)  (cost=18008983.13 rows=0) (actual time=1.198..88062.163 rows=3600289 loops=1)
                                        -> Filter: (lineorder.LO_Quantity = 1)  (cost=7045.57 rows=1) (actual time=0.036..87548.178 rows=3600289 loops=1)
                                            -> Table scan on lineorder  (cost=7045.57 rows=167044886) (actual time=0.005..80677.611 rows=179998368 loops=1)
                                        -> Hash
                                            -> Table scan on dim_date  (cost=260.60 rows=2556) (actual time=0.010..0.890 rows=2556 loops=1)

1 row in set (1 min 30.6172 sec)