Bug #116777 Performance of TPC-DS Query 47
Submitted: 25 Nov 2024 19:50 Modified: 27 Nov 2024 12:42
Reporter: JINSHENG BA Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:596f0d23 (9.0.0), 8.0.40 OS:Any
Assigned to: CPU Architecture:Any

[25 Nov 2024 19:50] JINSHENG BA
Description:
For the query 47 in TPC-DS benchmark, the performance is:

EXPLAIN
-> Limit: 100 row(s)  (actual time=32870..32870 rows=100 loops=1)
    -> Sort: `(v1.sum_sales - v1.avg_monthly_sales)`, v2.nsum, limit input to 100 row(s) per chunk  (actual time=32870..32870 rows=100 loops=1)
        -> Stream results  (cost=434153 rows=1.05e+6) (actual time=29762..32849 rows=48019 loops=1)
            -> Nested loop inner join  (cost=434153 rows=1.05e+6) (actual time=29762..32813 rows=48019 loops=1)
                -> Nested loop inner join  (cost=68316 rows=101022) (actual time=29762..31305 rows=49940 loops=1)
                    -> Filter: ((v1.d_year = 2000) and (v1.avg_monthly_sales > 0.000000) and ((case when (v1.avg_monthly_sales > 0.000000) then (abs((v1.sum_sales - v1.avg_monthly_sales)) / v1.avg_monthly_sales) else NULL end) > 0.1) and (v1.i_category is not null) and (v1.i_brand is not null) and (v1.s_store_name is not null) and (v1.s_company_name is not null))  (cost=3.38..32958 rows=9764) (actual time=29762..29830 rows=50288 loops=1)
                        -> Table scan on v1  (cost=2.5..2.5 rows=0) (actual time=29762..29794 rows=63745 loops=1)
                            -> Materialize CTE v1 if needed  (cost=0..0 rows=0) (actual time=29762..29762 rows=63745 loops=1)
                                -> Window aggregate: rank() OVER (PARTITION BY item.i_category,item.i_brand,store.s_store_name,store.s_company_name ORDER BY date_dim.d_year,date_dim.d_moy )   (actual time=29004..29078 rows=63745 loops=1)
                                    -> Sort: item.i_category, item.i_brand, store.s_store_name, store.s_company_name, date_dim.d_year, date_dim.d_moy  (actual time=29004..29015 rows=63745 loops=1)
                                        -> Table scan on <temporary>  (cost=2.5..2.5 rows=0) (actual time=28893..28922 rows=63745 loops=1)
                                            -> Temporary table  (cost=0..0 rows=0) (actual time=28893..28893 rows=63745 loops=1)
                                                -> Window aggregate with buffering: avg(```sum(store_sales.ss_sales_price)```) OVER (PARTITION BY item.i_category,item.i_brand,store.s_store_name,store.s_company_name,date_dim.d_year )   (actual time=28346..28761 rows=63745 loops=1)
                                                    -> Sort: item.i_category, item.i_brand, store.s_store_name, store.s_company_name, date_dim.d_year  (actual time=28345..28354 rows=63745 loops=1)
                                                        -> Table scan on <temporary>  (actual time=28167..28273 rows=63745 loops=1)
                                                            -> Aggregate using temporary table  (actual time=28167..28167 rows=63744 loops=1)
                                                                -> Nested loop inner join  (cost=3.26e+6 rows=292938) (actual time=16.3..20123 rows=661185 loops=1)
                                                                    -> Nested loop inner join  (cost=2.8e+6 rows=2.48e+6) (actual time=15.5..15036 rows=2.75e+6 loops=1)
                                                                        -> Inner hash join (no condition)  (cost=21766 rows=210780) (actual time=14.3..105 rows=216000 loops=1)
                                                                            -> Table scan on item  (cost=204 rows=17565) (actual time=0.721..67.6 rows=18000 loops=1)
                                                                            -> Hash
                                                                                -> Table scan on store  (cost=2.2 rows=12) (actual time=13.6..13.6 rows=12 loops=1)
                                                                        -> Filter: ((store_sales.ss_store_sk = store.s_store_sk) and (store_sales.ss_sold_date_sk is not null))  (cost=1.4 rows=11.8) (actual time=0.0481..0.0683 rows=12.7 loops=216000)
                                                                            -> Index lookup on store_sales using PRIMARY (ss_item_sk=item.i_item_sk)  (cost=1.4 rows=118) (actual time=0.0272..0.0601 rows=160 loops=216000)
                                                                    -> Filter: ((date_dim.d_year = 2000) or ((date_dim.d_moy = 12) and (date_dim.d_year = <cache>((2000 - 1)))) or ((date_dim.d_moy = 1) and (date_dim.d_year = <cache>((2000 + 1)))))  (cost=0.0833 rows=0.118) (actual time=0.00174..0.00176 rows=0.24 loops=2.75e+6)
                                                                        -> Single-row index lookup on date_dim using PRIMARY (d_date_sk=store_sales.ss_sold_date_sk)  (cost=0.0833 rows=1) (actual time=0.00151..0.00154 rows=0.976 loops=2.75e+6)
                    -> Filter: (v1.rn = (v1_lag.rn + 1))  (cost=0.25..2.59 rows=10.3) (actual time=0.0164..0.0292 rows=0.993 loops=50288)
                        -> Index lookup on v1_lag using <auto_key1> (i_category=v1.i_category, i_brand=v1.i_brand, s_store_name=v1.s_store_name, s_company_name=v1.s_company_name)  (cost=0.25..2.59 rows=10.3) (actual time=0.00697..0.0283 rows=12.6 loops=50288)
                            -> Materialize CTE v1 if needed (query plan printed elsewhere)  (cost=0..0 rows=0) (never executed)
                -> Filter: (v1.rn = (v1_lead.rn - 1))  (cost=0.25..2.59 rows=10.3) (actual time=0.0199..0.03 rows=0.962 loops=49940)
                    -> Index lookup on v1_lead using <auto_key1> (i_category=v1.i_category, i_brand=v1.i_brand, s_store_name=v1.s_store_name, s_company_name=v1.s_company_name)  (cost=0.25..2.59 rows=10.3) (actual time=0.00698..0.0291 rows=12.6 loops=49940)
                        -> Materialize CTE v1 if needed (query plan printed elsewhere)  (cost=0..0 rows=0) (never executed)

If we apply this patch:

diff --git a/sql/sql_planner.cc b/sql/sql_planner.cc
index ad0c639abca..6a51a03aed1 100644
--- a/sql/sql_planner.cc
+++ b/sql/sql_planner.cc
@@ -155,8 +155,8 @@ double find_cost_for_ref(const THD *thd, TABLE *table, unsigned keyno,
         table->file->index_scan_cost(keyno, 1, num_rows);
     return index_read_cost.total_cost();
   }
-  if (keyno == table->s->primary_key &&
-      table->file->primary_key_is_clustered()) {
+  if (!(keyno == table->s->primary_key &&
+      table->file->primary_key_is_clustered())) {
     const Cost_estimate table_read_cost =
         table->file->read_cost(keyno, 1, num_rows);
     return table_read_cost.total_cost();

We can get a much more efficient query plan (19 seconds) as attached:

The major difference is the second query plan uses ```Single-row index lookup on item```. Similarly, I am wondering whether we can optimize code anywhere to enable the second efficient query plan.

How to repeat:
I dumped the entire database: https://drive.google.com/file/d/1mCwATIQtNwYftxVvMCNz6xgjvRDCrhtq/view?usp=sharing

Compile MySQL in two versions, one is original and the other is with the changed IF condition. Then compare the execution of the below query on both versions.

TPC-DS query 47:

with v1 as(
 select i_category, i_brand,
        s_store_name, s_company_name,
        d_year, d_moy,
        sum(ss_sales_price) sum_sales,
        avg(sum(ss_sales_price)) over
          (partition by i_category, i_brand,
                     s_store_name, s_company_name, d_year)
          avg_monthly_sales,
        rank() over
          (partition by i_category, i_brand,
                     s_store_name, s_company_name
           order by d_year, d_moy) rn
 from item, store_sales, date_dim, store
 where ss_item_sk = i_item_sk and
       ss_sold_date_sk = d_date_sk and
       ss_store_sk = s_store_sk and
       (
         d_year = 2000 or
         ( d_year = 2000-1 and d_moy =12) or
         ( d_year = 2000+1 and d_moy =1)
       )
 group by i_category, i_brand,
          s_store_name, s_company_name,
          d_year, d_moy),
 v2 as(
 select v1.i_category, v1.i_brand
        ,v1.d_year, v1.d_moy
        ,v1.avg_monthly_sales
        ,v1.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum
 from v1, v1 v1_lag, v1 v1_lead
 where v1.i_category = v1_lag.i_category and
       v1.i_category = v1_lead.i_category and
       v1.i_brand = v1_lag.i_brand and
       v1.i_brand = v1_lead.i_brand and
       v1.s_store_name = v1_lag.s_store_name and
       v1.s_store_name = v1_lead.s_store_name and
       v1.s_company_name = v1_lag.s_company_name and
       v1.s_company_name = v1_lead.s_company_name and
       v1.rn = v1_lag.rn + 1 and
       v1.rn = v1_lead.rn - 1)
  select  *
 from v2
 where  d_year = 2000 and    
        avg_monthly_sales > 0 and
        case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1
 order by sum_sales - avg_monthly_sales, nsum
 limit 100;
[25 Nov 2024 19:50] JINSHENG BA
The second query plan:

EXPLAIN
-> Limit: 100 row(s)  (actual time=19984..19984 rows=100 loops=1)
    -> Sort: `(v1.sum_sales - v1.avg_monthly_sales)`, v2.nsum, limit input to 100 row(s) per chunk  (actual time=19984..19984 rows=100 loops=1)
        -> Stream results  (cost=254624 rows=1.41e+6) (actual time=16937..19964 rows=48019 loops=1)
            -> Nested loop inner join  (cost=254624 rows=1.41e+6) (actual time=16937..19928 rows=48019 loops=1)
                -> Nested loop inner join  (cost=62901 rows=136006) (actual time=16937..18479 rows=49940 loops=1)
                    -> Filter: ((v1.d_year = 2000) and (v1.avg_monthly_sales > 0.000000) and ((case when (v1.avg_monthly_sales > 0.000000) then (abs((v1.sum_sales - v1.avg_monthly_sales)) / v1.avg_monthly_sales) else NULL end) > 0.1) and (v1.i_category is not null) and (v1.i_brand is not null) and (v1.s_store_name is not null) and (v1.s_company_name is not null))  (cost=3.38..44371 rows=13145) (actual time=16937..17028 rows=50288 loops=1)
                        -> Table scan on v1  (cost=2.5..2.5 rows=0) (actual time=16937..16993 rows=63745 loops=1)
                            -> Materialize CTE v1 if needed  (cost=0..0 rows=0) (actual time=16937..16937 rows=63745 loops=1)
                                -> Window aggregate: rank() OVER (PARTITION BY item.i_category,item.i_brand,store.s_store_name,store.s_company_name ORDER BY date_dim.d_year,date_dim.d_moy )   (actual time=16241..16309 rows=63745 loops=1)
                                    -> Sort: item.i_category, item.i_brand, store.s_store_name, store.s_company_name, date_dim.d_year, date_dim.d_moy  (actual time=16241..16251 rows=63745 loops=1)
                                        -> Table scan on <temporary>  (cost=2.5..2.5 rows=0) (actual time=16129..16160 rows=63745 loops=1)
                                            -> Temporary table  (cost=0..0 rows=0) (actual time=16129..16129 rows=63745 loops=1)
                                                -> Window aggregate with buffering: avg(```sum(store_sales.ss_sales_price)```) OVER (PARTITION BY item.i_category,item.i_brand,store.s_store_name,store.s_company_name,date_dim.d_year )   (actual time=15603..16008 rows=63745 loops=1)
                                                    -> Sort: item.i_category, item.i_brand, store.s_store_name, store.s_company_name, date_dim.d_year  (actual time=15603..15611 rows=63745 loops=1)
                                                        -> Table scan on <temporary>  (actual time=15471..15532 rows=63745 loops=1)
                                                            -> Aggregate using temporary table  (actual time=15471..15471 rows=63745 loops=1)
                                                                -> Nested loop inner join  (cost=980819 rows=39439) (actual time=14.6..7959 rows=661185 loops=1)
                                                                    -> Nested loop inner join  (cost=668636 rows=334481) (actual time=14.3..2941 rows=2.75e+6 loops=1)
                                                                        -> Inner hash join (store_sales.ss_store_sk = store.s_store_sk)  (cost=356453 rows=334481) (actual time=14..2126 rows=2.75e+6 loops=1)
                                                                            -> Filter: (store_sales.ss_sold_date_sk is not null)  (cost=4154 rows=278735) (actual time=0.449..1603 rows=2.88e+6 loops=1)
                                                                                -> Table scan on store_sales  (cost=4154 rows=2.79e+6) (actual time=0.449..1428 rows=2.88e+6 loops=1)
                                                                            -> Hash
                                                                                -> Table scan on store  (cost=2.2 rows=12) (actual time=13.5..13.5 rows=12 loops=1)
                                                                        -> Single-row index lookup on item using PRIMARY (i_item_sk=store_sales.ss_item_sk)  (cost=0.0833 rows=1) (actual time=145e-6..168e-6 rows=1 loops=2.75e+6)
                                                                    -> Filter: ((date_dim.d_year = 2000) or ((date_dim.d_moy = 12) and (date_dim.d_year = <cache>((2000 - 1)))) or ((date_dim.d_moy = 1) and (date_dim.d_year = <cache>((2000 + 1)))))  (cost=0.0833 rows=0.118) (actual time=0.00172..0.00173 rows=0.24 loops=2.75e+6)
                                                                        -> Single-row index lookup on date_dim using PRIMARY (d_date_sk=store_sales.ss_sold_date_sk)  (cost=0.0833 rows=1) (actual time=0.00149..0.00151 rows=0.976 loops=2.75e+6)
                    -> Filter: (v1.rn = (v1_lag.rn + 1))  (cost=0.0363..0.375 rows=10.3) (actual time=0.0161..0.0287 rows=0.993 loops=50288)
                        -> Index lookup on v1_lag using <auto_key1> (i_category=v1.i_category, i_brand=v1.i_brand, s_store_name=v1.s_store_name, s_company_name=v1.s_company_name)  (cost=0.0363..0.375 rows=10.3) (actual time=0.00695..0.0278 rows=12.6 loops=50288)
                            -> Materialize CTE v1 if needed (query plan printed elsewhere)  (cost=0..0 rows=0) (never executed)
                -> Filter: (v1.rn = (v1_lead.rn - 1))  (cost=0.0362..0.375 rows=10.3) (actual time=0.0194..0.0289 rows=0.962 loops=49940)
                    -> Index lookup on v1_lead using <auto_key1> (i_category=v1.i_category, i_brand=v1.i_brand, s_store_name=v1.s_store_name, s_company_name=v1.s_company_name)  (cost=0.0362..0.375 rows=10.3) (actual time=0.00673..0.028 rows=12.6 loops=49940)
                        -> Materialize CTE v1 if needed (query plan printed elsewhere)  (cost=0..0 rows=0) (never executed)
[27 Nov 2024 7:02] MySQL Verification Team
Hello JINSHENG BA,

Thank you for the report and feedback.
I'm trying to reproduce the issue with provided details and get back to you if anything further needed.
I assume you have tested against the default server settings(if this is not the case then please attach the configuration details to the report).

regards,
Umesh
[27 Nov 2024 12:42] MySQL Verification Team
Hello JINSHENG BA,

Thank you for the report and details.
Observed this with 8.0.40 build.

8.0.40 - Joining the test results shortly.
8.4/9.1 - will test later this week/weekend and attach here

regards,
Umesh
[27 Nov 2024 12:42] MySQL Verification Team
8.0.40 test results

Attachment: 116777.results (application/octet-stream, text), 45.49 KiB.