Bug #116309 Performance of TPC-H Query
Submitted: 7 Oct 2024 10:39 Modified: 22 Oct 2024 11:32
Reporter: JINSHENG BA Email Updates:
Status: Verified Impact on me: None
Category: MySQL Server: Optimizer Severity: S5 (Performance)
Version: 596f0d23 (9.0.0), 9.1.0 OS: Ubuntu
Assigned to: CPU Architecture: Any

[7 Oct 2024 10:39] JINSHENG BA
Description:
A suboptimal query plan is chosen for query 16 of the TPC-H benchmark.

$ mysql tpch < 16.sql
EXPLAIN
-> Sort: supplier_cnt DESC, PART.P_BRAND, PART.P_TYPE, PART.P_SIZE  (actual time=8208..8209 rows=24525 loops=1)
    -> Stream results  (cost=10.4e+9 rows=3.54e+6) (actual time=495..8161 rows=24525 loops=1)
        -> Group aggregate: count(distinct PARTSUPP.PS_SUPPKEY)  (cost=10.4e+9 rows=3.54e+6) (actual time=495..8136 rows=24525 loops=1)
            -> Nested loop antijoin  (cost=3.14e+9 rows=31.4e+9) (actual time=494..7932 rows=237868 loops=1)
                -> Nested loop inner join  (cost=324912 rows=1.58e+6) (actual time=468..7718 rows=237992 loops=1)
                    -> Sort: PART.P_BRAND, PART.P_TYPE, PART.P_SIZE  (cost=41975 rows=395604) (actual time=468..484 rows=59498 loops=1)
                        -> Filter: ((PART.P_BRAND <> 'Brand#34') and (not((PART.P_TYPE like 'LARGE BRUSHED%'))) and (PART.P_SIZE in (48,19,12,4,41,7,21,39)))  (cost=41975 rows=395604) (actual time=0.458..407 rows=59498 loops=1)
                            -> Table scan on PART  (cost=41975 rows=395604) (actual time=0.453..325 rows=400000 loops=1)
                    -> Covering index lookup on PARTSUPP using PRIMARY (PS_PARTKEY=PART.P_PARTKEY)  (cost=0.792 rows=3.98) (actual time=0.119..0.121 rows=4 loops=59498)
                -> Single-row index lookup on <subquery2> using <auto_distinct_key> (s_suppkey=PARTSUPP.PS_SUPPKEY)  (cost=6706..6706 rows=1) (actual time=661e-6..661e-6 rows=521e-6 loops=237992)
                    -> Materialize with deduplication  (cost=6703..6703 rows=19929) (actual time=26.1..26.1 rows=10 loops=1)
                        -> Filter: (SUPPLIER.S_SUPPKEY is not null)  (cost=2111 rows=19929) (actual time=0.633..26.1 rows=10 loops=1)
                            -> Filter: (SUPPLIER.S_COMMENT like '%Customer%Complaints%')  (cost=2111 rows=19929) (actual time=0.632..26.1 rows=10 loops=1)
                                -> Table scan on SUPPLIER  (cost=2111 rows=19929) (actual time=0.0232..16.4 rows=20000 loops=1)

0.06s user 0.01s system 0% cpu 8.288 total

If we disable the true branch of the if condition at https://github.com/mysql/mysql-server/blob/trunk/sql/sql_select.cc#L3261-L3269 (see the patch under "How to repeat" below), we see a performance improvement:
$ mysql tpch < 16.sql
EXPLAIN
-> Sort: supplier_cnt DESC, PART.P_BRAND, PART.P_TYPE, PART.P_SIZE  (actual time=2314..2316 rows=24525 loops=1)
    -> Stream results  (actual time=2178..2294 rows=24525 loops=1)
        -> Group aggregate: count(distinct PARTSUPP.ps_suppkey)  (actual time=2178..2289 rows=24525 loops=1)
            -> Sort: PART.P_BRAND, PART.P_TYPE, PART.P_SIZE  (actual time=2178..2196 rows=237868 loops=1)
                -> Stream results  (cost=2.51e+9 rows=12.6e+9) (actual time=28.5..1926 rows=237868 loops=1)
                    -> Hash antijoin (`<subquery2>`.s_suppkey = PARTSUPP.PS_SUPPKEY)  (cost=2.51e+9 rows=12.6e+9) (actual time=28.5..1869 rows=237868 loops=1)
                        -> Nested loop inner join  (cost=223097 rows=630252) (actual time=0.329..1806 rows=237992 loops=1)
                            -> Filter: ((PART.P_BRAND <> 'Brand#34') and (not((PART.P_TYPE like 'LARGE BRUSHED%'))) and (PART.P_SIZE in (48,19,12,4,41,7,21,39)))  (cost=43030 rows=158244) (actual time=0.311..661 rows=59498 loops=1)
                                -> Table scan on PART  (cost=43030 rows=395604) (actual time=0.305..563 rows=400000 loops=1)
                            -> Covering index lookup on PARTSUPP using PRIMARY (PS_PARTKEY=PART.P_PARTKEY)  (cost=0.74 rows=3.98) (actual time=0.018..0.0189 rows=4 loops=59498)
                        -> Hash
                            -> Table scan on <subquery2>  (cost=6703..6954 rows=19929) (actual time=28.2..28.2 rows=10 loops=1)
                                -> Materialize with deduplication  (cost=6703..6703 rows=19929) (actual time=28.2..28.2 rows=10 loops=1)
                                    -> Filter: (SUPPLIER.S_SUPPKEY is not null)  (cost=2111 rows=19929) (actual time=1.03..28.2 rows=10 loops=1)
                                        -> Filter: (SUPPLIER.S_COMMENT like '%Customer%Complaints%')  (cost=2111 rows=19929) (actual time=1.03..28.2 rows=10 loops=1)
                                            -> Table scan on SUPPLIER  (cost=2111 rows=19929) (actual time=0.249..17.9 rows=20000 loops=1)

0.04s user 0.02s system 2% cpu 2.389 total

The query result is identical, but both the execution time and the estimated cost are reduced by more than 50%.
The difference between the two plans is in how the materialized NOT IN subquery is probed: the first plan runs a nested-loop antijoin with a single-row index lookup into <subquery2> for every outer row, while the second runs a hash antijoin over a single table scan of <subquery2>.
It seems that using the index unexpectedly increases the cost of the query plan.

Query 16:
EXPLAIN ANALYZE
select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
from
    PARTSUPP,
    PART
where
    p_partkey = ps_partkey
    and p_brand <> 'Brand#34'
    and p_type not like 'LARGE BRUSHED%'
    and p_size in (48, 19, 12, 4, 41, 7, 21, 39)
    and ps_suppkey not in (
        select
            s_suppkey
        from
            SUPPLIER
        where
            s_comment like '%Customer%Complaints%'
    )
group by
    p_brand,
    p_type,
    p_size
order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size;

TPC-H data scale: 2 GB
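
To compare plan shapes without paying the full execution time on every attempt, the estimated plan can be printed with EXPLAIN FORMAT=TREE. A minimal sketch, assuming 16.sql begins with the EXPLAIN ANALYZE line exactly as shown above and using the same connection options as in the How-to-repeat section below:
```
# Rewrite the leading EXPLAIN ANALYZE into EXPLAIN FORMAT=TREE so that only the
# estimated plan is printed and the query is not executed. In the slow plan the
# <subquery2> materialization is probed by a single-row index lookup per outer
# row; in the fast plan it is probed by a hash antijoin over one table scan.
sed 's/^EXPLAIN ANALYZE$/EXPLAIN FORMAT=TREE/' 16.sql \
  | mysql -h 127.0.0.1 -P 3306 -uroot -proot tpch
```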

How to repeat:
Compile two versions of MySQL: one unmodified, and one with the if condition changed. I have attached the patch file that removes the code for your reference.

Patch:
diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 13584bf3860..b1d17f93b4f 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -3258,21 +3258,10 @@ bool JOIN::setup_semijoin_materialized_table(JOIN_TAB *tab, uint tableno,
       ((uint)tab->idx() == const_tables)
           ? 1.0
           : best_ref[tab->idx() - 1]->position()->prefix_rowcount;
-  if (!sjm_exec->is_scan) {
-    sjm_pos->key = keyuse->begin();  // MaterializeLookup will use the index
-    sjm_pos->read_cost =
-        emb_sj_nest->nested_join->sjm.lookup_cost.total_cost() * fanout;
-    tab->set_keyuse(keyuse->begin());
-    tab->keys().set_bit(0);  // There is one index - use it always
-    tab->set_index(0);
-    sjm_pos->rows_fetched = 1.0;
-    tab->set_type(JT_REF);
-  } else {
     sjm_pos->key = nullptr;  // No index use for MaterializeScan
     sjm_pos->read_cost = tab->read_time * fanout;
     sjm_pos->rows_fetched = static_cast<double>(tab->records());
     tab->set_type(JT_ALL);
-  }
   sjm_pos->set_prefix_join_cost((tab - join_tab), cost_model());
 
   return false;

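For reference, a sketch of how the two server binaries might be built locally, assuming the attached diff applies cleanly to commit 596f0d23 and is saved as remove_lookup_branch.patch (a placeholder name); the cmake invocation is the same default one listed in the comments further below:
```
# Build the unmodified server.
git clone https://github.com/mysql/mysql-server
cd mysql-server
git checkout 596f0d23
mkdir build && cd build
cmake .. && make -j"$(nproc)"
cd ../..

# Build the patched server in a second tree; the patch file name is a
# placeholder for wherever the attached diff is saved.
git clone https://github.com/mysql/mysql-server mysql-server-changed
cd mysql-server-changed
git checkout 596f0d23
git apply ../remove_lookup_branch.patch
mkdir build && cd build
cmake .. && make -j"$(nproc)"
```
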
To facilitate reproduction, I also provide a Docker image of our environment.
```
docker run -it -p 3306:3306 jinshengba/mysql_tpch_2 /bin/bash
```

In the Docker container, you can start the MySQL server with the following commands.

Starting the original MySQL server:
```
/app/mysql-server/build/bin/mysqld --user=root --bind-address=0.0.0.0 --datadir=/app/mysql-data
```

Starting the modified MySQL server:
```
/app/mysql-server-changed/build/bin/mysqld --user=root --bind-address=0.0.0.0 --datadir=/app/mysql-data
```

On the host machine, you can connect to the MySQL server in the Docker container by running:
```
mysql -h 127.0.0.1 -P 3306 -uroot -proot tpch < 16.sql
```

I observed that the cost estimates and execution times fluctuate across executions, so I typically run the query multiple times to get a stable result.
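
Since the timings fluctuate, something like the following loop (a sketch; it assumes the server is reachable with the connection options from the steps above) can be used to collect several wall-clock measurements per server:
```
# Each run executes the full EXPLAIN ANALYZE (so the query itself runs); the
# plan output is discarded and only the wall-clock time reported on stderr by
# the shell's `time` keyword is of interest.
for i in 1 2 3 4 5; do
  time mysql -h 127.0.0.1 -P 3306 -uroot -proot tpch < 16.sql > /dev/null
done
```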
[15 Oct 2024 18:42] JINSHENG BA
May I check whether there is any information I can provide to help reproduce or analyze this case?
[16 Oct 2024 6:00] MySQL Verification Team
Hello Jinsheng Ba,

Thank you for the report and feedback.
My apologies for the delay. Could you please share the exact make options used for the build and the MySQL server configuration file used? Also, I assume the scale factor used is 1 GB (from your other Bug #116271).

regards,
Umesh
[16 Oct 2024 6:10] JINSHENG BA
I used the default configuration options:
```
git clone https://github.com/mysql/mysql-server
cd mysql-server
git checkout 596f0d23
mkdir build
cd build
cmake
make -j$(nproc)
```

For the data scale, this case is based on 2 GB. You can check the data in the Docker container.
[16 Oct 2024 6:11] JINSHENG BA
The ".." is missed for cmake.

```
git clone https://github.com/mysql/mysql-server
cd mysql-server
git checkout 596f0d23
mkdir build
cd build
cmake ..
make -j$(nproc)
```
[16 Oct 2024 6:17] MySQL Verification Team
Thank you for the details.

regards,
Umesh
[21 Oct 2024 16:04] JINSHENG BA
May I ask whether it is reproducible? If so, is the MySQL community interested in such cases?
[21 Oct 2024 16:17] MySQL Verification Team
Hi JINSHENG BA,

Thank you once again for following up on this.
As development expects non-Docker reproduction steps, I was trying to set everything up locally and verify on physical boxes. I had issues building TPC-H, which I fixed today (thanks to George Ma, who shared a copy and a HOW-TO on TPC-H in one of his bug reports, which I'm using here), and hopefully by tomorrow I will be able to verify/reproduce Bug #116309 and Bug #116241. Thank you once again, and sorry for the delay.

regards,
Umesh
[22 Oct 2024 10:38] MySQL Verification Team
Thank you for being patient with me on this.
I'm not sure how much this patch fixes the issue, but I have observed the behaviour and hope that the development team will take a look at this and decide further on the suggested patch. I'll be attaching the verification results file shortly. Thank you.

Note - the Synopsis says "YPC-H Query 8" but the How-to-repeat/Description uses "Query 16"; do you mind if I change the Synopsis to match? Thank you.

regards,
Umesh
[22 Oct 2024 10:39] MySQL Verification Team
9.1.0 - test results (with and without the suggested patch)

Attachment: 116309.results.txt (text/plain), 18.30 KiB.

[22 Oct 2024 11:32] JINSHENG BA
>I had issues building TPC-H.

Sorry about that! My apologies for relying on Docker. I am happy to provide my entire data folder or any other info if you need it.

>I'm not sure how much this patch fixes the issue, but I have observed the behaviour and hope that the development team will take a look at this and decide further on the suggested patch.

Thanks for looking into it! Yes, I should clarify that this is not proposed as a correct patch. Instead, I want to show that there exists a much more efficient query plan for this TPC-H query, and I am wondering whether the query optimizer can be improved to choose the more efficient plan by default. I believe that, at the least, this would improve execution efficiency on the TPC-H benchmark.