Bug #116271 Performance of TPC-H Query 4
Submitted: 30 Sep 2024 20:34 Modified: 14 Nov 2024 22:18
Reporter: JINSHENG BA Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S5 (Performance)
Version:596f0d23 (9.0.0) OS:Any
Assigned to: CPU Architecture:Any

[30 Sep 2024 20:34] JINSHENG BA
Description:
A possible way to improve the performance of executing query 4 in the TPC-H benchmark.

For the original execution:

$ time mysql tpch < 4.sql
o_orderpriority order_count
1-URGENT        5176
2-HIGH  5311
3-MEDIUM        5164
4-NOT SPECIFIED 5182
5-LOW   5210
mysql  0.04s user 0.01s system 0% cpu 8.764 total

If we run the query with EXPLAIN ANALYZE:
$ time mysql tpch < 4_explain.sql
EXPLAIN
-> Sort: ORDERS.O_ORDERPRIORITY  (actual time=15345..15345 rows=5 loops=1)
    -> Table scan on <temporary>  (actual time=15345..15345 rows=5 loops=1)
        -> Aggregate using temporary table  (actual time=15345..15345 rows=5 loops=1)
            -> Nested loop semijoin  (cost=184975 rows=108984) (actual time=2.96..15330 rows=26043 loops=1)
                -> Filter: ((ORDERS.O_ORDERDATE >= DATE'1995-01-01') and (ORDERS.O_ORDERDATE < <cache>((DATE'1995-01-01' + interval '3' month))))  (cost=78009 rows=82410) (actual time=2.37..2845 rows=28396 loops=1)
                    -> Table scan on ORDERS  (cost=78009 rows=741842) (actual time=2.36..2792 rows=750000 loops=1)
                -> Filter: (LINEITEM.L_COMMITDATE < LINEITEM.L_RECEIPTDATE)  (cost=1.19 rows=1.32) (actual time=0.44..0.44 rows=0.917 loops=28396)
                    -> Index lookup on LINEITEM using PRIMARY (L_ORDERKEY=ORDERS.O_ORDERKEY)  (cost=1.19 rows=3.97) (actual time=0.439..0.439 rows=1.45 loops=28396)
mysql  0.03s user 0.01s system 0% cpu 15.395 total

If we remove the lines 3388, 3390-3302 in sql_select.cc: https://github.com/mysql/mysql-server/blob/trunk/sql/sql_select.cc#L3388, we can see a performance improvement:

$ time mysql tpch < 4.sql
o_orderpriority order_count
1-URGENT        5176
2-HIGH  5311
3-MEDIUM        5164
4-NOT SPECIFIED 5182
5-LOW   5210
mysql  0.03s user 0.01s system 1% cpu 4.191 total

$ time mysql tpch < 4_explain.sql
-> Sort: ORDERS.O_ORDERPRIORITY  (actual time=8103..8103 rows=5 loops=1)
    -> Table scan on <temporary>  (actual time=8103..8103 rows=5 loops=1)
        -> Aggregate using temporary table  (actual time=8103..8103 rows=5 loops=1)
            -> Nested loop semijoin  (cost=70262 rows=0) (actual time=7.53..8089 rows=26043 loops=1)
                -> Filter: ((ORDERS.O_ORDERDATE >= DATE'1995-01-01') and (ORDERS.O_ORDERDATE < <cache>((DATE'1995-01-01' + interval '3' month))))  (cost=0 rows=0) (actual time=7.16..2176 rows=28396 loops=1)
                    -> Table scan on ORDERS  (cost=0 rows=741842) (actual time=7.15..2124 rows=750000 loops=1)
                -> Filter: (LINEITEM.L_COMMITDATE < LINEITEM.L_RECEIPTDATE)  (cost=1.13 rows=1.32) (actual time=0.208..0.208 rows=0.917 loops=28396)
                    -> Index lookup on LINEITEM using PRIMARY (L_ORDERKEY=ORDERS.O_ORDERKEY)  (cost=1.13 rows=3.97) (actual time=0.208..0.208 rows=1.45 loops=28396)
mysql  0.03s user 0.01s system 0% cpu 8.149 total

Query 4:
```
select
    o_orderpriority,
    count(*) as order_count
from
    ORDERS
where
    o_orderdate >= date '1995-01-01'
    and o_orderdate < date '1995-01-01' + interval '3' month
    and exists (
        select
            *
        from
            LINEITEM
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;
```

For other queries, such as 5 12 13 14, we also observed a performance improvement.

How to repeat:
Compile MySQL in two versions, one is original and the other is with the changed IF condition. I have attached the patch file to remove the code for your reference.

diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 13584bf3860..05272d98a2d 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -3385,11 +3385,7 @@ bool make_join_readinfo(JOIN *join, uint no_jbuf_after) {
               static_cast<double>(table->file->stats.records);
           if (tab->position()->filter_effect != COND_FILTER_STALE) {
             // Constant condition moves to filter_effect:
-            if (tab->position()->rows_fetched == 0)  // avoid division by zero
               tab->position()->filter_effect = 0.0f;
-            else
-              tab->position()->filter_effect *= static_cast<float>(
-                  rows_w_const_cond / tab->position()->rows_fetched);
           }
         }
         break;
[2 Oct 2024 12:55] Øystein Grøvlen
I am not able to reproduce this.
Which scale factor is used, and what indexes have been created on the tables?
What are the settings for innodb_buffer_pool_size and innodb_adaptive_hash_index?
[2 Oct 2024 19:12] JINSHENG BA
Thanks for looking into it!

I am sorry that you cannot reproduce it in your environment.

I had a try just now. I believe you can observe the number of cost in query plans is always reduced. However, for execution time, I can observe the execution time difference on my PC but cannot observe it on another PC. I have not figured out the reason. Not sure it is due to cache. I will report again if I can reproduce it.

For the information you asked:
TPC-H: 1GB.

mysql> SELECT @@innodb_adaptive_hash_index;
+------------------------------+
| @@innodb_adaptive_hash_index |
+------------------------------+
|                            0 |
+------------------------------+
1 row in set (0.00 sec)

mysql> show index from ORDERS;
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table  | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| ORDERS |          0 | PRIMARY    |            1 | O_ORDERKEY  | A         |     2887534 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| ORDERS |          1 | ORDERS_FK1 |            1 | O_CUSTKEY   | A         |      197859 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.01 sec)

mysql> show index from LINEITEM;
+----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| LINEITEM |          0 | PRIMARY  |            1 | L_ORDERKEY   | A         |     2848228 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| LINEITEM |          0 | PRIMARY  |            2 | L_LINENUMBER | A         |    11525904 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+