| Bug #116271 | Performance of TPC-H Query 4 | ||
|---|---|---|---|
| Submitted: | 30 Sep 2024 20:34 | Modified: | 14 Nov 2024 22:18 |
| Reporter: | JINSHENG BA | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server | Severity: | S5 (Performance) |
| Version: | 596f0d23 (9.0.0) | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[2 Oct 2024 12:55]
Øystein Grøvlen
I am not able to reproduce this. Which scale factor is used, and what indexes have been created on the tables? What are the settings for innodb_buffer_pool_size and innodb_adaptive_hash_index?
[2 Oct 2024 19:12]
JINSHENG BA
Thanks for looking into it! I am sorry that you cannot reproduce it in your environment. I had a try just now. I believe you can observe the number of cost in query plans is always reduced. However, for execution time, I can observe the execution time difference on my PC but cannot observe it on another PC. I have not figured out the reason. Not sure it is due to cache. I will report again if I can reproduce it. For the information you asked: TPC-H: 1GB. mysql> SELECT @@innodb_adaptive_hash_index; +------------------------------+ | @@innodb_adaptive_hash_index | +------------------------------+ | 0 | +------------------------------+ 1 row in set (0.00 sec) mysql> show index from ORDERS; +--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression | +--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ | ORDERS | 0 | PRIMARY | 1 | O_ORDERKEY | A | 2887534 | NULL | NULL | | BTREE | | | YES | NULL | | ORDERS | 1 | ORDERS_FK1 | 1 | O_CUSTKEY | A | 197859 | NULL | NULL | | BTREE | | | YES | NULL | +--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ 2 rows in set (0.01 sec) mysql> show index from LINEITEM; +----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression | +----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+ | LINEITEM | 0 | PRIMARY | 1 | L_ORDERKEY | A | 2848228 | NULL | NULL | | BTREE | | | YES | NULL | | LINEITEM | 0 | PRIMARY | 2 | L_LINENUMBER | A | 11525904 | NULL | NULL | | BTREE | | | YES | NULL | +----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

Description: A possible way to improve the performance of executing query 4 in the TPC-H benchmark. For the original execution: $ time mysql tpch < 4.sql o_orderpriority order_count 1-URGENT 5176 2-HIGH 5311 3-MEDIUM 5164 4-NOT SPECIFIED 5182 5-LOW 5210 mysql 0.04s user 0.01s system 0% cpu 8.764 total If we run the query with EXPLAIN ANALYZE: $ time mysql tpch < 4_explain.sql EXPLAIN -> Sort: ORDERS.O_ORDERPRIORITY (actual time=15345..15345 rows=5 loops=1) -> Table scan on <temporary> (actual time=15345..15345 rows=5 loops=1) -> Aggregate using temporary table (actual time=15345..15345 rows=5 loops=1) -> Nested loop semijoin (cost=184975 rows=108984) (actual time=2.96..15330 rows=26043 loops=1) -> Filter: ((ORDERS.O_ORDERDATE >= DATE'1995-01-01') and (ORDERS.O_ORDERDATE < <cache>((DATE'1995-01-01' + interval '3' month)))) (cost=78009 rows=82410) (actual time=2.37..2845 rows=28396 loops=1) -> Table scan on ORDERS (cost=78009 rows=741842) (actual time=2.36..2792 rows=750000 loops=1) -> Filter: (LINEITEM.L_COMMITDATE < LINEITEM.L_RECEIPTDATE) (cost=1.19 rows=1.32) (actual time=0.44..0.44 rows=0.917 loops=28396) -> Index lookup on LINEITEM using PRIMARY (L_ORDERKEY=ORDERS.O_ORDERKEY) (cost=1.19 rows=3.97) (actual time=0.439..0.439 rows=1.45 loops=28396) mysql 0.03s user 0.01s system 0% cpu 15.395 total If we remove the lines 3388, 3390-3302 in sql_select.cc: https://github.com/mysql/mysql-server/blob/trunk/sql/sql_select.cc#L3388, we can see a performance improvement: $ time mysql tpch < 4.sql o_orderpriority order_count 1-URGENT 5176 2-HIGH 5311 3-MEDIUM 5164 4-NOT SPECIFIED 5182 5-LOW 5210 mysql 0.03s user 0.01s system 1% cpu 4.191 total $ time mysql tpch < 4_explain.sql -> Sort: ORDERS.O_ORDERPRIORITY (actual time=8103..8103 rows=5 loops=1) -> Table scan on <temporary> (actual time=8103..8103 rows=5 loops=1) -> Aggregate using temporary table (actual time=8103..8103 rows=5 loops=1) -> Nested loop semijoin (cost=70262 rows=0) (actual time=7.53..8089 rows=26043 loops=1) -> Filter: ((ORDERS.O_ORDERDATE >= DATE'1995-01-01') and (ORDERS.O_ORDERDATE < <cache>((DATE'1995-01-01' + interval '3' month)))) (cost=0 rows=0) (actual time=7.16..2176 rows=28396 loops=1) -> Table scan on ORDERS (cost=0 rows=741842) (actual time=7.15..2124 rows=750000 loops=1) -> Filter: (LINEITEM.L_COMMITDATE < LINEITEM.L_RECEIPTDATE) (cost=1.13 rows=1.32) (actual time=0.208..0.208 rows=0.917 loops=28396) -> Index lookup on LINEITEM using PRIMARY (L_ORDERKEY=ORDERS.O_ORDERKEY) (cost=1.13 rows=3.97) (actual time=0.208..0.208 rows=1.45 loops=28396) mysql 0.03s user 0.01s system 0% cpu 8.149 total Query 4: ``` select o_orderpriority, count(*) as order_count from ORDERS where o_orderdate >= date '1995-01-01' and o_orderdate < date '1995-01-01' + interval '3' month and exists ( select * from LINEITEM where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority; ``` For other queries, such as 5 12 13 14, we also observed a performance improvement. How to repeat: Compile MySQL in two versions, one is original and the other is with the changed IF condition. I have attached the patch file to remove the code for your reference. diff --git a/sql/sql_select.cc b/sql/sql_select.cc index 13584bf3860..05272d98a2d 100644 --- a/sql/sql_select.cc +++ b/sql/sql_select.cc @@ -3385,11 +3385,7 @@ bool make_join_readinfo(JOIN *join, uint no_jbuf_after) { static_cast<double>(table->file->stats.records); if (tab->position()->filter_effect != COND_FILTER_STALE) { // Constant condition moves to filter_effect: - if (tab->position()->rows_fetched == 0) // avoid division by zero tab->position()->filter_effect = 0.0f; - else - tab->position()->filter_effect *= static_cast<float>( - rows_w_const_cond / tab->position()->rows_fetched); } } break;