Bug #103272 condition_fanout_filter downgrade performance on HDD
Submitted: 10 Apr 2021 5:22 Modified: 12 Apr 2021 12:51
Reporter: haochen he Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.22 OS:Ubuntu (20.10-Desktop)
Assigned to: CPU Architecture:x86 (AMD Ryzen 9 3900XT)
Tags: performance

[10 Apr 2021 5:22] haochen he
Description:
MySQL startup configuration:

  bin/mysqld_safe --user=mysql --optimizer-switch=condition_fanout_filter=on/off --innodb-flush-method=O_DIRECT &

Query to run (TPC-H query-19)
  
  https://github.com/catarinaribeir0/queries-tpch-dbgen-mysql/blob/master/19.sql

Observation:

  condition_fanout_filter=on
     28.82 seconds

  condition_fanout_filter=off
      9.85 seconds

Using profile when turning on the parameter, I found it is very strange that the time is not spent in CPU:

+--------------------------------+-----------+----------+------------+--------------+---------------+
| Status                         | Duration  | CPU_user | CPU_system | Block_ops_in | Block_ops_out |
+--------------------------------+-----------+----------+------------+--------------+---------------+
| starting                       |  0.000143 | 0.000073 |   0.000068 |            0 |             0 |
| Executing hook on transaction  |  0.000005 | 0.000003 |   0.000002 |            0 |             0 |
| starting                       |  0.000009 | 0.000005 |   0.000005 |            0 |             0 |
| checking permissions           |  0.000005 | 0.000002 |   0.000002 |            0 |             0 |
| checking permissions           |  0.000005 | 0.000003 |   0.000002 |            0 |             0 |
| Opening tables                 |  0.000084 | 0.000043 |   0.000040 |            0 |             0 |
| init                           |  0.000007 | 0.000004 |   0.000004 |            0 |             0 |
| System lock                    |  0.000010 | 0.000005 |   0.000004 |            0 |             0 |
| optimizing                     |  0.000044 | 0.000023 |   0.000022 |            0 |             0 |
| statistics                     |  0.000054 | 0.000028 |   0.000025 |            0 |             0 |
| preparing                      |  0.000044 | 0.000023 |   0.000021 |            0 |             0 |
| executing                      | 28.824988 | 0.637214 |   0.248319 |       323232 |             0 |
| end                            |  0.000018 | 0.000008 |   0.000007 |            0 |             0 |
| query end                      |  0.000006 | 0.000003 |   0.000003 |            0 |             0 |
| waiting for handler commit     |  0.000011 | 0.000006 |   0.000005 |            0 |             0 |
| closing tables                 |  0.000011 | 0.000006 |   0.000005 |            0 |             0 |
| freeing items                  |  0.000037 | 0.000020 |   0.000017 |            0 |             0 |
| cleaning up                    |  0.000012 | 0.000006 |   0.000005 |            0 |             0 |
+--------------------------------+-----------+----------+------------+--------------+---------------+
with EXPLAIN
+----+-------------+----------+------------+------+---------------+--------------+---------+---------------------+--------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key          | key_len | ref                 | rows   | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+--------------+---------+---------------------+--------+----------+-------------+
|  1 | SIMPLE      | PART     | NULL       | ALL  | PRIMARY       | NULL         | NULL    | NULL                | 198000 |     1.33 | Using where |
|  1 | SIMPLE      | LINEITEM | NULL       | ref  | LINEITEM_FK2  | LINEITEM_FK2 | 4       | tpch.PART.P_PARTKEY |     30 |     0.67 | Using where |
+----+-------------+----------+------------+------+---------------+--------------+---------+---------------------+--------+----------+-------------+

So maybe it is because of IO.
But after I profile the query with "condition_fanout_filter=off", I found the IO read are larger:

+--------------------------------+----------+----------+------------+--------------+---------------+
| Status                         | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out |
+--------------------------------+----------+----------+------------+--------------+---------------+
| starting                       | 0.000138 | 0.000125 |   0.000013 |            0 |             0 |
| Executing hook on transaction  | 0.000006 | 0.000005 |   0.000000 |            0 |             0 |
| starting                       | 0.000009 | 0.000008 |   0.000001 |            0 |             0 |
| checking permissions           | 0.000006 | 0.000005 |   0.000001 |            0 |             0 |
| checking permissions           | 0.000005 | 0.000004 |   0.000000 |            0 |             0 |
| Opening tables                 | 0.000084 | 0.000077 |   0.000008 |            0 |             0 |
| init                           | 0.000008 | 0.000007 |   0.000000 |            0 |             0 |
| System lock                    | 0.000010 | 0.000008 |   0.000001 |            0 |             0 |
| optimizing                     | 0.000044 | 0.000040 |   0.000004 |            0 |             0 |
| statistics                     | 0.000045 | 0.000041 |   0.000005 |            0 |             0 |
| preparing                      | 0.000041 | 0.000037 |   0.000003 |            0 |             0 |
| executing                      | 9.852750 | 5.524909 |   0.392290 |      1784064 |             0 |
| end                            | 0.000015 | 0.000013 |   0.000001 |            0 |             0 |
| query end                      | 0.000005 | 0.000004 |   0.000000 |            0 |             0 |
| waiting for handler commit     | 0.000009 | 0.000008 |   0.000001 |            0 |             0 |
| closing tables                 | 0.000010 | 0.000009 |   0.000001 |            0 |             0 |
| freeing items                  | 0.000027 | 0.000040 |   0.000003 |            0 |             0 |
| cleaning up                    | 0.000009 | 0.000008 |   0.000001 |            0 |             0 |
+--------------------------------+----------+----------+------------+--------------+---------------+
with EXPLAIN:
+----+-------------+----------+------------+--------+---------------+---------+---------+-------------------------+---------+----------+-------------+
| id | select_type | table    | partitions | type   | possible_keys | key     | key_len | ref                     | rows    | filtered | Extra       |
+----+-------------+----------+------------+--------+---------------+---------+---------+-------------------------+---------+----------+-------------+
|  1 | SIMPLE      | LINEITEM | NULL       | ALL    | LINEITEM_FK2  | NULL    | NULL    | NULL                    | 5937589 |   100.00 | Using where |
|  1 | SIMPLE      | PART     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | tpch.LINEITEM.L_PARTKEY |       1 |   100.00 | Using where |
+----+-------------+----------+------------+--------+---------------+---------+---------+-------------------------+---------+----------+-------------+

*** However ***
When either not using O_DIRECT or change HDD to an SSD, this disappears.

So why this happens?
Thanks!

How to repeat:
above

Suggested fix:
unknown
[11 Apr 2021 19:04] Øystein Grøvlen
Which query plan is best will depend on how disk-bound the query is.  Often, a primary index will work better than a secondary index when the query is disk-bound.  In this case, this means that the join order (part, lineitem) is best when data is in memory, while the join order (lineitem, part) is better when data is on disk.

Which join order is chosen will, as you show, depend on the filtering estimate. Since it seems you have not created any histograms on the involved columns, the optimizer will have to make a guess on the filtering effect.

Anyhow, if you want joins using secondary indexes to perform better in a disk-bound setting, you should turn on batched-key access:
set optimizer_switch='batched_key_access=on,mrr_cost_based=off';
[12 Apr 2021 12:51] MySQL Verification Team
Hi Mr. he,

Thank you, very much, on your bug report.

However, we do agree with our ex-colleague Grovlen and his explanations.

Hence, this is not a bug.