Bug #100471 | optimizer doesn't consider indexes over FTS when ICP will speed up access | ||
---|---|---|---|
Submitted: | 9 Aug 2020 11:41 | Modified: | 10 Aug 2020 12:57 |
Reporter: | Justin Swanhart | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 8.0.21 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[9 Aug 2020 11:41]
Justin Swanhart
[9 Aug 2020 11:44]
Justin Swanhart
Note that the rows estimate is wrong by a lot for the ICP access: mysql> select count(*) from lineorderi force index (lo_orderdatekey_2) where lo_orderdatekey between 19930101 and 19931231 and LO_discount between 1 and 3 and LO_quantity < 25\G *************************** 1. row *************************** count(*): 118598 1 row in set (0.66 sec) The optimizer appears not to be considering the filters past the range on LO_OrderDateKey for the rows estimate.
[10 Aug 2020 11:32]
Frederic Descamps
Hi Justin, Would be nice to see the output of EXPLAIN ANALYZE too. Additionally, did you check the values of mysql.server_cost and mysql.engine_cost ? Cheers,
[10 Aug 2020 12:07]
Justin Swanhart
-- with ICP mysql> explain analyze select sum(lo_extendedprice * lo_discount) as revenue from lineorderi force index (lo_orderdatekey_2) where lo_orderdatekey between 19930101 and 19931231 and LO_discount between 1 and 3 and LO_quantity < 25\G *************************** 1. row *************************** EXPLAIN: -> Aggregate: sum((lineorderi.LO_ExtendedPrice * lineorderi.LO_Discount)) (actual time=595.407..595.408 rows=1 loops=1) -> Index range scan on lineorderi using LO_OrderDateKey_2, with index condition: ((lineorderi.LO_OrderDateKey between 19930101 and 19931231) and (lineorderi.LO_Discount between 1 and 3) and (lineorderi.LO_Quantity < 25)) (cost=831924.26 rows=1848720) (actual time=0.027..563.768 rows=118598 loops=1) 1 row in set (0.60 sec) -- without ICP (no FORCE INDEX) mysql> explain analyze select sum(lo_extendedprice * lo_discount) as revenue from lineorderi where lo_orderdatekey between 19930101 and 19931231 and LO_discount between 1 and 3 and LO_quantity < 25\G *************************** 1. row *************************** EXPLAIN: -> Aggregate: sum((lineorderi.LO_ExtendedPrice * lineorderi.LO_Discount)) (actual time=3948.115..3948.115 rows=1 loops=1) -> Filter: ((lineorderi.LO_OrderDateKey between 19930101 and 19931231) and (lineorderi.LO_Discount between 1 and 3) and (lineorderi.LO_Quantity < 25)) (cost=590305.29 rows=67224) (actual time=0.029..3921.234 rows=118598 loops=1) -> Table scan on lineorderi (cost=590305.29 rows=5808653) (actual time=0.015..2760.146 rows=6001215 loops=1) 1 row in set (3.95 sec)
[10 Aug 2020 12:13]
Justin Swanhart
The row estimate for the ICP is about 31% of the table: mysql> select 1848720 / 6000000 * 100; +-------------------------+ | 1848720 / 6000000 * 100 | +-------------------------+ | 30.8120 | +-------------------------+ 1 row in set (0.00 sec) But with ICP the actual rows examined are only 118598 which is 2% of the table. MySQL weighs the index access heavier than the table scan without taking into account that ICP will reduce the number of rows in the scan from 30% to 2%. If it properly estimated the ICP scan, then the row estimate would be 2% and the index would be used.
[10 Aug 2020 12:57]
MySQL Verification Team
Hello Justin, Thank you for the report. regards, Umesh
[10 Aug 2020 12:59]
MySQL Verification Team
MySQL Server 8.0.21 test results
Attachment: 100471_8.0.21.results.txt (text/plain), 6.94 KiB.