Bug #100471 optimizer doesn't consider indexes over FTS when ICP will speed up access
Submitted: 9 Aug 2020 11:41 Modified: 10 Aug 2020 12:57
Reporter: Justin Swanhart Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.21 OS:Any
Assigned to: CPU Architecture:Any

[9 Aug 2020 11:41] Justin Swanhart
Description:
This test uses the SSB dataset at scale factor 1.  You can generate the data using http://github.com/greenlion/ssb-dbgen
make
./dbgen -T l
Note the file is | separated.

Here is the table:
CREATE TABLE `lineorderi` (
  `LO_OrderKey` bigint DEFAULT NULL,
  `LO_LineNumber` tinyint DEFAULT NULL,
  `LO_CustKey` int DEFAULT NULL,
  `LO_PartKey` int DEFAULT NULL,
  `LO_SuppKey` int DEFAULT NULL,
  `LO_OrderDateKey` int DEFAULT NULL,
  `LO_OrderPriority` varchar(15) DEFAULT NULL,
  `LO_ShipPriority` char(1) DEFAULT NULL,
  `LO_Quantity` tinyint DEFAULT NULL,
  `LO_ExtendedPrice` int DEFAULT NULL,
  `LO_OrdTotalPrice` int DEFAULT NULL,
  `LO_Discount` int DEFAULT NULL,
  `LO_Revenue` int DEFAULT NULL,
  `LO_SupplyCost` int DEFAULT NULL,
  `LO_Tax` tinyint DEFAULT NULL,
  `LO_CommitDateKey` int DEFAULT NULL,
  `LO_ShipMode` varchar(10) DEFAULT NULL,
  KEY `LO_OrderDateKey` (`LO_OrderDateKey`),
  KEY `LO_OrderDateKey_2` (`LO_OrderDateKey`,`LO_Quantity`,`LO_Discount`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

analyze table lineorderi; 

How to repeat:
mysql> explain select sum(lo_extendedprice*lo_discount) as revenue from lineorderi  where lo_orderdatekey between 19930101 and 19931231 and LO_discount between 1 and 3 and LO_quantity < 25\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorderi
   partitions: NULL
         type: ALL
possible_keys: LO_OrderDateKey,LO_OrderDateKey_2
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5808653
     filtered: 1.16
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> select sum(lo_extendedprice*lo_discount) as revenue from lineorderi  where lo_orderdatekey between 19930101 and 19931231 and LO_discount between 1 and 3 and LO_quantity < 25\G
*************************** 1. row ***************************
revenue: 446031203850
1 row in set (3.50 sec)

Now with ICP:
mysql> explain select sum(lo_extendedprice*lo_discount) as revenue from lineorderi force index (lo_orderdatekey_2) where lo_orderdatekey between 19930101 and 19931231 and LO_discount between 1 and 3 and LO_quantity <
25\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineorderi
   partitions: NULL
         type: range
possible_keys: LO_OrderDateKey_2
          key: LO_OrderDateKey_2
      key_len: 7
          ref: NULL
         rows: 1848720
     filtered: 3.70
        Extra: Using index condition
1 row in set, 1 warning (0.00 sec)

mysql> select sum(lo_extendedprice*lo_discount) as revenue from lineorderi force index (lo_orderdatekey_2) where lo_orderdatekey between 19930101 and 19931231 and LO_discount between 1 and 3 and LO_quantity < 25\G
*************************** 1. row ***************************
revenue: 446031203850
1 row in set (0.55 sec)

Note that using ICP is about 3 seconds (approximately 6x) faster than using a FTS.

Suggested fix:
Use better stats?
[9 Aug 2020 11:44] Justin Swanhart
Note that the rows estimate is wrong by a lot for the ICP access:
mysql> select count(*) from lineorderi force index (lo_orderdatekey_2) where lo_orderdatekey between 19930101 and 19931231 and LO_discount between 1 and 3 and LO_quantity < 25\G
*************************** 1. row ***************************
count(*): 118598
1 row in set (0.66 sec)

The optimizer appears not to be considering the filters past the range on LO_OrderDateKey for the rows estimate.
[10 Aug 2020 11:32] Frederic Descamps
Hi Justin, 

Would be nice to see the output of EXPLAIN ANALYZE too.

Additionally, did you check the values of mysql.server_cost and mysql.engine_cost ?

Cheers,
[10 Aug 2020 12:07] Justin Swanhart
-- with ICP
mysql> explain analyze select sum(lo_extendedprice * lo_discount) as revenue from lineorderi force index (lo_orderdatekey_2) where lo_orderdatekey between 19930101 and 19931231 and LO_discount between 1 and 3 and LO_quantity < 25\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: sum((lineorderi.LO_ExtendedPrice * lineorderi.LO_Discount))  (actual time=595.407..595.408 rows=1 loops=1)
    -> Index range scan on lineorderi using LO_OrderDateKey_2, with index condition: ((lineorderi.LO_OrderDateKey between 19930101 and 19931231) and (lineorderi.LO_Discount between 1 and 3) and (lineorderi.LO_Quantity < 25))  (cost=831924.26 rows=1848720) (actual time=0.027..563.768 rows=118598 loops=1)
1 row in set (0.60 sec)

-- without ICP (no FORCE INDEX)
mysql> explain analyze select sum(lo_extendedprice * lo_discount) as revenue from lineorderi where lo_orderdatekey between 19930101 and 19931231 and LO_discount between 1 and 3 and LO_quantity < 25\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: sum((lineorderi.LO_ExtendedPrice * lineorderi.LO_Discount))  (actual time=3948.115..3948.115 rows=1 loops=1)
    -> Filter: ((lineorderi.LO_OrderDateKey between 19930101 and 19931231) and (lineorderi.LO_Discount between 1 and 3) and (lineorderi.LO_Quantity < 25))  (cost=590305.29 rows=67224) (actual time=0.029..3921.234 rows=118598 loops=1)
        -> Table scan on lineorderi  (cost=590305.29 rows=5808653) (actual time=0.015..2760.146 rows=6001215 loops=1)
1 row in set (3.95 sec)
[10 Aug 2020 12:13] Justin Swanhart
The row estimate for the ICP is about 31% of the table:
mysql> select 1848720 / 6000000 * 100;
+-------------------------+
| 1848720 / 6000000 * 100 |
+-------------------------+
|                 30.8120 |
+-------------------------+
1 row in set (0.00 sec)

But with ICP the actual rows examined are only 118598 which is 2% of the table.  MySQL weighs the index access heavier than the table scan without taking into account that ICP will reduce the number of rows in the scan from 30% to 2%.  If it properly estimated the ICP scan, then the row estimate would be 2% and the index would be used.
[10 Aug 2020 12:57] MySQL Verification Team
Hello Justin,

Thank you for the report.

regards,
Umesh
[10 Aug 2020 12:59] MySQL Verification Team
MySQL Server 8.0.21 test results

Attachment: 100471_8.0.21.results.txt (text/plain), 6.94 KiB.