Bug #105801 ha_innopart::records_in_range spend too much cpu cost
Submitted: 5 Dec 2021 3:35 Modified: 8 Dec 2021 13:25
Reporter: raolh rao Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Partitions Severity:S5 (Performance)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[5 Dec 2021 3:35] raolh rao
Description:
Since 5.7 version, partition table' code has been refactored, and much logic has been changed, including estimate records' number in a range function, which is used by optimizer.

In early 5.6 version, the partition table's estimated function is ha_partition::records_in_range, while sine 5.7 it use another function named ha_innpart::records_in_range. this two function has different logic to estimate records in a range of a partition table. the latter is much high cost.

If a query's used parttions number is N, ha_innpart::records_in_range need scan all N parttions to calculate total records. but ha_partition:: record_in_range only scan (logN - 1) partitions to estimate total records. so ha_innpart::records_in_range need use more cpu cost. that will lead huge performane difference in high concurency situation.

For example, if a query use 128 partition. ha_innpart::records_in_range need scan 128 partition, but ha_partition:: record_in_range only scan (log128 -1) = 6 partiton, 

This two function's logic like this:
```
ha_innpart::records_in_range:
  for (part_id = m_part_info->get_first_used_partition(part_id);
        part_id < m_tot_parts;
        part_id = m_part_info->get_next_used_partition(part_id)) {
     n = btr_estimate_n_rows_in_range(index ...)
     n_rows += n;
     ...
}
```

```
ha_partition::records_in_range:
  min_rows_to_check= min_rows_for_estimate();
  while ((part_id= get_biggest_used_partition(&partition_index)))
  {
     rows= m_file[part_id]->records_in_range(inx, min_key, max_key);
     estimated_rows+= rows;
     if (checked_rows >= min_rows_to_check)
       return;
}

```

How to repeat:
sysbench simple range query test on partition table, use 200 threads' concurrency.

Suggested fix:
ha_innpart::records_in_range also only scan logN-1 partiton instead N partitions
[6 Dec 2021 13:17] MySQL Verification Team
HI,

Thank you Mr. rao, for creating this report.

However, we can not accept the report based only on one single case. Optimiser has to be come with the best solution for MOST of the queries, out there in the world, not just for one particular query.

Hence, we would require several full test cases with totally different queries to be optimised, so that new method of calculating records in the range provides significantly worse performance than the old algorithm. Also, 200 concurrent threads do not just test the optimiser, but also the entire MySQL server. Hence, we need several  heavy and completely different single-thread queries that will always run slower on 5.7 and 8.0.

Then and only then we shall be able to test your hypothesis.
[6 Dec 2021 13:42] MySQL Verification Team
Hi,

We also have to know what is the CPU usage with those 200 threads. If it is very high, then please try 20 threads or similar and then compare with 5.6. If it is it using 10% or 50% of the CPU on checking those ranges on 20 concurrent threads, then this could be considered a performance bug.

Last, but not least, please check whether condition is pushed down to storage engine or not. You can use EXPLAIN for that purpose.
[7 Dec 2021 2:02] raolh rao
I upload the cpu usage picture for ha_innopart::records_in_range. the total cpu usage is 100%, while ha_innopart::records_in_range is 70%.

this is the test case:

CREATE TABLE `userid_10781986` (
  `cid` bigint(20) NOT NULL,
  `type` int(11) NOT NULL,
  `uid` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
  `upid` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
  `ctime` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`cid`,`type`,`uid`,`upid`) USING BTREE,
  KEY `upid` (`upid`,`cid`,`type`,`uid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=DYNAMIC
/*!50100 PARTITION BY KEY (cid,`type`,uid)
(PARTITION p0 ENGINE = InnoDB,
 PARTITION p1 ENGINE = InnoDB,
 PARTITION p10 ENGINE = InnoDB,
 PARTITION p100 ENGINE = InnoDB,
 PARTITION p101 ENGINE = InnoDB,
 PARTITION p102 ENGINE = InnoDB,
 PARTITION p103 ENGINE = InnoDB,
 PARTITION p104 ENGINE = InnoDB,
 PARTITION p105 ENGINE = InnoDB,
 PARTITION p106 ENGINE = InnoDB,
 PARTITION p107 ENGINE = InnoDB,
 PARTITION p108 ENGINE = InnoDB,
 PARTITION p109 ENGINE = InnoDB,
 PARTITION p11 ENGINE = InnoDB,
 PARTITION p110 ENGINE = InnoDB,
 PARTITION p111 ENGINE = InnoDB,
 PARTITION p112 ENGINE = InnoDB,
 PARTITION p113 ENGINE = InnoDB,
 PARTITION p114 ENGINE = InnoDB,
 PARTITION p115 ENGINE = InnoDB,
 PARTITION p116 ENGINE = InnoDB,
 PARTITION p117 ENGINE = InnoDB,
 PARTITION p118 ENGINE = InnoDB,
 PARTITION p119 ENGINE = InnoDB,
 PARTITION p12 ENGINE = InnoDB,
 PARTITION p120 ENGINE = InnoDB,
 PARTITION p121 ENGINE = InnoDB,
 PARTITION p122 ENGINE = InnoDB,
 PARTITION p123 ENGINE = InnoDB,
 PARTITION p124 ENGINE = InnoDB,
 PARTITION p125 ENGINE = InnoDB,
 PARTITION p126 ENGINE = InnoDB,
 PARTITION p127 ENGINE = InnoDB,
 PARTITION p13 ENGINE = InnoDB,
 PARTITION p14 ENGINE = InnoDB,
 PARTITION p15 ENGINE = InnoDB,
 PARTITION p16 ENGINE = InnoDB,
 PARTITION p17 ENGINE = InnoDB,
 PARTITION p18 ENGINE = InnoDB,
 PARTITION p19 ENGINE = InnoDB,
 PARTITION p2 ENGINE = InnoDB,
 PARTITION p20 ENGINE = InnoDB,
 PARTITION p21 ENGINE = InnoDB,
 PARTITION p22 ENGINE = InnoDB,
 PARTITION p23 ENGINE = InnoDB,
 PARTITION p24 ENGINE = InnoDB,
 PARTITION p25 ENGINE = InnoDB,
 PARTITION p26 ENGINE = InnoDB,
 PARTITION p27 ENGINE = InnoDB,
 PARTITION p28 ENGINE = InnoDB,
 PARTITION p29 ENGINE = InnoDB,
 PARTITION p3 ENGINE = InnoDB,
 PARTITION p30 ENGINE = InnoDB,
 PARTITION p31 ENGINE = InnoDB,
 PARTITION p32 ENGINE = InnoDB,
 PARTITION p33 ENGINE = InnoDB,
 PARTITION p34 ENGINE = InnoDB,
 PARTITION p35 ENGINE = InnoDB,
 PARTITION p36 ENGINE = InnoDB,
 PARTITION p37 ENGINE = InnoDB,
 PARTITION p38 ENGINE = InnoDB,
 PARTITION p39 ENGINE = InnoDB,
 PARTITION p4 ENGINE = InnoDB,
 PARTITION p40 ENGINE = InnoDB,
 PARTITION p41 ENGINE = InnoDB,
 PARTITION p42 ENGINE = InnoDB,
 PARTITION p43 ENGINE = InnoDB,
 PARTITION p44 ENGINE = InnoDB,
 PARTITION p45 ENGINE = InnoDB,
 PARTITION p46 ENGINE = InnoDB,
 PARTITION p47 ENGINE = InnoDB,
 PARTITION p48 ENGINE = InnoDB,
 PARTITION p49 ENGINE = InnoDB,
 PARTITION p5 ENGINE = InnoDB,
 PARTITION p50 ENGINE = InnoDB,
 PARTITION p51 ENGINE = InnoDB,
 PARTITION p52 ENGINE = InnoDB,
 PARTITION p53 ENGINE = InnoDB,
 PARTITION p54 ENGINE = InnoDB,
 PARTITION p55 ENGINE = InnoDB,
 PARTITION p56 ENGINE = InnoDB,
 PARTITION p57 ENGINE = InnoDB,
 PARTITION p58 ENGINE = InnoDB,
 PARTITION p59 ENGINE = InnoDB,
 PARTITION p6 ENGINE = InnoDB,
 PARTITION p60 ENGINE = InnoDB,
 PARTITION p61 ENGINE = InnoDB,
 PARTITION p62 ENGINE = InnoDB,
 PARTITION p63 ENGINE = InnoDB,
 PARTITION p64 ENGINE = InnoDB,
 PARTITION p65 ENGINE = InnoDB,
 PARTITION p66 ENGINE = InnoDB,
 PARTITION p67 ENGINE = InnoDB,
 PARTITION p68 ENGINE = InnoDB,
 PARTITION p69 ENGINE = InnoDB,
 PARTITION p7 ENGINE = InnoDB,
 PARTITION p70 ENGINE = InnoDB,
 PARTITION p71 ENGINE = InnoDB,
 PARTITION p72 ENGINE = InnoDB,
 PARTITION p73 ENGINE = InnoDB,
 PARTITION p74 ENGINE = InnoDB,
 PARTITION p75 ENGINE = InnoDB,
 PARTITION p76 ENGINE = InnoDB,
 PARTITION p77 ENGINE = InnoDB,
 PARTITION p78 ENGINE = InnoDB,
 PARTITION p79 ENGINE = InnoDB,
 PARTITION p8 ENGINE = InnoDB,
 PARTITION p80 ENGINE = InnoDB,
 PARTITION p81 ENGINE = InnoDB,
 PARTITION p82 ENGINE = InnoDB,
 PARTITION p83 ENGINE = InnoDB,
 PARTITION p84 ENGINE = InnoDB,
 PARTITION p85 ENGINE = InnoDB,
 PARTITION p86 ENGINE = InnoDB,
 PARTITION p87 ENGINE = InnoDB,
 PARTITION p88 ENGINE = InnoDB,
 PARTITION p89 ENGINE = InnoDB,
 PARTITION p9 ENGINE = InnoDB,
 PARTITION p90 ENGINE = InnoDB,
 PARTITION p91 ENGINE = InnoDB,
 PARTITION p92 ENGINE = InnoDB,
 PARTITION p93 ENGINE = InnoDB,
 PARTITION p94 ENGINE = InnoDB,
 PARTITION p95 ENGINE = InnoDB,
 PARTITION p96 ENGINE = InnoDB,
 PARTITION p97 ENGINE = InnoDB,
 PARTITION p98 ENGINE = InnoDB,
 PARTITION p99 ENGINE = InnoDB) */;

select cid, type, uid, upid, ctime from userid_10781986 where upid in ('571359603338510336') and cid = 10781986;

table data:

+----------+------+--------------------+--------------------+---------------------+
| cid      | TYPE | uid                | upid               | ctime               |
+----------+------+--------------------+--------------------+---------------------+
| 10781986 |    1 | 02354872173914     | 752306765017854208 | 2021-11-03 23:39:15 |
| 10781986 |    1 | 06246146143150     | 762359303628893952 | 2021-11-17 20:31:51 |
| 10781986 |    1 | 10000000000266000  | 417975669989857664 | 2020-07-30 16:43:03 |
| 10781986 |    1 | 10000000000266079  | 417975702999030144 | 2020-07-30 16:43:07 |
| 10781986 |    1 | 10000000000266179  | 417976246656326016 | 2020-07-30 16:44:11 |
| 10781986 |    1 | 100000000010569062 | 681838893649169408 | 2021-07-29 18:12:10 |
| 10781986 |    1 | 100000000010602462 | 681868074344844289 | 2021-07-29 19:10:09 |
| 10781986 |    1 | 100000000010702418 | 681843515042368512 | 2021-07-29 18:21:21 |
| 10781986 |    1 | 100000000010702468 | 681867053677743104 | 2021-07-29 19:08:07 |
| 10781986 |    1 | 100000000011169062 | 681838898564893696 | 2021-07-29 18:12:11 |
+----------+------+--------------------+--------------------+---------------------+

I use sysbench to do the benchmark test. 

 ./sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=sbtest --mysql-password=Gauss_234 --mysql-db=test --events=0 --time=600   --threads=200 --percentile=95 --report-interval=1 upid run

but other tools is ok, for example mysqlslap.
[7 Dec 2021 2:18] raolh rao
Single thread query's response time is also degradation, you can check it. our test machine has 16 cpu cores, and this case is a very simple range query. so It is not a very corner use case I think.
[7 Dec 2021 13:13] MySQL Verification Team
Hi Mr. rao,

We need data in the format of mysqldump, which means with INSERTs .....

Also, we do not have that `upid` script with our sysbench tool, so it would be very much necessary to have it as well.

We are waiting on your feedback.
[8 Dec 2021 1:43] raolh rao
sysbench test script

Attachment: upid.lua (application/octet-stream, text), 462.93 KiB.

[8 Dec 2021 1:44] raolh rao
create table script

Attachment: userid_10781986.sql (application/octet-stream, text), 6.97 KiB.

[8 Dec 2021 13:25] MySQL Verification Team
Hi Mr. rao,

Thank you very much for your feedback.

We were able to repeat the behaviour.

Verified as performance improvement bug.