Bug #111225 Inaccurate execution plan generation for primary key range query
Submitted: 31 May 2023 13:23 Modified: 6 Jun 2023 3:42
Reporter: shang canfang Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S5 (Performance)
Version:MySQL8.0.18,MySQL8.0.28 OS:Linux
Assigned to: CPU Architecture:Any
Tags: optimizer; execution plan;

[31 May 2023 13:23] shang canfang
Description:
In a simple case where the index is relatively small and the row length is long, for a simple query 'select id from sbtest1 where id >= 20 order by id limit 1', if the cost of scanning the index with a full scan is smaller than that of a range scan, the execution plan will skip the range and use index+primary instead.

such as:
mysql> explain select id from sbtest1 where id >= 20 order by id limit 1;
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | sbtest1 | NULL       | index | PRIMARY,k_1   | PRIMARY | 4       | NULL |    3 |    98.99 | Using where; Using index |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

the type is "index",the storage will scan from id=1 util id = 20.

in opt_range.cc:
func test_quick_select
 ```
/* Calculate cost of full index read for the shortest covering index */
    if (!head->covering_keys.is_clear_all())
    {
      int key_for_use= find_shortest_key(head, &head->covering_keys);
      Cost_estimate key_read_time=
        param.table->file->index_scan_cost(key_for_use, 1,
                                           static_cast<double>(records));
      key_read_time.add_cpu(cost_model->row_evaluate_cost(
        static_cast<double>(records)));

      bool chosen= false;
      if (key_read_time < cost_est)
      {
        cost_est= key_read_time;
        chosen= true;
      }

      Opt_trace_object trace_cov(trace,
                                 "best_covering_index_scan",
                                 Opt_trace_context::RANGE_OPTIMIZER);
      trace_cov.add_utf8("index", head->key_info[key_for_use].name).
        add("cost", key_read_time).add("chosen", chosen);
      if (!chosen)
        trace_cov.add_alnum("cause", "cost");
    }

    TABLE_READ_PLAN *best_trp= NULL;
    TRP_GROUP_MIN_MAX *group_trp;
    Cost_estimate best_cost= cost_est;
```
here, the cost of scanning the index with a full table scan is used as the baseline,then calculate the range scan cost,if range cost is larger than baseline,will not choose range plan

```
        /* Get best 'range' plan and prepare data for making other plans */
        if ((range_trp= get_key_scans_params(&param, tree, FALSE, TRUE,
                                             &best_cost)))
        {
          best_trp= range_trp;
          best_cost= best_trp->cost_est;
        }
```

so best_trp is null. finally use scan with no range for primary

How to repeat:
CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` varchar(15380) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
  `pad` char(60) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
);

construct 1000 rows data

then execute 'explain select id from sbtest1 where id >= 20 order by id limit 1;'

Suggested fix:
    Cost_estimate best_cost= cost_est;  
only when the range condition is not the primary key, the index cost is used as the baseline for comparison
[31 May 2023 13:32] MySQL Verification Team
HI Mr. canfang,

Thank you for your bug report.

However, 8.0.18 is a very old release of 8.0. 

So, please first try 8.0.33, our latest release.

Second, please provide us with. a full test case. That means a CREATE TABLE, INSERT with all data and a fully query.

Next, please explain, in great detail, why do you think that searching by primary key is wrong !!!!!

When you do all of this with 8.0.33, with all info supplied as above, create a new bug report. Please, do include all the info that we asked above.

Unsupported.
[31 May 2023 13:35] MySQL Verification Team
Hi Mr. canfang,

Actually, the explanation for scanning is very simple.

You have ORDER BY and it is MUCH faster to read by indexed column then to create, insert into a temporary table and then read from it.

This is very well known fact.

Hence, no need to create new report, since the method used is faster.

If you want to use another method, you can always use Optimiser hints, as described in our Reference Manual.

Not a bug.
[6 Jun 2023 3:42] shang canfang
Sorry, I didn't express myself clearly. You didn't understand what I meant. But Innodb does not have much impact, it only has the issue of type index when the condition value is very small, which does not affect the time consumption. But Rocksdb doesn't work, so I mentioned the bug to MyRocks, although I think changing the SQL optimizer would be more appropriate. https://github.com/facebook/mysql-5.6/issues/1315
[6 Jun 2023 12:35] MySQL Verification Team
Hi Mr. Canfang,

We so not know what is RocksDB and we do not support it.

What we wrote in our previous comments is very much true for our server. Hence, read them more carefully, please ......

Not a bug.