MySQL Bugs: #115379: ha_innobase::read_time , primary key, rows <=2, return rows

Bug #115379	ha_innobase::read_time , primary key, rows <=2, return rows
Submitted:	19 Jun 2024 7:00	Modified:	19 Jun 2024 9:53
Reporter:	XIAOJING LI	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	8.0.18	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	cost model, sql optimizer

Description:
/** Calculate the time it takes to read a set of ranges through an index
 This enables us to optimise reads for clustered indexes.
 @return estimated time measured in disk seeks */

double ha_innobase::read_time(
    uint index,   /*!< in: key number */
    uint ranges,  /*!< in: how many ranges */
    ha_rows rows) /*!< in: estimated number of rows in the ranges */
{
  ha_rows total_rows;

  if (index != table->s->primary_key) {
    /* Not clustered */
    return (handler::read_time(index, ranges, rows));
  }

  if (rows <= 2) {
    return ((double)rows);
  }

  /* Assume that the read time is proportional to the scan time for all
  rows + at most one seek per range. */

  double time_for_scan = scan_time();

  if ((total_rows = estimate_rows_upper_bound()) < rows) {
    return (time_for_scan);
  }

  return (ranges + (double)rows / (double)total_rows * time_for_scan);
}

when the key is primary, and rows<=2 , why the cost is equal when
ranges=2 and ranges=200000 in multi-range senarios ? not reasonable

  if (rows <= 2) {
    return ((double)rows);
  }

when the rows is very small, it maybe seek ranges times , not rows times.

How to repeat:
only a question

Suggested fix:
  if (rows <= 2) {
    return ((double)rows);
  }

==>

  
  if (rows <= 2) {
    return ((double)ranges);
  }

Hi Mr. Li,

Thank you for your bug report.

The code that you quote is actually made based on the empirical measurements and experiments. These are all internal researches and not available to the public.

Besides empirical measurements, that were done by our company on all of our storage engines, post-graduate studies on the Optimiser also  teach the same.

Simply, reading blocks of disk segments and i-nodes sequentially is always much faster then doing lot's of seeks and jumping around the index.

Also, do not forget that InnoDB's primary key is actually an embedded key.

Not a bug.