| Bug #115379 | ha_innobase::read_time , primary key, rows <=2, return rows | ||
|---|---|---|---|
| Submitted: | 19 Jun 2024 7:00 | Modified: | 19 Jun 2024 9:53 |
| Reporter: | XIAOJING LI | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server | Severity: | S3 (Non-critical) |
| Version: | 8.0.18 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | cost model, sql optimizer | ||
[19 Jun 2024 9:53]
MySQL Verification Team
Hi Mr. Li, Thank you for your bug report. The code that you quote is actually made based on the empirical measurements and experiments. These are all internal researches and not available to the public. Besides empirical measurements, that were done by our company on all of our storage engines, post-graduate studies on the Optimiser also teach the same. Simply, reading blocks of disk segments and i-nodes sequentially is always much faster then doing lot's of seeks and jumping around the index. Also, do not forget that InnoDB's primary key is actually an embedded key. Not a bug.

Description: /** Calculate the time it takes to read a set of ranges through an index This enables us to optimise reads for clustered indexes. @return estimated time measured in disk seeks */ double ha_innobase::read_time( uint index, /*!< in: key number */ uint ranges, /*!< in: how many ranges */ ha_rows rows) /*!< in: estimated number of rows in the ranges */ { ha_rows total_rows; if (index != table->s->primary_key) { /* Not clustered */ return (handler::read_time(index, ranges, rows)); } if (rows <= 2) { return ((double)rows); } /* Assume that the read time is proportional to the scan time for all rows + at most one seek per range. */ double time_for_scan = scan_time(); if ((total_rows = estimate_rows_upper_bound()) < rows) { return (time_for_scan); } return (ranges + (double)rows / (double)total_rows * time_for_scan); } when the key is primary, and rows<=2 , why the cost is equal when ranges=2 and ranges=200000 in multi-range senarios ? not reasonable if (rows <= 2) { return ((double)rows); } when the rows is very small, it maybe seek ranges times , not rows times. How to repeat: only a question Suggested fix: if (rows <= 2) { return ((double)rows); } ==> if (rows <= 2) { return ((double)ranges); }