Bug #75695 Use of worst_seeks in find_best_ref() can lead to wrong query plan
Submitted: 30 Jan 2015 8:36 Modified: 17 Apr 2015 19:53
Reporter: Olav Sandstå Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7.5 OS:Any
Assigned to: CPU Architecture:Any

[30 Jan 2015 8:36] Olav Sandstå
Description:
There is a bug in how the optimizer uses the worst_seeks variable when
limiting the cost of a key lookup. The variable worst_seeks is
initialized to have a _cost estimate_ but it is used as a _block
estimate_ when calculating the cost of key lookup in
Optimize_table_order::find_best_ref(). This does not have any impact
with the default cost constant values since the cost of a block access
is 1.0 but if we change the cost of block access this can impact the
cost of doing ref access versus other access methods.

Example: For a query like this:

  SELECT i2 FROM t1 WHERE i1 = 1 AND i2 = 1;

if we have an index on i1, this query can be executed using ref access. But if we double the values for all cost constants, due to the error in use of worst_seeks, the cost for ref access is more than doubled and can cause this query to be executed as a table scan (see details in the how-to-repeat section).

How to repeat:
Repro case:

CREATE TABLE t1 (
  pk INTEGER PRIMARY KEY,
  i1 INTEGER,
  i2 INTEGER,
  KEY(i1)
) ENGINE=InnoDB;

INSERT INTO t1 VALUES (1, 1, 1), (2, 1, 1),(3, 1, 1),(4, 1, 1),
                      (5, 2, 1), (6, 2, 1),(7, 2, 1),(8, 2, 1),
                      (9, 3, 1), (10, 3, 1),(11, 3, 1),(12, 3, 1);

ANALYZE TABLE t1;

# Verify that this is executed using ref access
EXPLAIN FORMAT=JSON SELECT i2 FROM t1 WHERE i1 = 1 AND i2 = 1;

# Multiply all relevant constants for the server by two:

UPDATE mysql.server_cost
SET cost_value=0.4
WHERE cost_name="row_evaluate_cost";

UPDATE mysql.engine_cost
SET cost_value=2.0
WHERE cost_name="io_block_read_cost";

FLUSH OPTIMIZER_COSTS;

# In a new connection, re-run the same query.
# This should still be executes as ref-access and the query cost should
# be two times as high as previous. But the query cost is increased by more
# and this has caused the query plan to change to use table scan.
EXPLAIN FORMAT=JSON SELECT i2 FROM t1 WHERE i1 = 1 AND i2 = 1;

Suggested fix:
Change the code in Optimize_table_order::find_best_ref() to use worst_seeks as a cost estimate instead of a block estimate.
[17 Apr 2015 19:53] Paul DuBois
Noted in 5.7.8, 5.8.0 changelogs.

Nonoptimal cost estimates for key lookups could cause some queries to
be executed with a table scan rather than key lookups.