Bug #29443 cost calculations are off
Submitted: 29 Jun 2007 13:28 Modified: 14 Oct 2010 12:30
Reporter: Matthew Lord Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.1.19 OS:Any
Assigned to: CPU Architecture:Any
Tags: bfsm_2007_07_19, bfsm_2007_07_19, bfsm_2007_10_18, bfsm_2007_10_18, cost, cost, explain, explain, Optimizer, Optimizer, partitioning, partitioning

[29 Jun 2007 13:28] Matthew Lord
Description:
The optimizer is choosing what it estimates to be the best execution path but
when using force index on a different index it's exponentially faster.

Perhaps there's an improvement we can make in the cost calculation in this case?

How to repeat:
source tables.txt

source pages.txt

source session_summary.txt

source query-force.txt

source query.txt
[24 Jul 2007 12:53] Sergey Petrunya
My server version: 5.1-bk, compiled with BUILD/compile-pentium-debug-max on a 32-bit Linux.

Tip changeset: 
ChangeSet@1.2553, 2007-07-22 01:58:37+05:00, gshchepa@gleb.loc +2 -0
  Merge gleb.loc:/home/uchum/work/bk/5.0-opt
  into  gleb.loc:/home/uchum/work/bk/5.1-opt
  MERGE: 1.1810.2871.37

Extra arguments used when loading data: --innodb-buffer-pool-size=1G --innodb-additional-mem-pool-size=160M 

Extra arguments used when running queries: none.
[24 Jul 2007 12:56] Sergey Petrunya
Setting status to Can't repeat. Please set back to verified when you have the steps needed to repeat. 

(If you observe the effect on your instance of loaded data, it is an option that you make available somewhere on the build/support machines and we debug it there)
[25 Jul 2007 10:46] Sergey Petrunya
Failed to repeat with the provided my.cnf used with 5.1.20-beta-log official community binary.

(Both queries became faster but have the same speed, EXPLAINs are the same)
[27 Jul 2007 10:55] Sergey Petrunya
Okay I have managed to repeat. The trick was to do 

SET time_zone='-4:00'
[30 Jul 2007 14:12] Sergey Petrunya
Analysis:
The problem is caused by those lines in  best_extension_by_limited_search():

        if (join->sort_by_table &&
            join->sort_by_table !=
            join->positions[join->const_tables].table->table)
          /* We have to make a temp table */
          current_read_time+= current_record_count;

This "if" is taken for the "fast" join order but is not taken for the "slow" join order. Added cost leads the optimizer to conclude that the "fast" join order is more expensive than "slow". 

In fact, both fast and slow join orders use a temporary table so the if-branch should be either taken, for both of them, or not taken for either of them.

The value of join->sort_by_table is produced by get_sort_by_table(). Its return value is interpreted as if it was possible to avoid use of temporary table in the "slow" join order.