Bug #28338 Optimizer doesn't choose the correct plan on a 3 table join
Submitted: 9 May 2007 20:50 Modified: 16 Oct 2008 16:58
Reporter: Morgan Tocker
Status: Verified
Category:Server: Optimizer Severity:S2 (Serious)
Version:5.0.42-BK, 5.0.37 OS:Any
Assigned to: Target Version:
Tags: cost model
Triage: Triaged: D3 (Medium) / R5 (Severe) / E5 (Major)

[9 May 2007 20:50] Morgan Tocker
Description:
Some testing on a few different hosts hows that the optimizer chooses a plan that takes
longer to execute.  The workaround is to use a straight join.

With 1000 repetitions (no concurrency):

Straight join execution time:
Avg 0.10352
Min 0.1
Max 0.13
Median 0.1

Optimizer's default path execution time:
Avg 0.17616
Min 0.17
Max 0.22
Median 0.18

(On customer test the straight join was almost 3 times faster).

How to repeat:
Testcase attached.
[9 May 2007 20:56] Morgan Tocker
Verified in 5.0.37, 4.0.14.
[14 May 2007 17:40] Igor Babaev
The current MySQL cost mode does not allow to make a proper choice.
To be fixed in 6.0+
[14 May 2007 17:41] Igor Babaev
The current MySQL cost model does not allow to make a proper choice.
To be fixed in 6.0+
[22 Dec 2008 22:10] Sergey Petrunya
The problem repeats when I remove ORDER BY/GROUP BY from the query (hence ORDER BY/GROUP
BY optimization doesn't seem to be involved).
[24 Dec 2008 0:35] Sergey Petrunya
With FORCE INDEX
================
Query time 0.26 sec
#, table, time spent inside handler:

experiment  60,662,522
vial        84,740,964
isotope     13,582,829

Cost formulas
~~~~~~~~~~~~~
TABLE, COST FORMULA, COST
experiment, scan_time(), 123.72
vial, (records=14616)*(rec_per_key=1), 14616
isotope, (records=14616)*(rec_per_key=1), 14616
<output cardinality penalty>, (records=14616)/(TIME_FOR_COMPARE=5), 2923.2
TOTAL,, 32278

Cost breakdown
~~~~~~~~~~~~~~
TABLE, COST FRACTION, RUNTIME FRACTION   
experiment	0.004	0.382
vial		0.498	0.533
isotope		0.498	0.085

Without FORCE INDEX
===================
Query time 0.30 sec

isotope        114,279 
vial        17,687,968
experiment 267,008,092

Cost formulas
~~~~~~~~~~~~~
TABLE, COST FORMULA, COST
isotope,  scan_time(),  2.03
vial,     (records=7)*min((rec_per_key=144), s->worst_seeks=60), 420
experiment, (records=1008)*min((rec_per_key=14), s->worst_seeks=64), 14112
<output cardinality penalty>, (records=14112)/(TIME_FOR_COMPARE=5), 2822
TOTAL,, 17356

Cost breakdown
~~~~~~~~~~~~~~
TABLE, COST FRACTION, RUNTIME FRACTION   
isotope		0.000	0.000
vial		0.029	0.062
experiment	0.971	0.937

Analysis
========
Cost breakdown for query without FORCE INDEX shows good agreement between
tables' fractions of cost and execution time. 

For query with FORCE index we can see that table 'isotope' has 49% of cost
but only 8.5% of actual execution. The table has 7 records so it seems that
the cost is grossly overestimated.
The `experiment` table took 38% of runtime and 4% of cost. This looks like
an underestimate.
[24 Dec 2008 0:38] Sergey Petrunya
===Preliminary conclusion===
The primary cause of the problem is that the join optimizer can grossly overestimate cost
of ref access to a table that only has a few records.
[24 Dec 2008 0:39] Sergey Petrunya
Profiling was performed using this dtrace script:

#!/usr/sbin/dtrace -s 

mysql$target:mysqld:*:select_start
{
  self->do_trace= 1;
}

pid$target:mysqld:ha_myisam*open*:entry
{
  printf("%d -> %s", arg0, copyinstr(arg1));
  names[arg0]= copyinstr(arg1);
}

pid$target:mysqld:ha_myisam*:entry
/!self->ts && self->do_trace/
{ 
  /* printf("argument: %p\n", arg0); */
  self->ts= timestamp;
  self->thisptr= arg0;
  /* self->thisptr= names[arg0]; */
}

pid$target:mysqld:ha_myisam*:return 
/self->ts/
{
   @time[self->thisptr]= sum(timestamp - self->ts);
   self->ts= 0;
}