Bug #28338 Optimizer doesn't choose the correct plan on a 3 table join
Submitted: 9 May 2007 18:50 Modified: 18 Jan 2016 9:58
Reporter: Morgan Tocker Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.0.42-BK, 5.0.37 OS:Any
Assigned to: CPU Architecture:Any
Tags: cost model

[9 May 2007 18:50] Morgan Tocker
Description:
Some testing on a few different hosts hows that the optimizer chooses a plan that takes longer to execute.  The workaround is to use a straight join.

With 1000 repetitions (no concurrency):

Straight join execution time:
Avg 0.10352
Min 0.1
Max 0.13
Median 0.1

Optimizer's default path execution time:
Avg 0.17616
Min 0.17
Max 0.22
Median 0.18

(On customer test the straight join was almost 3 times faster).

How to repeat:
Testcase attached.
[9 May 2007 18:56] Morgan Tocker
Verified in 5.0.37, 4.0.14.
[14 May 2007 15:40] Igor Babaev
The current MySQL cost mode does not allow to make a proper choice.
To be fixed in 6.0+
[14 May 2007 15:41] Igor Babaev
The current MySQL cost model does not allow to make a proper choice.
To be fixed in 6.0+
[22 Dec 2008 21:10] Sergey Petrunya
The problem repeats when I remove ORDER BY/GROUP BY from the query (hence ORDER BY/GROUP BY optimization doesn't seem to be involved).
[23 Dec 2008 23:35] Sergey Petrunya
With FORCE INDEX
================
Query time 0.26 sec
#, table, time spent inside handler:

experiment  60,662,522
vial        84,740,964
isotope     13,582,829

Cost formulas
~~~~~~~~~~~~~
TABLE, COST FORMULA, COST
experiment, scan_time(), 123.72
vial, (records=14616)*(rec_per_key=1), 14616
isotope, (records=14616)*(rec_per_key=1), 14616
<output cardinality penalty>, (records=14616)/(TIME_FOR_COMPARE=5), 2923.2
TOTAL,, 32278

Cost breakdown
~~~~~~~~~~~~~~
TABLE, COST FRACTION, RUNTIME FRACTION   
experiment	0.004	0.382
vial		0.498	0.533
isotope		0.498	0.085

Without FORCE INDEX
===================
Query time 0.30 sec

isotope        114,279 
vial        17,687,968
experiment 267,008,092

Cost formulas
~~~~~~~~~~~~~
TABLE, COST FORMULA, COST
isotope,  scan_time(),  2.03
vial,     (records=7)*min((rec_per_key=144), s->worst_seeks=60), 420
experiment, (records=1008)*min((rec_per_key=14), s->worst_seeks=64), 14112
<output cardinality penalty>, (records=14112)/(TIME_FOR_COMPARE=5), 2822
TOTAL,, 17356

Cost breakdown
~~~~~~~~~~~~~~
TABLE, COST FRACTION, RUNTIME FRACTION   
isotope		0.000	0.000
vial		0.029	0.062
experiment	0.971	0.937

Analysis
========
Cost breakdown for query without FORCE INDEX shows good agreement between
tables' fractions of cost and execution time. 

For query with FORCE index we can see that table 'isotope' has 49% of cost
but only 8.5% of actual execution. The table has 7 records so it seems that
the cost is grossly overestimated.
The `experiment` table took 38% of runtime and 4% of cost. This looks like
an underestimate.
[23 Dec 2008 23:38] Sergey Petrunya
===Preliminary conclusion===
The primary cause of the problem is that the join optimizer can grossly overestimate cost of ref access to a table that only has a few records.
[23 Dec 2008 23:39] Sergey Petrunya
Profiling was performed using this dtrace script:

#!/usr/sbin/dtrace -s 

mysql$target:mysqld:*:select_start
{
  self->do_trace= 1;
}

pid$target:mysqld:ha_myisam*open*:entry
{
  printf("%d -> %s", arg0, copyinstr(arg1));
  names[arg0]= copyinstr(arg1);
}

pid$target:mysqld:ha_myisam*:entry
/!self->ts && self->do_trace/
{ 
  /* printf("argument: %p\n", arg0); */
  self->ts= timestamp;
  self->thisptr= arg0;
  /* self->thisptr= names[arg0]; */
}

pid$target:mysqld:ha_myisam*:return 
/self->ts/
{
   @time[self->thisptr]= sum(timestamp - self->ts);
   self->ts= 0;
}
[18 Jan 2016 9:58] Øystein Grøvlen
I tried the test case with MySQL 5.7, and the query without hint now gives slightly better performance than with the hint:

mysql> select left(digest_text, 40), count_star, sys.format_time(avg_timer_wait) from performance_schema.events_statements_summary_by_digest where digest_text like '%isotope%';
+------------------------------------------+------------+---------------------------------+
| left(digest_text, 40)                    | count_star | sys.format_time(avg_timer_wait) |
+------------------------------------------+------------+---------------------------------+
| SELECT STRAIGHT_JOIN ( SUM ( `amtSol` )  |         20 | 31.74 ms                        |
| SELECT ( SUM ( `amtSol` ) + SUM ( `amtLi |         20 | 28.55 ms                        |
+------------------------------------------+------------+---------------------------------+
2 rows in set (0,00 sec)

Closing the bug since it can no longer be reproduced.