Bug #58225 Extreme performance degradation with joins
Submitted: 16 Nov 2010 11:21 Modified: 27 Jul 2011 17:07
Reporter: Udo Schwedt Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.1.49, 5.1.54, 5.5.7 OS:Microsoft Windows (Server 2003)
Assigned to: CPU Architecture:Any
Tags: regression
Triage: Triaged: D2 (Serious) / R4 (High) / E4 (High)

[16 Nov 2010 11:21] Udo Schwedt
Description:
During preparation for upgrading our application from 5.0.30SP1 to 5.1.49 when encountered a serious performance problem with certain JOINs in InnoDB tables. 
While the statement takes less than 4 seconds with MySQL 5.0.30 SP1, it needs more than 1,300 seconds with MySQL 5.1.49.
We were able to reproduce this problem; we will attach a dump and the respective statement.
We found that adding a "straight join" hint leads to a faster execution (< 1 second with both versions), but we are not sure whether this is a feasible workaround (see private comment).

How to repeat:
Import the attached dump into clean 5.0.30SP1. Execute the attached statement => quick execution. 
Import the attached dump into clean 5.1.49. Execute the attached statement => slow execution. 

Suggested fix:
Change execution plan of 5.1.x to get similar response time as in 5.0.30SP1 for joins similar to the above mentioned one.
[16 Nov 2010 11:36] Valeriy Kravchuk
Looks like your dump is not complete:

openxs@ubuntu:/home2/openxs/dbs/5.1$ bin/mysql --no-defaults -uroot mww < ~/Desktop/DemoSQL.sql 
ERROR 1146 (42S02) at line 1: Table 'mww.POSTENFOLGE' doesn't exist

Please, send the results of EXPLAIN for the query with and without STRAIGHT_JOIN hint in 5.1.x.
[16 Nov 2010 13:59] Miguel Solorzano
DemoDump.sql it was imported first and demosql.sql gives the message of missed table. So DemoDump.sql attached is incomplete. Thanks in advance.
[16 Nov 2010 22:47] Sveta Smirnova
Thank you for the feedback.

I still get same error:

$time ./bin/mysql -uroot -S /tmp/mysql_ssmirnova.sock test <~/blade12/src/bugs/bug58225/DemoSQL_fixed.sql 
ERROR 1146 (42S02) at line 4: Table 'test.POSTENFOLGE' doesn't exist
[18 Nov 2010 17:17] Valeriy Kravchuk
I was able to repeat this on current mysql-5.1 tree (5.1.54) using the scripts uploaded in private comments. But as the following workaround helps:

macbook-pro:5.1 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.1.54-debug Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> source ~/Downloads/DemoSQL_fixed.sql
^CCtrl-C -- sending "KILL QUERY 12" to server ...
Ctrl-C -- query aborted.
ERROR 1317 (70100): Query execution was interrupted
mysql> set session optimizer_search_depth=0;
Query OK, 0 rows affected (0.00 sec)

mysql> source ~/Downloads/DemoSQL_fixed.sql
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| SUM(
		(
			(
				(
					(
						bestellpostenT0.ekPreis * wareneingangpostenT0.menge
					) * 
					(
						1-
							(
								IFNULL(
										bestellpostenT1.rabatt, 0							
									) /100 
							)
					)
				) + 
				(
					(
						bestellposte | X_abteilung |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
|                                                                                                                                                                                                                                   472.6250000000000000000000000 | 11          |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
1 row in set (7.70 sec)

I think this is a duplicate of/closely related to Bug #41740. I let developers decide is it really a duplicate.
[18 Nov 2010 18:32] Valeriy Kravchuk
5.5 is also affected:

macbook-pro:5.5 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.5.7-rc-debug Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> source ~/Downloads/DemoSQL_fixed.sql
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| SUM(
		(
			(
				(
					(
						bestellpostenT0.ekPreis * wareneingangpostenT0.menge
					) * 
					(
						1-
							(
								IFNULL(
										bestellpostenT1.rabatt, 0							
									) /100 
							)
					)
				) + 
				(
					(
						bestellposten | X_abteilung |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
|                                                                                                                                                                                                                                    472.6250000000000000000000000 | 11          |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
1 row in set (55 min 38.99 sec)
[16 Jan 2011 20:35] Ole John Aske
The proposed fix for bug#59326 seems to fix, or at least improve, the situation for this bug alot. It is 'fixed' in the sense that the query plan converges against a more optimal QEP when the 'optimizer_search_depth' is increased. The cost for doing the optimization with a big 'optimizer_search_depth' is also significantly reduced - However, this can still be huge.

I ran a few test with and without this fix with different 
optimizer_searc_depth: With the fix we converged into an
execution time of ~0.01s with any 'optimizer_search_depth > 2'
Without the fix we converged into a more *unoptimal* query
plan executing in ~1.7s with 'optimizer_search_depth > 2'.
Furthermore the optimization itself took an exponential longer
time.:
.........................................

                              w/ fix                wo/ fix
optimizer_search_depth  optimize  opt+exec    optimize:  opt+exec
1                        0.01s     0.02s         0.01s    0.02s
2                        0.01s     1.50s         0.02s    1.47s
4                        0.11s     0.13s         0.10s    1.84s
6                        0.69s     0.70s         1.73s    3.47s
8                        0.97s     0.97s        21.89s   23.06s
10                       6.69s     6.75s      3m02.37s  3m04.04s
11             	        18.46s	  18.10s(!)      ~OO      ~OO
[17 Jan 2011 23:11] Ole John Aske
See updated comments on bug#59326!
[1 Feb 2011 9:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130112

3552 Ole John Aske	2011-02-01
      Another addendum fix related to bug#59326, and particularly 
      its relatives/duplicates bug#41740 and bug#58225.
      
      Fixes the problem that complex queries containing a lot of EQ_REF'ed 
      joined tables takes 'forever' to produce a query plan.
      
      
      The fix depends on the fact that when a table is
      joined by an unique key there is a 1::1 relation
      between the rows being joined. Assuming we have multiple
      such 1::1 (star-)joined relations in a sequence, 
      without other join types inbetween. Then all of 
      these 'eq_ref-joins' will be calculated to return the excact 
      same #rows and having identical 'cost' (or 'read_time').
      
      This leads to that we can append such a contigous sequence
      of eq_ref-joins to a partial plan in any order without 
      affecting the total cost of the query plan. Exploring the
      different permutations of these eq_refs in the 'greedy' 
      optimizations will simply be a waste of precious CPU cycles.
      
      This chain of eq_refs can further be handled as a single
      entity wrt. the full 'greedy' exploration of the possible
      join plans. This will reduce the 'N' in the O(N!) complexity
      of the full greedy search.
      
      The function eq_ref_extension_by_limited_search() has been 
      implemented doing as described above. Once an EQ_REF joined 
      table is encountered, it is invoked to append any remaining 
      EQ_REFs to the join plan. This chain of EQ_REFs will then 
      act like a single entity in the further exploration of possible
      join plans.
      
      It should be noted that best_extension_by_limited_search()
      will still explore the other plans starting with non-EQ_REF 
      operations as possibly better alternatives to the EQ_REF-extended 
      query plan.
      
      Usage of eq_ref_extension_by_limited_search() is 
      controlled by 'optimizer_prune_level=1'. 
      
      This fix also introduce the function plan_is_complete() which
      contains common functionality between best_extension_by_limited_search()
      and eq_ref_extension_by_limited_search()
      
      greedy_optimizer.test has been extensively extended with new testcases 
      containing lots of EQ_REF's.
      
      No existing test/query plans are expected to change as result of this fix.
      It should only produce the 'best' plan (a lot!) faster. - Like the 
      testcase for bug#58225 which now completes the QEP in ~0.05s in my
      DEBUG compiled sandbox (With 'optimizer_search_depth= 62'). This
      previously took  63min+42.437s :-o
[1 Feb 2011 10:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130114

3553 Ole John Aske	2011-02-01
      Last addendum fix related to bug#59326, and particularly
      its relatives/duplicates bug#41740 and bug#58225. 
      
      If a filesort is required by the final result, we can include
      the cost for the filesort in the pruning of 'bad' plans, and
      possible prune the plan at an earlier stage than today.
      
      NOTE1: The 'sort_cost' is calculated with logic similar
             to the one in plan_is_complete() which adds the sort_cost
             to the best plan.
      
      NOTE2: The 'sort_cost' is calculated as '= current_record_count'.
             This is not entirely correct when used in the pruning logic
             as we do not yet #rows in the *final* result which is what 
             we have to sort. However, it should be a safe assumption that
             'final #rows' >= 'intermediat #rows'. So this prune-calculation
             should always be on the safe side.
      
      NOTE3: Calculating 'sort_cost= current_record_count' is a bad
             estimate as most sort algorithms are O(n*log(n)).
      
      This fix reduce the query compile time for the infamous bug#58225 problem
      further from 50ms to ~20ms
[1 Feb 2011 13:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130125

3584 Ole John Aske	2011-02-01
      Collection of all fixes for bug#59326 and its relatives/duplicates bug#41740 and bug#58225.
      
      Commited to the branch 'mysql-trunk-opt-bug59326' on request from QA.
[2 Feb 2011 7:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130178

3587 Ole John Aske	2011-02-02
      Collection of all fixes for bug#59326 and its relatives/duplicates bug#41740 and bug#58225.
      
      Commited to the branch 'mysql-trunk-opt-bug59326' on request from QA.
[27 Jul 2011 17:07] Paul Dubois
Noted in 5.6.3 changelog.

For queries with many eq_ref joins, the optimizer took excessive time
to develop an execution plan.