Bug #76247 Regression case with semijoin - query with many INs stuck on statistics state
Submitted: 10 Mar 2015 17:33 Modified: 19 Aug 2015 13:16
Reporter: Przemyslaw Malkowski Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.6 5.7 OS:Any
Assigned to: CPU Architecture:Any
Tags: semijoin optimizer_prune_level

[10 Mar 2015 17:33] Przemyslaw Malkowski
Description:
A query with many IN clauses and subselects, may run into never-ending "statistics" state consuming 100% of CPU core. This happens when semijoin optimizer switch is enabled. Doing an EXPLAIN is enough to trigger this problem. The same query explain performs immediately on MySQL 5.5, 5.1 and 5.0. Problem exists on 5.6 and 5.7.

The problem is reproducible for a specific data set (which unfortunately cannot be made public) using default optimizer settings. Unfortunately we failed to prepare public testing data, where this would be reproducible, however when the optimizer_prune_level is disabled it is reproducible even on empty tables.

Tested on:
        @@version: 5.1.73, @@version_comment: MySQL Community Server (GPL),
        @@version: 5.5.42-log, @@version_comment: MySQL Community Server (GPL),
        @@version: 5.6.23-log, @@version_comment: MySQL Community Server (GPL),
        @@version: 5.7.5-m15, @@version_comment: MySQL Community Server (GPL).

Using 
set optimizer_switch='semijoin=off';
lets to workaround the problem.

How to repeat:
Please find the test case in attachment due to query and table definitions size.

An example transaction details from show engine innodb status:

---TRANSACTION 0, not started
mysql tables in use 36, locked 0
MySQL thread id 20, OS thread handle 0x7f2d7f3b7700, query id 120 localhost root statistics
EXPLAIN SELECT tbl10_.id as col_0_0_, tbl10_.lock_version as col_1_0_, tbl10_.name as col_2_0_, substring(tbl10_.description, 1, 201) as col_3_0_, tbl10_.actual_close_date as col_4_0_, tbl10_.status as col_5_0_, table21_.id as col_6_0_, table21_.first_name as col_7_0_, table21_.last_name as col_8_0_, table21_.name as col_9_0_, person3_.id as col_10_0_, person3_.first_name as col_11_0_, person3_.last_name as col_12_0_  FROM table1 tbl10_  INNER JOIN table2 table21_ on tbl10_.table2_id=table21_.id  LEFT OUTER JOIN table3 user2_ on tbl10_.owner_id=user2_.id  LEFT OUTER JOIN table2 person3_ on use
[10 Mar 2015 17:36] Przemyslaw Malkowski
test case

Attachment: bug_report_76247.txt (text/plain), 38.84 KiB.

[11 Mar 2015 15:14] Sinisa Milivojevic
I see from your attached test case that you are using InnoDB tables. Are you using persistent stats for the indices on your tables ???

If the answer is "Yes", then your bug is a duplicate of the bug # 70617. The only difference is that your query requires  many, many times more work on the index statistics collection.

In that case, a cause of your bug (and # 70617) is that, In 5.6.13 some changes are made on the persistent statistics estimation, that led to spending lots of time in statistics collection.
[11 Mar 2015 15:34] Przemyslaw Malkowski
Sinisa, yes, it was left with default. However when I disabled it, the problem still takes place:

mysql> set global innodb_stats_persistent=off;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%pers%';
+--------------------------------------+-------+
| Variable_name                        | Value |
+--------------------------------------+-------+
| innodb_stats_persistent              | OFF   |
| innodb_stats_persistent_sample_pages | 20    |
+--------------------------------------+-------+
2 rows in set (0.00 sec)

mysql> EXPLAIN SELECT tbl10_.id as col_0_0_, tbl10_.lock_version as col_1_0_, tbl....
....

-- second session:

mysql> show processlist\G
*************************** 1. row ***************************
     Id: 32
   User: root
   Host: localhost
     db: prune
Command: Query
   Time: 30
  State: statistics
   Info: EXPLAIN SELECT tbl10_.id as col_0_0_, tbl10_.lock_version as col_1_0_, tbl10_.name as col_2_0_, subs
[11 Mar 2015 15:55] Sinisa Milivojevic
Please, in order to clear up the matter, can you do the test in the following steps:

* set persistent stats to 0 or off
* create tables as per your .sql file
* fill the tables
* run ANALYZE on each of the tables
* run the query and let us know how much time does gathering statistics take

Thanks in advance.
[11 Mar 2015 17:13] Shane Bester
Smaller testcase based on the file uploaded by reporter:

-----
set optimizer_prune_level=0;
set optimizer_search_depth=default;
set optimizer_switch=default;
drop table if exists `t1`,`t2`;
create table `t1` (`id` int primary key) engine=innodb;
create table `t2` (`id` int,`b` int,`d` int,`c` int,primary key (`id`),key (`c`)) engine=innodb;
set @s:=concat(	'explain select 1 from t1 where ',
repeat('(id in(select d from t2 where b<>1 and c=1)) and\n',30),
'(id in(select d from t2 where b<>1 and c=1));');
select @s;
prepare t from @s;
execute t;
set optimizer_switch='semijoin=off';
prepare t from @s;
execute t;
------

Optimize_table_order::best_access_path+0x3d [f:\git\mysql-5.6\sql\sql_planner.cc @ 407]
Optimize_table_order::best_extension_by_limited_search+0x25d [f:\git\mysql-5.6\sql\sql_planner.cc @ 1852]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::best_extension_by_limited_search+0x52b [f:\git\mysql-5.6\sql\sql_planner.cc @ 1988]
Optimize_table_order::greedy_search+0xf6 [f:\git\mysql-5.6\sql\sql_planner.cc @ 1476]
Optimize_table_order::choose_table_order+0x177 [f:\git\mysql-5.6\sql\sql_planner.cc @ 1147]
make_join_statistics+0x13a5 [f:\git\mysql-5.6\sql\sql_optimizer.cc @ 3766]
JOIN::optimize+0x657 [f:\git\mysql-5.6\sql\sql_optimizer.cc @ 364]
mysql_execute_select+0x33 [f:\git\mysql-5.6\sql\sql_select.cc @ 1086]
mysql_explain_unit+0x148 [f:\git\mysql-5.6\sql\opt_explain.cc @ 2134]
explain_query_expression+0x61 [f:\git\mysql-5.6\sql\opt_explain.cc @ 2035]
execute_sqlcom_select+0xe3 [f:\git\mysql-5.6\sql\sql_parse.cc @ 5087]
[11 Mar 2015 18:04] Sinisa Milivojevic
This seems like a bug in semi-join optimization. 

Can you also send us rows for your tables ???
[11 Mar 2015 18:18] Sinisa Milivojevic
A smaller test case was produced, which required no data. So, we do not need any additional info from you.

Even with zero rows in each table, with semijoin=ON, query was stuck for a long, long time, until it was killed.

This is a verified bug now. It seems that there is possibly an eternal loop somewhere in the code ...

Fully verified.
[11 Mar 2015 18:26] Sinisa Milivojevic
Thank you for your contribution !
[11 Mar 2015 19:06] Roy Lyseng
What happens if you reduce the search depth for the join planner?

E.g use this statement before the SELECT statement:

set optimizer_search_depth=8;

Especially with a large number of tables with almost the same number of rows may cause the planner to take "forever".
[11 Mar 2015 19:54] Przemyslaw Malkowski
Sinisa, thank you for confirming this! As I said in the initial report, "even on empty tables", so I had no rows at all in the test case.

Roy, using optimizer_search_depth=8 doesn't seem to change much, I can't wait hours to see if the explain ever ends, also it's against the ecology to let the CPU core burn that long ;) Then using optimizer_search_depth=4 lets the explain to finish in 3 seconds :) The test case is quick and easy enough if you want to play.
[12 Mar 2015 11:34] Roy Lyseng
Thank you for the update, however I do not think this is a bug.

The query has a very poorly written set of subqueries that are really hard for the optimizer to handle efficiently. The query is on the form:

SELECT ...
FROM t1 INNER JOIN t2 ON jc(t1, t2)  LEFT JOIN t3 ON jc(t1,t3)  LEFT JOIN t4 ON jc( t3, t4)
WHERE wc(t1, t2, t3, t4) AND
           t1.id IN (SELECT it.t1_id FROM it WHERE wc1(it)) AND
           t1.id IN (SELECT it.t1_id FROM it WHERE wc2(it)) AND
           ...;

The query contains 32 similar subqueries. Since all of them are against the same table and have similar where conditions, the optimizer will calculate their costs as similar. In order to properly optimize a query with 36 tables, the optimizer either has to use a very low search depth, or use very aggressive pruning of possible plans. Such pruning is impossible when all choices available for the optimizer have the same cost. Thus, we need workarounds for this query to perform decently:

- Use STRAIGHT_JOIN. I think this is quite good, since the tables anyway may be accessed in the lexical order.

- Use OPTIMIZER_SEARCH_DEPTH. This is the recommended strategy when the optimizer uses excessive time and regular pruning is not sufficient.

- Turn off semi-join. This is a good choice here, since all semi-joins anyway would use the FirstMatch strategy, and is the same strategy used when semi-join is disabled.

- Consolidate the subqueries. Notice that all subqueries have the same inner table, the same selected column and the same outer expression. Thus, all subqueries can be rewritten into one by combining the inner conditions with AND. Ie, this query should be equivalent to the one above:

SELECT ...
FROM t1  INNER JOIN t2 ON jc(t1, t2)  LEFT JOIN t3 ON jc(t1,t3)  LEFT JOIN t4 ON jc( t3, t4)
WHERE wc(t1, t2, t3, t4) AND
           t1.id IN (SELECT it.t1_id FROM it WHERE wc1(it) AND wc2(it));
 
This query seems machine generated, so I am not sure how helpful this advice is, though. Long-term we might be able to detect this kind of query pattern and rewrite the query automatically.
[12 Mar 2015 14:08] Sinisa Milivojevic
I still consider this as a bug , since optimizer hangs in statistics gathering, even when there are 0 rows , no rows what so ever, in the tables.

Also, the workarounds are OK, but those are not a solution to the bug.

I am only changing the severity to "Performance".
[12 Mar 2015 14:49] Roy Lyseng
Sinisa, it does not hang, the optimizer is simply evaluating plans. Providing empty input tables is both unrealistic and gives the optimizer the worst possible baseline for providing a plan, since all costs look more or less equal and pruning bad plans becomes impossible.
[22 Jul 2015 13:53] Sinisa Milivojevic
First of all, our research shows that when optimizer_prune_level is 0, what you experience with your particular query is expected behavior.

First, what happens when optimizer_prune_level is set 0. The issue is with subqueries, which are pulledup and converted to semi-joins so that tables in subqueries are added to outer query which participate in join order searching. Our Join order searching is exhaustive it produces all permutations of possible plan before selecting the plan (n!). From the test case we can see there is nearly 30+! plans need to be produced to select the best plan among them when pruning is disabled, hence the delay. If we enable pruning optimization is quite fast !!!

However, you report that you get similar behavior even when prune level is greater then zero. That would be a candidate bug. But, we need to be able to repeat the situation which you describe. Hence, in order to proceed on that plan, we need a test case that would reproduce the bug on every possible run.

In short, we are waiting on your fully reproducible test case in order to proceed.
[19 Aug 2015 12:51] Przemyslaw Malkowski
I am afraid I failed to prepare a reproducible test case with optimizer_prune_level=1. Even if I tried with 10M rows in each table, explain is always fast. Also I don't have access to the original data where it happened.

However I still think this is a regression comparing to 5.5, where with optimizer_prune_level=0 it is always fast. For 5.6, using the simplified test case, I am getting 36 seconds with 10 subqueries and 7 min 19.72 sec with 11 subqueries. Slow or very slow statistics stage happens with empty tables as well as with some data in them.
[19 Aug 2015 13:16] Sinisa Milivojevic
Hi,

When you manage to find a fully reproducible test case, we will start working on it immediately. 

In that case, create a new bug report.