Bug #93188 Optimiser chose incorrect join order inside of derived table
Submitted: 14 Nov 2018 11:16 Modified: 16 Nov 2018 14:43
Reporter: Lobanov Mikhail Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any
Tags: derived table

[14 Nov 2018 11:16] Lobanov Mikhail
Description:
After upgrading production servers to 5.7 version, we had an issue when one query execution took almost 20 minutes, which disabled our MySQL servers for a while.

The problematic query is:

 -- ------------
UPDATE norep_procedure_participants filter
INNER JOIN (
	SELECT
		ab.lot_id,
	COUNT(*) participants_count
	FROM auctions_bets ab
	INNER JOIN norep_last_bets_for_board_buy temp
		ON ab.id = temp.ab_id
	WHERE
		ab.status = 1
	GROUP BY ab.lot_id
) AS num
	ON filter.lot_id = num.lot_id
		AND filter.lot_type IN (4)
SET filter.participants_count = num.participants_count
 -- ------------

Tables which start with "norep" are temporary, if it matters.
Query worked fast and nice before we updated production servers to 5.7.22.

Google said we have to set option "SET SESSION optimizer_switch=‘derived_merge=off’;" but it didn't make any difference.

As workaround we split the query and went away from derived table to explicit temporary, then it worked well.

After some investigation we found that MySQL optimiser's internal SELECT joined auctions_bets table first (many rows) instead of norep_last_bets_for_board_buy (1 row), unlike usual. Moreover, query starts to work great again if we simply change join order from:

FROM auctions_bets ab
INNER JOIN norep_last_bets_for_board_buy temp

to:

FROM norep_last_bets_for_board_buy temp
INNER JOIN auctions_bets ab

How to repeat:
As far as I know, update + join with derived table must be enough to reproduce, but I can post some info about other tables on demand.
[14 Nov 2018 13:08] Sinisa Milivojevic
Hi,

Thank you for your bug report.

I fully understand what you are writing about. However, in order that we verify that this bug truly occurs, we need a fully repeatable test case. That means the definition and contents of all the tables involved in that query.  You can reduce the test case size by sending only sufficient number of rows that will reproduce this test case. You can also omit the columns that are not used in the query.

When you construct your test case and you compress the dump of the tables, you can upload it by using the "Files" tab in this bug report. That way, only Oracle employees can see your data.

Many thanks in advance.
[14 Nov 2018 13:19] Sinisa Milivojevic
One additional notice.

We do not require a test case for the UPDATE, but only for the SELECT that creates a derived table.
[14 Nov 2018 14:57] Lobanov Mikhail
Bunch of queries to reproduce

Attachment: test.sql (application/octet-stream, text), 249.14 KiB.

[14 Nov 2018 15:01] Lobanov Mikhail
Attached .sql file with full situation, just in case.
[14 Nov 2018 16:42] Sinisa Milivojevic
Hi,

Actually, 5.7 does an excellent job:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	temp	NULL	index	PRIMARY	PRIMARY	4	NULL	55	100.00	Using index; Using temporary; Using filesort
1	SIMPLE	ab	NULL	eq_ref	PRIMARY,lot_id	PRIMARY	4	test.temp.ab_id	1	10.00	Using where

This means that table `ab` is resolved first by the WHERE clause. Table `temp` is resolved by scanning through all the index in order to search for the rows, then a temporary table is used to perform a join and aggregating results , while filesort is used to resolve automatic ORDER BY. If you add at the end of SELECT ... ORDER BY NULL, filesort will disappear.

Table `ab` is first in join order rand table `temp` is second. The order in which tables are printed is not relevant.

Actually mysql-8.0 does the same thing, although it optimises away implicit ORDER BY, as it is truly not needed.

Hence, both versions take the correct join order, where table `ab` is first in the join order and table `temp` is second. It is just the way that EXPLAIN is printed that has puzzled you ....
[15 Nov 2018 8:01] Lobanov Mikhail
That's all right;
I got the same results with EXPLAIN SELECT, but still:

 -- Query 1:
 -- Execution time: [0.3464] <-- slow. In our case, test_bets table has more than 6 million rows, so the execution time is much greater.
UPDATE temporary_table_1 filter
INNER JOIN (
	SELECT
            ab.lot_id,
            COUNT(*) participants_count
        FROM test_bets ab
        INNER JOIN temporary_table_2 temp
            ON ab.id = temp.ab_id
        WHERE
            ab.status = 1
        GROUP BY ab.lot_id
) AS num
ON filter.lot_id = num.lot_id
    AND filter.lot_type IN (4)
SET filter.participants_count = num.participants_count

 -- Query 2:
 -- Execution time: [0.0009] <-- faster than light
UPDATE temporary_table_1 filter
INNER JOIN (
    SELECT
        ab.lot_id,
        COUNT(*) participants_count
    FROM temporary_table_2 temp
    INNER JOIN test_bets ab
        ON ab.id = temp.ab_id
    WHERE
        ab.status = 1
    GROUP BY ab.lot_id
) AS num
ON filter.lot_id = num.lot_id
    AND filter.lot_type IN (4)
SET filter.participants_count = num.participants_count
 -- ----

The only difference is the order of joining inside inner SELECT.
And again: we have fixed the query of course, the bug is that before 5.7 query worked well.
[15 Nov 2018 13:47] Sinisa Milivojevic
Hi,

A difference in speed that you have presented can not be due to the wrong join order.

Difference is probably due to other threads running in the same time or other processes running on the system. 

Most important of all, the EXPLAIN does not show ANY difference.
[15 Nov 2018 14:09] Evgeny Shichenko
Explain: temp query first in query

Attachment: explain_temp_first.json (application/json, text), 4.10 KiB.

[15 Nov 2018 14:09] Evgeny Shichenko
Explain: ab table query first in query

Attachment: explain_ab_first.json (application/json, text), 3.93 KiB.

[15 Nov 2018 14:14] Sinisa Milivojevic
HI,

I already wrote that EXPLAIN , as well as the trace, will print the tables in order in which they are used and not in the order in which they are joined.
[15 Nov 2018 14:14] Evgeny Shichenko
Hi all!
Explain format=json different for both cases. I added files with it.
The problem exactly in table order. We could reproduce this problem locally without any query load
[15 Nov 2018 14:25] Sinisa Milivojevic
The order in which tables are printed in EXPLAIN is not related to the join order.

You have not yet sent me the EXPLAIN that shows that the order in the join has changed. You have only presented the order in which tables are printed.
[15 Nov 2018 14:28] Evgeny Shichenko
https://www.db-fiddle.com/f/mP5WkyFmK4Wo6G3WroaB7B/0

I removed "temporary" word from tables, db-fiddle doesn't see them in query
[15 Nov 2018 14:37] Evgeny Shichenko
I don't understand

explain_temp_first.json shows 
"rows_examined_per_scan": 55 from temporary tables
"rows_examined_per_scan": 1 from ab

explain_ab_first.json shows
"rows_examined_per_scan": 600 from ab
"rows_examined_per_scan": 1 from temporary tables

Isn't it a proof?
Besides, the time difference on our production server was like 10 second vs 0.00..  We just changed table order in query
[15 Nov 2018 15:02] Evgeny Shichenko
In db-fiddle you can choose mysql 5.6
Explain is the same for both queries.
[15 Nov 2018 16:20] Sinisa Milivojevic
The output from EXPLAIN and the associated traces

Attachment: trace.txt (text/plain), 19.72 KiB.

[15 Nov 2018 16:22] Sinisa Milivojevic
I have just uploaded both, the EXPLAINs and the traces which prove that there is no bug.

I do not know what is db-fiddle, but I do know it is not our product, so it is irrelevant.
[15 Nov 2018 22:32] Evgeny Shichenko
The problem happens during UPDATE only 
Could you explain and trace whole UPDATE query?
[16 Nov 2018 13:59] Sinisa Milivojevic
Hi,

I can't do that for UPDATE, since I do not have the table for UPDATE.
[16 Nov 2018 14:23] Lobanov Mikhail
Hello, you have one in attached test.sql
[16 Nov 2018 14:41] Sinisa Milivojevic
EXPLAINs and traces for UPDATE

Attachment: trace1.txt (text/plain), 32.93 KiB.

[16 Nov 2018 14:43] Sinisa Milivojevic
Hi,

I have uploaded everything for the UPDATE.

The join order remains the same, but the different part is the plan, not the join order. Different plan affects the performance, so I am verifying it as a performance bug.