MySQL Bugs: #2904: Subqueries lack optimization

Bug #2904	Subqueries lack optimization
Submitted:	20 Feb 2004 7:05	Modified:	14 Jan 2020 20:21
Reporter:	Joao Santos	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S4 (Feature request)
Version:	5.0.0-alpha	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	performance, subquery

Description:
I don't know how you are handling results returned by subqueries but I guess something is not being done as it should since I can get a better performance (hundreds of times faster in this example) with a properly locked temporary table than with a subquery.

Although I can still use a workaround, a better optimization on subqueries would be nice.

How to repeat:
I have two tables: one with a few PIDs and another with an history of events logged for each PID (called procs (with procs.id being the primary key) and events (with events.proc_id being a logical foreign key for procs.id) respectively).

The following queries look semanticaly the same (at least for me, if I am not getting crazy) but the second group runs way faster:

First group (all-in-one query):
mysql> select count(*) as count from (select procs.id, q1.proc_id from procs left join (select distinct proc_id from events where msg = 'MSG_LOJA_CONF_LEVANT_ARTIGO') as q1 on procs.id = q1.proc_id where q1.proc_id is null) as q1;
+-------+
| count |
+-------+
| 22263 |
+-------+
1 row in set (4 min 6.82 sec)

Second group (using locked temporary tables):
mysql> lock tables procs read, events read, procs_tmp write;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into procs_tmp select distinct proc_id from events where msg = 'MSG_LOJA_CONF_LEVANT_ARTIGO';
Query OK, 23523 rows affected (0.97 sec)
Records: 23523  Duplicates: 0  Warnings: 0

mysql> select count(*) as count from (select procs.id, procs_tmp.id from procs left join procs_tmp on procs.id = procs_tmp.id where procs_tmp.id is null) as q1;
+-------+
| count |
+-------+
| 22263 |
+-------+
1 row in set (0.36 sec)

mysql> unlock tables;
Query OK, 0 rows affected (0.01 sec)

Suggested fix:
I may be simplifying things too much, but why not load the results from the subquery into the memory as if it was a regular table and then requerying them?

It seems to be a join order issue, mostly.  A test case:

CREATE TABLE t1 ( a INT UNSIGNED NOT NULL PRIMARY KEY );

CREATE TABLE t2 LIKE t1;

INSERT INTO t1 SELECT a+b+c+d+1 FROM ( SELECT 0 AS a UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 ) a CROSS JOIN ( SELECT 0 AS b UNION SELECT 10 UNION SELECT 20 UNION SELECT 30 UNION SELECT 40 UNION SELECT 50 UNION SELECT 60 UNION SELECT 70 UNION SELECT 80 UNION SELECT 90 ) b CROSS JOIN ( SELECT 0 AS c UNION SELECT 100 UNION SELECT 200 UNION SELECT 300 UNION SELECT 400 UNION SELECT 500 UNION SELECT 600 UNION SELECT 700 UNION SELECT 800 UNION SELECT 900 ) c CROSS JOIN ( SELECT 0 AS d UNION SELECT 1000 UNION SELECT 2000 UNION SELECT 3000 UNION SELECT 4000 UNION SELECT 5000 UNION SELECT 6000 UNION SELECT 7000 UNION SELECT 8000 UNION SELECT 9000 ) d;

INSERT INTO t2 SELECT a FROM t1 WHERE a % 2;

-- Both of these are fast
SELECT STRAIGHT_JOIN COUNT(*) FROM t1 INNER JOIN t2 USING (a);
SELECT STRAIGHT_JOIN COUNT(*) FROM t2 INNER JOIN t1 USING (a);

-- First is slow; second is fast
SELECT STRAIGHT_JOIN COUNT(*) FROM t1 INNER JOIN (SELECT a FROM t1 WHERE a %2) t2 USING (a);
SELECT STRAIGHT_JOIN COUNT(*) FROM (SELECT a FROM t1 WHERE a %2) t2 INNER JOIN t1 USING (a);

Posted by developer:
 
All of these operations are now consistently fast, hence I consider this feature implemented.
The feature that fixed this problem is probably subquery materialization with index generation.
Implemented in 5.7 (or maybe earlier).