Bug #85834 InnoDB: Very slow select query with many tables and IS NULL in where clause
Submitted: 6 Apr 2017 13:31 Modified: 1 Aug 2017 10:13
Reporter: Vassilis Kary Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.7.17 OS:Windows
Assigned to: CPU Architecture:Any
Tags: in null, innodb, SELECT, slow query, where, windows

[6 Apr 2017 13:31] Vassilis Kary
Description:
Beginning with a database schema like this:

CREATE TABLE `filetable` (
  `FileID` int(10) UNSIGNED NOT NULL,
  `FileName` varchar(255) NOT NULL DEFAULT ''
) ENGINE=InnoDB;

and many tables like these (33 in my case):

CREATE TABLE `m1491317069` (
  `LineID` int(10) UNSIGNED NOT NULL,
  `CatID` int(10) UNSIGNED NOT NULL,
  `FileID` int(10) UNSIGNED NOT NULL,
  `Cnt` smallint(5) UNSIGNED NOT NULL DEFAULT '0'
) ENGINE=InnoDB;

Then the query like this (shorten version):

SELECT ft.fileid, ft.filename
FROM filetable AS ft
LEFT JOIN m1491317069 ON m1491317069.fileid = ft.fileid
LEFT JOIN m1491233158 ON m1491233158.fileid = ft.fileid
LEFT JOIN m1491387139 ON m1491387139.fileid = ft.fileid
[...]
WHERE m1491317069.catid IS NULL
  AND m1491233158.catid IS NULL
  AND m1491387139.catid IS NULL
  [...]

This query takes on a MySQL 5.6.x more than 20 seconds to execute and in 5.7.x (not 5.7.17, older one) more than 3 minutes and maybe more!

Changing the engine to MyISAM runs very fast, in less than 0,02 seconds mostly.

How to repeat:
Using the attached MySQL-Dump (InnoDB- or MyISAM-version) and the attached full query.

The more WHERE-AND-IS-NULL are added, the slower is the execution in InnoDB (about twice slower with each one more WHERE-AND-IS-NULL).

Suggested fix:
I can not give a fix, I found a workaround as temporary solution (SQL query).

SELECT ft.fileid, ft.filename,
       m1491317069.catid AS mcat1491317069,
       m1491233158.catid AS mcat1491233158,
       m1491387139.catid AS mcat1491387139,
       [...]
FROM filetable ft
LEFT JOIN m1491317069 ON m1491317069.fileid = ft.fileid
LEFT JOIN m1491233158 ON m1491233158.fileid = ft.fileid
LEFT JOIN m1491387139 ON m1491387139.fileid = ft.fileid
[...]
GROUP BY ft.fileid
HAVING mcat1491317069 IS NULL
  AND mcat1491233158 IS NULL
  AND mcat1491387139 IS NULL
  AND mcat1480513343 IS NULL
  [...]
[6 Apr 2017 13:32] Vassilis Kary
Full MySQL test dump with InnoDB as engine

Attachment: bug-report-sql-test-dump-innodb.sql (application/octet-stream, text), 28.21 KiB.

[6 Apr 2017 13:32] Vassilis Kary
Full MySQL test query

Attachment: bug-report-test-query-full.sql (application/octet-stream, text), 2.95 KiB.

[6 Apr 2017 13:33] Vassilis Kary
Full MySQL test dump with MyISAM as engine

Attachment: bug-report-sql-test-dump-myisam.sql (application/octet-stream, text), 28.21 KiB.

[6 Apr 2017 13:36] Vassilis Kary
On MySQL 5.7.17, it runs a little bit "faster" (maybe in less than 3 minutes...), but not as "fast" as the very slow 5.6.x version.
[6 Apr 2017 13:44] Vassilis Kary
Interesting remark:
Using the SQL workaround with GROUP BY / HAVING, it runs on 5.7.17 very fast (about 0,015 seconds) but needs about 19 seconds in 5.6.x!
[7 Apr 2017 15:54] MySQL Verification Team
Hi!

I have been able to repeat the behavior. GROUP BY with HAVING is not a workaround as SQL standard clearly states that HAVING can be applied only to fields in GROUP BY clause.

Also, there should be no difference between MyISAM and InnoDB as these queries are mostly multi-join queries, where performance should not depend on the index search.

Verified as reported.
[7 Apr 2017 16:02] MySQL Verification Team
All the LEFT JOINs so I wonder if it is similar to 
https://bugs.mysql.com/bug.php?id=72854
[1 Aug 2017 10:13] Ajo Robert
Optimizer behaviour for InnoDB case
-----------------------------------------------------
Table estimate gets 1 rec per table for all 15 empty tables and becomes part
join operation as HA_STATS_RECORDS_IS_EXACT is not set for InnoDB and there
exists a non-const join condition attached with the table.
This will result in all these 37 tables using join buffer (Block Nested Loop)
while execution. And as a result InnoDB execution will take a long time due
to bug#72854.

Optimizer behaviour for MyISAM case
-------------------------------------------------------
Mark all 15 empty tables based of record stat value 0 and ha_table_flags set
to HA_STATS_RECORDS_IS_EXACT. This will make optimizer mark it as system
table with no rows found and skip using join buffer (Block Nested Loop).
If i add 1 row each to all 15 empty tables to crosscheck the same, MyISAM
executions takes exactly same time as InnoDB as all MyISAM tables will end up
using join buffers.

Conclusion
-----------------
With optimizer_switch='block_nested_loop=off' query executes super fast (less than a second instead of 3+ minute).

1. HA_STATS_RECORDS_IS_EXACT is available only for MyISAM and HEAP engine.
Thus there is not much can be done towards this direction for InnoDB.
2. I do not see any further optimization possibility due to the presents of
non-const join condition with each table.

This issue will get resolved once Bug#72854 is fixed.