| Bug #96378 | Subquery with parameter is exponentially slower than hard-coded value | ||
|---|---|---|---|
| Submitted: | 31 Jul 2019 3:00 | Modified: | 2 Jan 2020 9:47 |
| Reporter: | Jeff Johnson | Email Updates: | |
| Status: | No Feedback | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S4 (Feature request) |
| Version: | 8.0.17 | OS: | Windows (Windows 10 x64) |
| Assigned to: | CPU Architecture: | x86 (AMD Threadripper 1950X) | |
| Tags: | performance | ||
[31 Jul 2019 12:35]
MySQL Verification Team
Hi Mr. Johnson, Thank you for your bug report. What you describe is a known behaviour of our optimiser. Simply, in order for index to be used, you can not use user variables. You can circumvent this in your application. However, I find that this is a very realistic feature request, so I am suggesting to you to convert this report into a feature request. Your feedback will be appreciated.
[31 Jul 2019 14:56]
Jeff Johnson
Converted to feature request. Just FYI, MS SQL Server has no problem with the parameters and runs the query quickly either way in just under 2 seconds, same data set.
[31 Jul 2019 15:01]
Jeff Johnson
When parameters are used not in a sub-query they do not suffer the performance issue as well, very odd.
[31 Jul 2019 15:10]
MySQL Verification Team
Hello Mr. Johnson, We do know that a problem is restricted to the nested queries. Simply, our optimiser currently concludes that if user variable is present that the nested query is a dependent one, which can not be converted to sem-join. Henceforth, this is very welcome feature request. Verified.
[2 Dec 2019 9:47]
Roy Lyseng
Hi Jeff, if you still have problems with this case, we'd like to reopen it. In version 8.0.18, we got a new feature EXPLAIN ANALYZE that will better provide information about what parts of a query contribute to the total cost. Please, can you run EXPLAIN ANALYZE on two queries: 1. The original query 2. The original query with user variables substituted with constant values. In version 8.0.17, we started optimizing NOT IN and NOT EXISTS in a hopefully better way, like a semi-join strategy. The new optimizer strategy can be de-selected with this setting: set optimizer_switch='semijoin=off'; If possible, can you try to run the query with both setting (semijoin on and off)? It is a bit tedious for us to try to recreate your data, and when you run the queries, we know it is the explained plan that causes you problems.
[3 Jan 2020 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[7 Jan 2020 14:48]
MySQL Verification Team
Hi Mr. Johnson, Is there a chance that we get a feedback from you on our questions ???

Description: I have the following query which takes almost 5 minutes to run. If I replace @__cutOff_1 with '2019-07-24' in the query, it takes 2 seconds to run. Both return identical results. Submissions table has 6 million rows, ip addresses table has 360K rows. Total db size is 2 GB. All necessary fields have indexes. Query explainer looks the same for both queries with the parameter and with the hard-coded value. QUERY: SET @__cutOff_0 = '2019-07-29'; SET @__minBanCount_1 = 2; SET @__cutOff_1 = '2019-07-24'; SET @__minBanCount_2 = 5; SELECT `i`.`IPAddress`, `i`.`BanCount`, MAX(`s`.`Timestamp`) AS `c` FROM `Submissions` AS `s` CROSS JOIN `IPAddresses` AS `i` WHERE (((`s`.`Timestamp` >= @__cutOff_0) AND (`i`.`BanCount` >= @__minBanCount_1)) AND (`s`.`IPAddressId` = `i`.`IPAddress`)) AND `i`.`IPAddress` NOT IN ( SELECT DISTINCT `i2`.`IPAddress` FROM `Submissions` AS `s2` CROSS JOIN `IPAddresses` AS `i2` WHERE (`s2`.`Timestamp` >= @__cutOff_1) AND (`i2`.`BanCount` >= @__minBanCount_2) AND (`s2`.`IPAddressId` = `i2`.`IPAddress`) ) GROUP BY `i`.`IPAddress`, `i`.`BanCount` ORDER BY `c` DESC LIMIT 50000 How to repeat: Here is my table creation script. You could honesty just create 360K random ip addresses entries, and then 8 million random submission entries pointing to the ip addresses and a few random found text entries. CREATE TABLE `foundtext` ( `Id` int(11) NOT NULL, `Text` varchar(255) NOT NULL, PRIMARY KEY (`Id`), UNIQUE KEY `IX_FoundText_Text` (`Text`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; CREATE TABLE `submissions` ( `Timestamp` datetime(6) NOT NULL, `IPAddressId` varbinary(16) NOT NULL, `SourceIPAddressId` varbinary(16) NOT NULL, `Source` longtext NOT NULL, `UserName` longtext NOT NULL, `OSNameId` int(11) NOT NULL, `OSVersionId` int(11) NOT NULL, `VersionId` int(11) NOT NULL, `ApiKeyId` bigint(20) DEFAULT NULL, PRIMARY KEY (`Timestamp`,`IPAddressId`,`SourceIPAddressId`), KEY `IX_Submissions_ApiKeyId` (`ApiKeyId`), KEY `IX_Submissions_IPAddressId` (`IPAddressId`), KEY `IX_Submissions_OSNameId` (`OSNameId`), KEY `IX_Submissions_OSVersionId` (`OSVersionId`), KEY `IX_Submissions_SourceIPAddressId` (`SourceIPAddressId`), KEY `IX_Submissions_VersionId` (`VersionId`), CONSTRAINT `FK_Submissions_ApiKeys_ApiKeyId` FOREIGN KEY (`ApiKeyId`) REFERENCES `apikeys` (`Id`) ON DELETE RESTRICT, CONSTRAINT `FK_Submissions_FoundText_OSNameId` FOREIGN KEY (`OSNameId`) REFERENCES `foundtext` (`Id`) ON DELETE RESTRICT, CONSTRAINT `FK_Submissions_FoundText_OSVersionId` FOREIGN KEY (`OSVersionId`) REFERENCES `foundtext` (`Id`) ON DELETE RESTRICT, CONSTRAINT `FK_Submissions_FoundText_VersionId` FOREIGN KEY (`VersionId`) REFERENCES `foundtext` (`Id`) ON DELETE RESTRICT, CONSTRAINT `FK_Submissions_IPAddresses_IPAddressId` FOREIGN KEY (`IPAddressId`) REFERENCES `ipaddresses` (`IPAddress`) ON DELETE RESTRICT, CONSTRAINT `FK_Submissions_IPAddresses_SourceIPAddressId` FOREIGN KEY (`SourceIPAddressId`) REFERENCES `ipaddresses` (`IPAddress`) ON DELETE RESTRICT ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; CREATE TABLE `ipaddresses` ( `IPAddress` varbinary(16) NOT NULL, `BanCount` bigint(20) NOT NULL, `AttackedCount` bigint(20) NOT NULL, `Details` varchar(1024) NOT NULL, `Flags` int(11) NOT NULL, PRIMARY KEY (`IPAddress`), KEY `IX_IPAddresses_AttackedCount` (`AttackedCount`), KEY `IX_IPAddresses_BanCount` (`BanCount`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; Suggested fix: This seems like a serious performance bug in the internal query parser / handler of MySQL. Even if full table scans were being done, I would not expect this query to take 5 minutes, it is not very complex and I have added all the necessary indexes...