MySQL Bugs: #96378: Subquery with parameter is exponentially slower than hard-coded value

Bug #96378	Subquery with parameter is exponentially slower than hard-coded value
Submitted:	31 Jul 2019 3:00	Modified:	2 Jan 2020 9:47
Reporter:	Jeff Johnson	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S4 (Feature request)
Version:	8.0.17	OS:	Windows (Windows 10 x64)
Assigned to:		CPU Architecture:	x86 (AMD Threadripper 1950X)
Tags:	performance

Description:
I have the following query which takes almost 5 minutes to run. If I replace @__cutOff_1 with '2019-07-24' in the query, it takes 2 seconds to run. Both return identical results.

Submissions table has 6 million rows, ip addresses table has 360K rows. Total db size is 2 GB. All necessary fields have indexes. Query explainer looks the same for both queries with the parameter and with the hard-coded value.

QUERY:

SET @__cutOff_0 = '2019-07-29';
SET @__minBanCount_1 = 2;
SET @__cutOff_1 = '2019-07-24';
SET @__minBanCount_2 = 5;

SELECT `i`.`IPAddress`, `i`.`BanCount`, MAX(`s`.`Timestamp`) AS `c`
FROM `Submissions` AS `s`
CROSS JOIN `IPAddresses` AS `i`
WHERE (((`s`.`Timestamp` >= @__cutOff_0) AND (`i`.`BanCount` >= @__minBanCount_1)) AND (`s`.`IPAddressId` = `i`.`IPAddress`)) AND `i`.`IPAddress` NOT IN (
    SELECT DISTINCT `i2`.`IPAddress`
    FROM `Submissions` AS `s2`
    CROSS JOIN `IPAddresses` AS `i2`
    WHERE (`s2`.`Timestamp` >= @__cutOff_1) AND (`i2`.`BanCount` >= @__minBanCount_2) AND (`s2`.`IPAddressId` = `i2`.`IPAddress`)
)
GROUP BY `i`.`IPAddress`, `i`.`BanCount`
ORDER BY `c` DESC
LIMIT 50000

How to repeat:
Here is my table creation script. You could honesty just create 360K random ip addresses entries, and then 8 million random submission entries pointing to the ip addresses and a few random found text entries.

CREATE TABLE `foundtext` (
  `Id` int(11) NOT NULL,
  `Text` varchar(255) NOT NULL,
  PRIMARY KEY (`Id`),
  UNIQUE KEY `IX_FoundText_Text` (`Text`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `submissions` (
  `Timestamp` datetime(6) NOT NULL,
  `IPAddressId` varbinary(16) NOT NULL,
  `SourceIPAddressId` varbinary(16) NOT NULL,
  `Source` longtext NOT NULL,
  `UserName` longtext NOT NULL,
  `OSNameId` int(11) NOT NULL,
  `OSVersionId` int(11) NOT NULL,
  `VersionId` int(11) NOT NULL,
  `ApiKeyId` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`Timestamp`,`IPAddressId`,`SourceIPAddressId`),
  KEY `IX_Submissions_ApiKeyId` (`ApiKeyId`),
  KEY `IX_Submissions_IPAddressId` (`IPAddressId`),
  KEY `IX_Submissions_OSNameId` (`OSNameId`),
  KEY `IX_Submissions_OSVersionId` (`OSVersionId`),
  KEY `IX_Submissions_SourceIPAddressId` (`SourceIPAddressId`),
  KEY `IX_Submissions_VersionId` (`VersionId`),
  CONSTRAINT `FK_Submissions_ApiKeys_ApiKeyId` FOREIGN KEY (`ApiKeyId`) REFERENCES `apikeys` (`Id`) ON DELETE RESTRICT,
  CONSTRAINT `FK_Submissions_FoundText_OSNameId` FOREIGN KEY (`OSNameId`) REFERENCES `foundtext` (`Id`) ON DELETE RESTRICT,
  CONSTRAINT `FK_Submissions_FoundText_OSVersionId` FOREIGN KEY (`OSVersionId`) REFERENCES `foundtext` (`Id`) ON DELETE RESTRICT,
  CONSTRAINT `FK_Submissions_FoundText_VersionId` FOREIGN KEY (`VersionId`) REFERENCES `foundtext` (`Id`) ON DELETE RESTRICT,
  CONSTRAINT `FK_Submissions_IPAddresses_IPAddressId` FOREIGN KEY (`IPAddressId`) REFERENCES `ipaddresses` (`IPAddress`) ON DELETE RESTRICT,
  CONSTRAINT `FK_Submissions_IPAddresses_SourceIPAddressId` FOREIGN KEY (`SourceIPAddressId`) REFERENCES `ipaddresses` (`IPAddress`) ON DELETE RESTRICT
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `ipaddresses` (
  `IPAddress` varbinary(16) NOT NULL,
  `BanCount` bigint(20) NOT NULL,
  `AttackedCount` bigint(20) NOT NULL,
  `Details` varchar(1024) NOT NULL,
  `Flags` int(11) NOT NULL,
  PRIMARY KEY (`IPAddress`),
  KEY `IX_IPAddresses_AttackedCount` (`AttackedCount`),
  KEY `IX_IPAddresses_BanCount` (`BanCount`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Suggested fix:
This seems like a serious performance bug in the internal query parser / handler of MySQL.

Even if full table scans were being done, I would not expect this query to take 5 minutes, it is not very complex and I have added all the necessary indexes...

Hi Mr. Johnson,

Thank you for your bug report.

What you describe is a known behaviour of our optimiser. Simply, in order for index to be used, you can not use user variables. You can circumvent this in your application.

However, I find that this is a very realistic feature request, so I am suggesting to you to convert this report into a feature request.

Your feedback will be appreciated.

Converted to feature request. Just FYI, MS SQL Server has no problem with the parameters and runs the query quickly either way in just under 2 seconds, same data set.

When parameters are used not in a sub-query they do not suffer the performance issue as well, very odd.

Hello Mr. Johnson,

We do know that a problem is restricted to the nested queries. Simply, our optimiser currently concludes that if user variable is present that the nested query is a dependent one, which can not be converted to sem-join.

Henceforth, this is very welcome feature request.

Verified.

Hi Jeff,

if you still have problems with this case, we'd like to reopen it.

In version 8.0.18, we got a new feature EXPLAIN ANALYZE that will better provide information about what parts of a query contribute to the total cost.

Please, can you run EXPLAIN ANALYZE on two queries:
1. The original query
2. The original query with user variables substituted with constant values.

In version 8.0.17, we started optimizing NOT IN and NOT EXISTS in a hopefully better way, like a semi-join strategy. The new optimizer strategy can be de-selected with this setting:
set optimizer_switch='semijoin=off';

If possible, can you try to run the query with both setting (semijoin on and off)?

It is a bit tedious for us to try to recreate your data, and when you run the queries, we know it is the explained plan that causes you problems.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Hi Mr. Johnson,

Is there a chance that we get a feedback from you on our questions ???