Bug #104978 Adding Join Index forces query optimizer to use inefficient execution plan
Submitted: 17 Sep 2021 19:57 Modified: 20 Sep 2021 8:58
Reporter: Dmitrii Naumov Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.26, 5.7.35 OS:Windows
Assigned to: CPU Architecture:Any
Tags: execution plan, INDEX, Optimizer

[17 Sep 2021 19:57] Dmitrii Naumov
Description:
There is a specific case where I need to join a smaller table to a bigger table, which works fine. 
However, if I add an index for the bigger table containing the fields in join conditions query optimizer tries to execute smaller table select first ignoring the fact that it still needs to order by/filter by the bigger table in the end. Thus adding an index makes the query slower. 

Tested in 8.0.26 and 5.7.35. 
I didn't find any open bugs with similar description.

How to repeat:
-- 1. Create two tables:
CREATE TABLE DocumentOwner
(
	CompanyID INT NOT NULL,
	OwnerID INT NOT NULL, 
	Description nvarchar(100),
	PRIMARY KEY (CompanyID, OwnerID)
);
CREATE TABLE Document 
(
	CompanyID INT NOT NULL,
	DocID INT NOT NULL, 
	DocOwnerID INT NOT NULL,
	Description nvarchar(100), 
	PRIMARY KEY (CompanyID, DocID)
);

-- 2. Insert data (Document table should be significantly bigger than DocumentOwner)
-- Insert 500 rows for DocumentOwner table
INSERT INTO DocumentOwner(CompanyID, OwnerID, Description) SELECT  1, 1, 'Description1';
...
INSERT INTO DocumentOwner(CompanyID, OwnerID, Description) SELECT  1, 500, 'Description1';

-- Insert 40 000 rows for Document table
INSERT INTO Document (CompanyID, DocID, DocOwnerID, Description) SELECT 1,1,ceil(rand() * 400), ceil(rand() * 100);
...
INSERT INTO Document (CompanyID, DocID, DocOwnerID, Description) SELECT 1,40000,ceil(rand() * 400), ceil(rand() * 100);

-- 3. Execute the following select:
EXPLAIN SELECT * FROM Document 
INNER JOIN 	DocumentOwner 
	ON DocumentOwner.OwnerID=Document.DocOwnerID
	AND DocumentOwner.CompanyID=Document.CompanyID
WHERE Document.CompanyID=1 
AND Document.Description > '5'
AND DocumentOwner.Description LIKE 'Description%'
 ORDER BY	Document.DocID DESC
LIMIT 2
;

SELECT * FROM Document 
INNER JOIN 	DocumentOwner 
	ON DocumentOwner.OwnerID=Document.DocOwnerID
	AND DocumentOwner.CompanyID=Document.CompanyID
WHERE Document.CompanyID=1 
AND Document.Description > '5'
AND DocumentOwner.Description LIKE 'Description%'
 ORDER BY	Document.DocID DESC
LIMIT 2
;

-- The select completes very fast (under 1 ms), in the execution plan the firs part is Document by Primary key and the second part is DocumentOwner by Primary Key
	
-- 4. Create index for Document table containing the join conditions
CREATE INDEX BadIndex ON Document (DocOwnerID); 

-- 5. Execute the select again:
EXPLAIN SELECT * FROM Document 
INNER JOIN 	DocumentOwner 
	ON DocumentOwner.OwnerID=Document.DocOwnerID
	AND DocumentOwner.CompanyID=Document.CompanyID
WHERE Document.CompanyID=1 
AND Document.Description > '5'
AND DocumentOwner.Description LIKE 'Description%'
 ORDER BY	Document.DocID DESC
LIMIT 2
;

SELECT * FROM Document 
INNER JOIN 	DocumentOwner 
	ON DocumentOwner.OwnerID=Document.DocOwnerID
	AND DocumentOwner.CompanyID=Document.CompanyID
WHERE Document.CompanyID=1 
AND Document.Description > '5'
AND DocumentOwner.Description LIKE 'Description%'
 ORDER BY	Document.DocID DESC
LIMIT 2
;
-- This time the select takes significantly more time to complete (~50 ms) and execution plan contains DocumentOwner as the first table and Document joined using BadIndex index

Suggested fix:
I understand that having where condition  DocumentOwner.Description LIKE 'Description%' makes it so optimizer cannot predict which table would be more restrictive and what would be the most efficient way, but it tries to use a risky plan instead of a safe plan that has all selects using primary keys. I think it would be better if it uses the safe option in case it's not clear what impact would changing the primary table have.
[17 Sep 2021 19:58] Dmitrii Naumov
Select that one can use to test

Attachment: SELECT.sql (application/octet-stream, text), 758 bytes.

[17 Sep 2021 19:58] Dmitrii Naumov
query to create tables

Attachment: CreateTables.sql (application/octet-stream, text), 318 bytes.

[17 Sep 2021 19:59] Dmitrii Naumov
Query to generate DocumentOwner data

Attachment: DocumentOwnerData.sql (application/octet-stream, text), 45.69 KiB.

[17 Sep 2021 20:01] Dmitrii Naumov
Script to generate Document Data Part 1

Attachment: DocumentData.sql (application/octet-stream, text), 2.30 MiB.

[17 Sep 2021 20:02] Dmitrii Naumov
Script to generate Document Data Part 2

Attachment: DocumentData_Part2.sql (application/octet-stream, text), 2.31 MiB.

[17 Sep 2021 20:02] Dmitrii Naumov
Attached files with the queries used to reproduce the issue
[20 Sep 2021 8:58] MySQL Verification Team
Hello Dmitrii,

Thank you for the report and test case.

regards,
Umesh