Description:
There is a specific case where I need to join a smaller table to a bigger table, which works fine.
However, if I add an index for the bigger table containing the fields in join conditions query optimizer tries to execute smaller table select first ignoring the fact that it still needs to order by/filter by the bigger table in the end. Thus adding an index makes the query slower.
Tested in 8.0.26 and 5.7.35.
I didn't find any open bugs with similar description.
How to repeat:
-- 1. Create two tables:
CREATE TABLE DocumentOwner
(
CompanyID INT NOT NULL,
OwnerID INT NOT NULL,
Description nvarchar(100),
PRIMARY KEY (CompanyID, OwnerID)
);
CREATE TABLE Document
(
CompanyID INT NOT NULL,
DocID INT NOT NULL,
DocOwnerID INT NOT NULL,
Description nvarchar(100),
PRIMARY KEY (CompanyID, DocID)
);
-- 2. Insert data (Document table should be significantly bigger than DocumentOwner)
-- Insert 500 rows for DocumentOwner table
INSERT INTO DocumentOwner(CompanyID, OwnerID, Description) SELECT 1, 1, 'Description1';
...
INSERT INTO DocumentOwner(CompanyID, OwnerID, Description) SELECT 1, 500, 'Description1';
-- Insert 40 000 rows for Document table
INSERT INTO Document (CompanyID, DocID, DocOwnerID, Description) SELECT 1,1,ceil(rand() * 400), ceil(rand() * 100);
...
INSERT INTO Document (CompanyID, DocID, DocOwnerID, Description) SELECT 1,40000,ceil(rand() * 400), ceil(rand() * 100);
-- 3. Execute the following select:
EXPLAIN SELECT * FROM Document
INNER JOIN DocumentOwner
ON DocumentOwner.OwnerID=Document.DocOwnerID
AND DocumentOwner.CompanyID=Document.CompanyID
WHERE Document.CompanyID=1
AND Document.Description > '5'
AND DocumentOwner.Description LIKE 'Description%'
ORDER BY Document.DocID DESC
LIMIT 2
;
SELECT * FROM Document
INNER JOIN DocumentOwner
ON DocumentOwner.OwnerID=Document.DocOwnerID
AND DocumentOwner.CompanyID=Document.CompanyID
WHERE Document.CompanyID=1
AND Document.Description > '5'
AND DocumentOwner.Description LIKE 'Description%'
ORDER BY Document.DocID DESC
LIMIT 2
;
-- The select completes very fast (under 1 ms), in the execution plan the firs part is Document by Primary key and the second part is DocumentOwner by Primary Key
-- 4. Create index for Document table containing the join conditions
CREATE INDEX BadIndex ON Document (DocOwnerID);
-- 5. Execute the select again:
EXPLAIN SELECT * FROM Document
INNER JOIN DocumentOwner
ON DocumentOwner.OwnerID=Document.DocOwnerID
AND DocumentOwner.CompanyID=Document.CompanyID
WHERE Document.CompanyID=1
AND Document.Description > '5'
AND DocumentOwner.Description LIKE 'Description%'
ORDER BY Document.DocID DESC
LIMIT 2
;
SELECT * FROM Document
INNER JOIN DocumentOwner
ON DocumentOwner.OwnerID=Document.DocOwnerID
AND DocumentOwner.CompanyID=Document.CompanyID
WHERE Document.CompanyID=1
AND Document.Description > '5'
AND DocumentOwner.Description LIKE 'Description%'
ORDER BY Document.DocID DESC
LIMIT 2
;
-- This time the select takes significantly more time to complete (~50 ms) and execution plan contains DocumentOwner as the first table and Document joined using BadIndex index
Suggested fix:
I understand that having where condition DocumentOwner.Description LIKE 'Description%' makes it so optimizer cannot predict which table would be more restrictive and what would be the most efficient way, but it tries to use a risky plan instead of a safe plan that has all selects using primary keys. I think it would be better if it uses the safe option in case it's not clear what impact would changing the primary table have.