Bug #111025 Different cost with index hint for same plan
Submitted: 13 May 2023 18:22 Modified: 15 May 2023 10:23
Reporter: Ilya Kantor Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.33 OS:Any
Assigned to: CPU Architecture:Any

[13 May 2023 18:22] Ilya Kantor
Description:
EXPLAIN ANALYZE with an INDEX hint shows the same plan as without it.

But the cost is different.

mysql> explain analyze SELECT * FROM users WHERE name="Ann" ORDER BY salary;
+-------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on users using name_salary (name='Ann')  (cost=2910 rows=10106) (actual time=0.219..20.3 rows=10106 loops=1)
 |
+-------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

mysql> explain analyze SELECT /*+ INDEX(users name_salary) */ * FROM users WHERE name="Ann" ORDER BY salary;
+-------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on users using name_salary (name='Ann')  (cost=3734 rows=18354) (actual time=0.225..20.5 rows=10106 loops=1)
 |
+-------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

As you can see, the plans are identical (that's fine, they should be), but their costs differ (weird).

How to repeat:
CREATE TABLE `users` (
  `id` int NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `name` varchar(64) NOT NULL,
  `salary` mediumint unsigned DEFAULT NULL,
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `deleted_at` datetime DEFAULT NULL,
  KEY `name` (`name`),
  KEY `salary` (`salary`),
  KEY `name_salary` (`name`,`salary`),
  KEY `salary_name` (`salary`,`name`)
);

DROP TABLE IF EXISTS names;
CREATE TABLE names (
  name varchar(300) not null
);

INSERT INTO names (name) VALUES
('Emma'), ('Liam'), ('Noah'), ('Olivia'), ('William'), ('James'), ('Isabella'), ('Oliver'), ('Sophia'),
('Benjamin'), ('Elijah'), ('Charlotte'), ('Lucas'), ('Mia'), ('Mason'), ('Amelia'), ('Ethan'), ('Harper'),
('Logan'), ('Ava'), ('Alexander'), ('Abigail'), ('Sebastian'), ('Emily'), ('Michael'), ('Evelyn'), ('Daniel'),
('Madison'), ('Matthew'), ('Avery'), ('Henry'), ('Sofia'), ('Jackson'), ('Ella'), ('Samuel'), ('Scarlett'),
('David'), ('Victoria'), ('Joseph'), ('Aria'), ('Carter'), ('Grace'), ('Owen'), ('Chloe'), ('Wyatt'),
('Penelope'), ('John'), ('Riley'), ('Jack'), ('Lily'), ('Luke'), ('Aubrey'), ('Jayden'), ('Zoey'), ('Dylan'),
('Lila'), ('Levi'), ('Ellie'), ('Gabriel'), ('Aaliyah'), ('Nicholas'), ('Addison'), ('Isaac'), ('Natalie'),
('Lincoln'), ('Luna'), ('Christopher'), ('Savannah'), ('Joshua'), ('Brooklyn'), ('Andrew'), ('Hazel'),
('Julian'), ('Aurora'), ('Grayson'), ('Audrey'), ('Leah'), ('Bella'), ('Mateo'), ('Claire'), ('Ryan'),
('Stella'), ('Jaxon'), ('Paisley'), ('Leo'), ('Skylar'), ('Jonathan'), ('Samantha'), ('Charles'), ('Nora'),
('Adam'), ('Eleanor'), ('Thomas'), ('Caroline'), ('Xavier'), ('Nova'), ('Eli'), ('Genesis'), ('Pete'), ('Ann');

-- insert 1m users
INSERT INTO users (name, salary, created_at, deleted_at)
WITH RECURSIVE seq AS (SELECT 1 as n UNION ALL SELECT 1 + n FROM seq LIMIT 1000000)
SELECT
  -- random name
  (SELECT name FROM names ORDER BY RAND() LIMIT 1),
  -- random salary up to 10000
  rand()*10000,
  DATE_ADD('2020-01-01', INTERVAL n MINUTE),
  -- 99% null, 1% deleted_at = created_at + random within 30 days
  IF(rand()>0.01, null, DATE_ADD(DATE_ADD('2020-01-01', INTERVAL n MINUTE), INTERVAL rand()*30 DAY))
FROM seq;

explain analyze SELECT * FROM users WHERE name="Ann" ORDER BY salary;

explain analyze SELECT /*+ INDEX(users name_salary) */ * FROM users WHERE name="Ann" ORDER BY salary;
[15 May 2023 10:23] MySQL Verification Team
Hello Ilya Kantor,

Thank you for the report and test case.

regards,
Umesh