Bug #115122 Unneeded full table scan, although there is an index and condition
Submitted: 25 May 2024 11:17 Modified: 16 Jun 2024 9:10
Reporter: Ahmed Wahba Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.4 (Latest), 8.0.37 OS:Ubuntu (22.04 LTS)
Assigned to: CPU Architecture:ARM
Tags: nested query, performance, regression

[25 May 2024 11:17] Ahmed Wahba
Description:
I have a simple schema with few tables, and I found that a query is taking very long (7+ hours), using explain query I found that there is a full table scan although it shouldn't because there is an index an the column in where condition.

I restructured the query, and the index was used and the query took less than 5 seconds.

table device_message includes around 5 million records
table sent_messages includes around 5 million records

Query 1 - which took (7+ hours):

SELECT count(*) 
FROM sent_messages
WHERE message_type = 0 
AND message_timestamp BETWEEN '2024-05-23 16:00:00' AND '2024-05-23 19:00:00' 
AND EXISTS(
	SELECT 1 
	FROM device_tenant_id as dti 
	WHERE EXISTS(
		SELECT 1 
		FROM device_message 
		WHERE is_imei = 0 
		AND dti.imei = device_message.imei
		AND received_time BETWEEN '2024-05-23 16:00:00' AND '2024-05-23 19:00:00'
	)
	AND device_id = id
);

Query 2 - which took less than 5 seconds:

SELECT count(*) 
FROM sent_messages 
INNER JOIN (SELECT distinct id 
			FROM device_tenant_id as dti 
			WHERE EXISTS(
				SELECT 1 
				FROM device_message 
				WHERE is_imei = 0 
				AND dti.imei = device_message.imei
				AND received_time BETWEEN '2024-05-23 16:00:00' AND '2024-05-23 19:00:00'
				)
			) as device_ids
			ON device_ids.id = sent_messages.device_id
WHERE message_type = 0 
AND message_timestamp BETWEEN '2024-05-23 16:00:00' AND '2024-05-23 19:00:00';

How to repeat:
In order to repeat, I will attach an SQL file to create the table structure, and a python script to fill the tables with 5M rows.

then you will be able to explain both queries (Query 1 and Query 2) and you will see that Query 1 does full table scan although it shouldn't.
[25 May 2024 11:22] Ahmed Wahba
SQL script to create schema

Attachment: 01.schema.sql (application/octet-stream, text), 1.45 KiB.

[25 May 2024 11:24] Ahmed Wahba
Python script to fill the tables with 5M rows (takes time)

Attachment: generator.py (text/x-python), 3.02 KiB.

[25 May 2024 11:48] Ahmed Wahba
test_db part 1

Attachment: backup.sql.zstaa (application/octet-stream, text), 47.68 MiB.

[25 May 2024 11:50] Ahmed Wahba
I uploaded a test database with 5 Million records on both tables, generated using the python script which I attached as well
[25 May 2024 11:51] Ahmed Wahba
test_db part 2

Attachment: backup.sql.zstab (application/octet-stream, text), 43.52 MiB.

[6 Jun 2024 4:51] MySQL Verification Team
Hello Ahmed Wahba,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[16 Jun 2024 9:10] Ahmed Wahba
Hi Team, thanks for verifying the issue.

what is the way forward? is there a target date? target fix version?

Thanks.