Bug #102359 Functional Index ignored by Select query from within a Stored Function
Submitted: 24 Jan 8:57 Modified: 27 Jan 18:48
Reporter: Robert Roland Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.23 OS:Any
Assigned to: CPU Architecture:Any
Tags: functional index, stored routine

[24 Jan 8:57] Robert Roland
Description:
A functional index was created based on a CONCAT of 2 varchar(30) columns. The index is available for normal select queries and works as expected. However when the same select query is used within a user defined function UDF, the functional index is ignored.

How to repeat:
Create a table (mytable) with autoincrement primary key and at least 3 varchar columns named n1,n2,n3... (varchar(30) character set utf8mb4 collation utf8mb4_0900_ai_ci)

Populate table such that it is large enough (even after buffer pool warm up) to clearly observe whether an index is being ignored by a select query.

Create a functional index:
ALTER TABLE mytable ADD INDEX n12((concat(`n1`, `n2`))

Run select query that uses functional index and note execution time:
SELECT n3,COUNT(*) as 'count' from mytable where CONCAT(n1,n2)='str' group by n3 limit 1;

Create UDF for this same query and note execution time:

CREATE DEFINER=`userX`@`%` FUNCTION `test`(str VARCHAR(60) CHARSET utf8mb4 COLLATE 'utf8mb4_0900_ai_ci') RETURNS varchar(30) CHARSET utf8mb4 COLLATE 'utf8mb4_0900_ai_ci'

BEGIN

DECLARE result varchar (30) CHARSET utf8mb4 COLLATE 'utf8mb4_0900_ai_ci' DEFAULT '';
DECLARE cnt INT DEFAULT 0;

SELECT n3,COUNT(*) as 'count' into result,cnt from mytable where CONCAT(n1,n2)=str group by n3 limit 1;

RETURN (result);

END

Suggested fix:
Unknown
[25 Jan 13:18] MySQL Verification Team
Hi Mr. Roland,

Thank you for your bug report.

However, most likely, this is not a bug.

When UDF is invoked, the execution of the query has already passed the stage of the optimisation. Hence, we would recommend that you force the index for that query.

We also, could not find that this item is documented, so we could make this a documentation bug.

Let us know if you agree with this.
[25 Jan 13:23] MySQL Verification Team
Hi,

Beside that we would need some more feedback from you.

We so not see how do you call that UDF in your code.

Next, please try using user variable and SELECT INTO ...... query, on the command line, without invoking your function.

Also, please each of the three queries three times. Query that does not call UDF, query with INTO user variable (instead of local variable) and query with your UDF. We would like to see those times.

Last , but not least, try ot run EXPLAIN on all three queries, including the one that you call in your UDF.

Many thanks in advance.
[26 Jan 7:51] Robert Roland
I will revert back to you with more information by Wednesday. In the meantime I have comments on your reply:

"When UDF is invoked, the execution of the query has already passed the stage of the optimisation. Hence, we would recommend that you force the index for that query."

I use many UDFs that contain within them a query. For example, I may use a UDF as follows: update mytable set mycolumn = UDFname(anothercolumn);
and within the UDF I have a select query which requires an index to have good performance. These have always worked well and the select statements use the correct index regardless of whether they are wrapped in a UDF or not.

I have to believe that MySQL does 2 levels of optimization. The first level is for the primary query such as:

select myUDF('string');

And the second level is within the UDF itself since they frequently may contain a query inside. I have never needed to FORCE index on any select statement in a UDF so there logically must be 2 layers of optimization.

The UDF that this bug report is regarding actually has a "select UNION ALL select..." structure inside and each select is a query against a table with 200M rows. This UDF had been returning a result within a few hundred milliseconds over the past year. But, when I recently added 1 more select statement (needing a functional index) to the UNION ALL the UDF never returned a result even after 30 minutes.

BUT, this exact "select UNION ALL select.." I just described above that includes the select statement needing the functional index runs in under 1 second if I pull it out of the UDF and run it.

I don't think this should be a documentation bug. I think that a functional index should be available like any other index is available for a select statement within a UDF or Procedure.

"We so not see how do you call that UDF in your code."

The UDF is called as follows assuming my UDF is named "test":

select test('mystring');

"please try using user variable and SELECT INTO ...... query, on the command line, without invoking your function."

The "select into" syntax I put in the bug report is most likely irrelevant because the "select into" syntax works perfectly fine if my UDF select statement relies on a normal index rather than a functional index. I also verified I have the same experience if I create a temp table in the UDF and try to populate it with my select result. I also tried just having the select statement run inside the UDF and not place it's result anywhere and the UDF runs fine if the select doesn't need a functional index and "hangs" if the select needs a functional index.

"Also, please each of the three queries three times"

As described above, when the select is outside the UDF the time is under 1 second. When inside the UDF it is > 30 minutes and must be killed.

"try ot run EXPLAIN on all three queries"

EXPLAIN for the query outside of UDF indicates the optimizer is choosing the correct functional index.
I do not know how to view the output for EXPLAIN for a select statement wrapped inside a UDF.
[26 Jan 12:42] MySQL Verification Team
Thank you Mr. Roland,

However, we do need a fully repeatable test case.

We can process a report only when we can observe the same behaviour that you are seeing.
[26 Jan 13:20] MySQL Verification Team
Hi,

Before getting us the the test case, please respond to all of our questions that we asked you yesterday, On January 25th 2021.
[26 Jan 21:27] Robert Roland
I have attached the following files to this case so that you can easily reproduce the issue. The attached files:

1. Dump of "mytable" which has an AUTOINCREMENT primary key and 2 varchar(50) columns called n1 and n2 with 180M rows of text strings (random people names). mytable also has 2 secondary keys: a composite/compound index (`n1`,`n2`) and a functional index ((concat(`n1`,`n2`)))

2. Two .txt files containing the UDFs that run a select against mytable. One uses the composite index and the other uses the functional index

Using mytable and the UDFs attached these are the results:

select n1,n2,count(*) as 'cnt' from mytable where n1='James' and n2='Smith';

Execution Time: 0.058s
Explain:
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	mytable		ref	composite_index	composite_index	406	const,const	13278	100.00	Using index

select n1,n2,count(*) as 'cnt' from mytable where concat(n1,n2)='JamesSmith' group by concat(n1,n2);

Execution Time: 0.049s
Explain:
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	mytable		ref	n12	n12	403	const	13278	100.00

USING THE UDFs:

select get_count_nonfunctional('James','Smith');

Execution time: 0.164s
Explain:
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE										No tables used

select get_count_functional('JamesSmith');

Execution time: > 30 minutes (killed)
Explain:
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE										No tables used

SELECT INTO USER VARS:

select n1,n2,count(*) as 'cnt' into @var1,@var2,@var3 from mytable where n1='James' and n2='Smith'

Execution time:  0.004s
Explain:
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	mytable		ref	composite_index	composite_index	406	const,const	13278	100.00	Using index

select n1,n2,count(*) as 'cnt' into @var1,@var2,@var3 from mytable where concat(n1,n2)='JamesSmith' group by concat(n1,n2);

Execution time:  0.026s
Explain:
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	mytable		ref	n12	n12	403	const	13278	100.00
[26 Jan 21:29] Robert Roland
UDF using composite index

Attachment: UDF_using_composite_index.txt (text/plain), 334 bytes.

[26 Jan 21:29] Robert Roland
UDF using functional index

Attachment: UDF_using_functional_index.txt (text/plain), 267 bytes.

[26 Jan 21:42] Robert Roland
I have uploaded a dump of mytable to sftp.oracle.com:2021 /support/incoming

mysql-bug-102359-retry-Dump-mytable.7z
[27 Jan 4:53] Robert Roland
I have tried to upload mytable to the sftp.oracle site 11 times over the past 9 hours and it simply won't complete the upload. The connection speed is pretty close to a dial-up modem even though we're fiber optic on this end. I tried vpn's local to the oracle ip address and it's not any better. There is no resume capability once it hangs so I give up trying to send you the file. You will have to create your own version or provide a better alternative than the 1997 bulletin board setup you have.
[27 Jan 7:50] Dmitry Lenev
Hello!

The problem seems to be caused by the fact that query used in the stored function refers to indexed expression using variable and not string literal.

Here is the test case:

CREATE TABLE mytable (pk BIGINT AUTO_INCREMENT PRIMARY KEY,
                      n1 varchar(30) character set utf8mb4 collate utf8mb4_0900_ai_ci,
                      n2 varchar(30) character set utf8mb4 collate utf8mb4_0900_ai_ci,
                      n3 varchar(30) character set utf8mb4 collate utf8mb4_0900_ai_ci);

INSERT INTO mytable (n1, n2, n3) VALUES (RAND(), RAND(), RAND());
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
INSERT INTO mytable (n1, n2, n3) SELECT RAND(), RAND(), RAND() FROM mytable;
SELECT COUNT(*) FROM mytable;
# COUNT(*)
# 524288

ALTER TABLE mytable ADD INDEX n12((concat(n1, n2)));

DELIMITER |;

CREATE PROCEDURE test1 (str VARCHAR(60) CHARSET utf8mb4 COLLATE 'utf8mb4_0900_ai_ci') 
BEGIN
 SELECT n3,COUNT(*) as 'count' from mytable where CONCAT(n1,n2)=str group by n3 limit 1;
END |

CREATE PROCEDURE test2 () 
BEGIN
 SELECT n3,COUNT(*) as 'count' from mytable where CONCAT(n1,n2)='str' group by n3 limit 1;
END |

DELIMITER ;|

SELECT CURRENT_TIME(3);
# CURRENT_TIME(3)
# 10:30:32.093
SELECT n3,COUNT(*) as 'count' from mytable where CONCAT(n1,n2)='str' group by n3 limit 1;
# n3     count
SELECT CURRENT_TIME(3);
# CURRENT_TIME(3)
# 10:30:32.094
CALL test1('str');
# n3     count
SELECT CURRENT_TIME(3);
# CURRENT_TIME(3)
# 10:30:34.578
CALL test2();
# n3     count
SELECT CURRENT_TIME(3);
# CURRENT_TIME(3)
# 10:30:34.580

Notice difference in execution time between direct query case/query in SP that uses string literal case (1-2 milliseconds) and query in SP that uses variable (1-2 seconds).
[27 Jan 7:55] Dmitry Lenev
So I am verifying this as a bug in 8.0.23 (I have also taken the liberty to adjust the bug title to remove confusion between UDFs and Stored Routines).
[27 Jan 13:24] MySQL Verification Team
Thank you, Dmitri ....
[27 Jan 18:48] Robert Roland
I agree with your analysis. Thank you