Bug #114235 DETERMINISTIC FUNCTIONS can return incorrect results
Submitted: 6 Mar 5:06 Modified: 17 Jun 21:47
Reporter: Yoseph Phillips Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Stored Routines Severity:S1 (Critical)
Version:8.0.22, 8.0.36 OS:Any (Windows 2012 R2 Datacenter)
Assigned to: CPU Architecture:x86
Tags: DETERMINISTIC function, regression

[6 Mar 5:06] Yoseph Phillips
Description:
DETERMINISTIC FUNCTIONS can return incorrect results.

This is an intermittent issue and so not easy to replicate.

The function will randomly start to return the same value irrespective of the value of the passed in parameter.

With multiple database connections open we can see that 1 connection is always returning the same value and the other connections are returning different and correct values for the same function with the same parameter even at the same time.

Somehow MySQL is getting stuck in a state where for an individual connection it starts ignoring the parameter and just returning the same value.

Also interesting to note is that when it gets into this state that queries calling this DETERMINISTIC FUNCTION start taking an exceptionally long time to run as well.

This is a disaster for us as we can't have DETERMINISTIC FUNCTIONS returning incorrect results and we need these functions to be DETERMINISTIC for binary logging, etc.

If the developers are able to direct us to the files in MySQL responsible for returning values from DETERMINISTIC FUNCTIONS then we can look at the code to see if we can spot the errors.      

How to repeat:
We don't have a way to repeat this as the issue is intermittent.

What we do have is the function:

CREATE FUNCTION udf1(par1 INT) RETURNS VARCHAR(100) DETERMINISTIC
BEGIN 
  RETURN (
    SELECT t1.col3 
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.col1 = t1.col2 AND t2.col4 = par1
    ORDER BY t1.col1 DESC 
    LIMIT 1  
  );  
END
[6 Mar 11:27] MySQL Verification Team
Hi Mr. Philips,

Thank you for your bug report.

However, we are not able to repeat what you are reporting to us.

Hence, what we need from you is a fully repeatable test case. That means that we must have the tables involved and the values that you are passing to that function.

We can then run a multi-threaded test case, with exempli gratia 10 concurrent connections and feed those various values in different threads and run it for couple of hours and see if we get a wrong result.

Thanks in advance.
[6 Mar 22:19] Yoseph Phillips
We have not been able to come up with a repeatable test case yet.
Each time when the issue starts it might be the case that records are getting added to either or both of those tables in the query.

Further investigation indicates that a regression might have been introduced into MySQL 8.0.32 and is still there in MySQL 8.0.36. We have different versions of MySQL installed on thousands of environments and we seem to have a few reports of this issue on 8.0.32 and yesterday confirmed that this issue is in 8.0.36 as well.

It seems related to how this function has been written. We have lots of DETERMINISTIC FUNCTIONS and this is the only one where we have seen this issue. The function itself and tables it is using have not changed in many years.

It appears that the connections with the issues are storing an execution plan for this DETERMINISTIC FUNCTION and somehow one of the MySQL optimisations has transformed the query into a non equivalent query or something like that.

We think we need to take the approach of working out exactly how MySQL is executing this query on these connections and which optimisations it is using. We can then compare this to all of these other connections which are executing this correctly.

On a connection that was having an issue it seems that when we created a 2nd function with a different name but the exact same query that the issue went away for the original function as well. It started happening later again yesterday.
[7 Mar 6:09] Yoseph Phillips
Trace of Failing

Attachment: failing.txt (text/plain), 7.20 KiB.

[7 Mar 6:10] Yoseph Phillips
Trace of Working

Attachment: working.txt (text/plain), 18.08 KiB.

[7 Mar 6:14] Yoseph Phillips
We have ran the following on many connections:
SET optimizer_trace = "enabled=on"; 
SET optimizer_trace_max_mem_size = 1024 * 1024 * 16; 
SELECT udf1(26);
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE; 
SET optimizer_trace = "enabled=off";

I have attached the results.
Whenever the results are wrong the output of this is always the same.
Whenever the results are correct the output of this is always the same.
The output of the Working and the Failing are completely different than each other so this should give us a big clue as to what the bug is.
[7 Mar 9:58] Yoseph Phillips
Changed back to Critical.

That output shows that for the failing connections they have completely lost the conditions.
There is no condition_processing section and it is showing "original_condition": null.

The output from SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE; is showing the query correctly in the QUERY column which narrows down the bug.

It is executing the query as though it thought the query was:
SELECT t1.col3 
FROM table1 t1
CROSS JOIN table2 t2 
ORDER BY t1.col1 DESC 
LIMIT 1

which also explains why it takes forever and returns the same result irrespective of the passed in parameter.

This looks like a very serious regression.
[7 Mar 11:14] MySQL Verification Team
Hi Mr. Phillips,

Thank you for your feedback.

However, we must inform you again that this is forum dealing ONLY with reports with a fully repeatable test cases.

You have not provided such a test case, so we can not proceed ....

We have also examined our entire bugs database and we have not found, with our very extensive and sophisticated searching tools, any bug that would be vaguely similar to what you are reporting.

Since this is a forum dealing ONLY with fully repeatable test cases, we can not proceed in diagnosing the bug that you are reporting.

Once we have a fully repeatable test case, we shall be happy to try to repeat it and then we would continue with it's processing.
[7 Mar 18:28] MySQL Verification Team
Hello Yoseph,

Please see the file I've uploaded,   bug114235.sql 

It seems to repeat the bug you described.
Run the sql script in a single thread with nobody else accessing the tables and 
let me know if that is the same issue you see.

The function ends up returning 1010 for whatever input you give it!
If you run the query directly it's fine.

Regards,
-- 
Shane Bester, MySQL Senior Principal Technical Support Engineer
Oracle Corporation
http://dev.mysql.com/
[7 Mar 18:53] MySQL Verification Team
Notice the flush tables in the testcase.  Even after server restart, see this, even on 8.0.27:

mysql> \r
Connection id:    8
Current database: test

mysql> select udf1(73449) ;
+-------------+
| udf1(73449) |
+-------------+
| NULL        |
+-------------+
1 row in set (0.02 sec)

mysql> select udf1(73449) ;
+-------------+
| udf1(73449) |
+-------------+
| NULL        |
+-------------+
1 row in set (0.00 sec)

mysql> select udf1(73449) ;
+-------------+
| udf1(73449) |
+-------------+
| NULL        |
+-------------+
1 row in set (0.00 sec)

mysql> select udf1(73449) ;
+-------------+
| udf1(73449) |
+-------------+
| NULL        |
+-------------+
1 row in set (0.00 sec)

mysql> flush tables;
Query OK, 0 rows affected (0.00 sec)

mysql> select udf1(73449) ;
+-------------+
| udf1(73449) |
+-------------+
| 1010        |           <<<<<<< ??????
+-------------+
1 row in set (0.28 sec)
[7 Mar 19:45] MySQL Verification Team
Yoseph,

When the issue occurs again,  please check if this temporarily resolves it:

"create function f() returns int deterministic return 1;drop function f;"

-- 
Shane Bester, MySQL Senior Principal Technical Support Engineer
Oracle Corporation
http://dev.mysql.com/
[7 Mar 19:59] MySQL Verification Team
deterministic testcase

Attachment: bug114235_new.sql (application/octet-stream, text), 2.44 KiB.

[7 Mar 20:01] MySQL Verification Team
this regression appears to start in 8.0.22.

8.0.11: ok
8.0.21: ok
8.0.22: broken
8.0.23: broken
8.0.24: broken
8.0.25: broken
8.0.26: broken
8.0.27: broken
8.0.36: broken
9.0.0 trunk : broken
[7 Mar 23:03] Yoseph Phillips
Thanks Shane.

I also added:
SET optimizer_trace = "enabled=on"; 
SET optimizer_trace_max_mem_size = 1024 * 1024 * 16; 
SELECT udf1(73449);
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE; 
SET optimizer_trace = "enabled=off";

to your test case after each of the failing and working calls to udf1(73449);
They produce exactly the same failing and working traces which I uploaded, so yes you have managed to come up with a repeatable test case.

The creating and dropping of that function does temporarily solve the issue, but it doesn't take too long for the issue to start again.

This might also explain some other strange things that we saw with clients on 8.0.25. 

Now we need to understand under what circumstances the conditions are ending up as null here. 
I am going to look through the 8.0.22 release notes again.
[7 Mar 23:10] MySQL Verification Team
My best guess is this is another regression caused by WL #9384.  Will await the devs analysis.
[12 Mar 11:13] MySQL Verification Team
* Confirmed that the regression was caused by WL#9384 (commit ca2cd90bffb201b93a29974aa04d46806c466d72)
[7 May 5:21] Yoseph Phillips
This regression still exists in 8.0.37
[9 May 10:05] MySQL Verification Team
Thank you, Mr. Phillips.
[17 Jun 21:47] Jon Stephens
Documented fix as follows in the MySQL 8.0.38, 8.4.1, and 9.0.0 changelogs:

    It was possible for a deterministic stored function to return an
    incorrect result when the function used JOIN ON inside the
    return statement. If the query needed to be reprepared due to a
    table metadata change such as FLUSH TABLES between two
    executions, the ON clause was sometimes lost.

Closed.
[18 Jun 10:11] MySQL Verification Team
Thank you, Jon.