Bug #86922 Outer join optimized away with user-defined functions
Submitted: 4 Jul 2017 10:31 Modified: 1 Dec 2017 15:31
Reporter: Roy Lyseng Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.6 OS:Any
Assigned to: CPU Architecture:Any

[4 Jul 2017 10:31] Roy Lyseng
Description:
Given the following conditions:

- A (left) outer join between two table t1 and t2

- A user-defined (deterministic) function that takes input argument(s)

- A WHERE predicate that references the function with an argument from the inner table of the outer join

Then, optimizer may turn outer join into an inner join and hence a few rows may disappear from the result.

Workaround: Wrap the argument in ifnull: ifnull(<arg>, NULL)

A possible cause is that all SQL functions are implicitly classified as so-called "null-called" functions, ie they behave as the clause RETURNS NULL ON NULL INPUT has been specified (MySQL does not explicitly allow this, but SQL standard does). This clause implies that a function that is supplied a NULL value also returns a NULL value, and hence when used in e.g a WHERE clause will always return FALSE, and consequently can be used to optimize an outer join to an inner join.

How to repeat:
create table t00(a int);
insert into t00 values(1),(2);
create table t01(a int);
insert into t01 values(1);
create view v0 as select t00.a, t01.a as b, ifnull(t01.a, 666) as c from t00 left join t01 using(a);
select * from v0 where c is not null;

create function f(a integer) returns integer deterministic
return ifnull(a, 666);

create view v1 as select t00.a, t01.a as b, f(t01.a) as c from t00 left join t01 using(a);
select * from v1 where c >= 0;

create view v2 as select t00.a, t01.a as b, f(ifnull(t01.a, null)) as c from t00 left join t01 using(a);
select * from v2 where c >= 0;

select t00.a, t01.a as b, f(t01.a) as c from t00 left join t01 using(a) where f(t01.a) >= 0;

drop function f;
drop view v0, v1, v2;
drop table t00, t01;

Suggested fix:
Implement RETURNS NULL ON NULL INPUT for SQL functions.
Modify not_null_tables handling of items that represent SQL functions.
[4 Sep 2017 11:21] Abhishek Ranjan
Posted by developer:
 
Bug#83957 implements a workaround of this bug in the code to create information schema views.
When this bug is pushed, the work around should be removed. 
One of test case based on trunk is follows which returns empty result set in 2nd query:

SET information_schema_stats=latest;
CREATE TABLE t1 (a int, PRIMARY KEY (a)) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1),(2),(3),(4),(5),(6),(7),(8);
SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES WHERE CREATE_TIME IS NOT NULL AND TABLE_NAME='t1';
SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES WHERE CREATE_TIME IS NOT NULL AND UPDATE_TIME IS NOT NULL AND TABLE_NAME='t1';
DROP TABLE t1;
[1 Dec 2017 15:31] Jon Stephens
Documented fix in the MySQL 8.0.4 changelog, as follows:

    When a stored function is used with a table column value as an
    argument in a WHERE predicate, its internal not_null_tables
    property is falsely set to a non-empty value. If this predicate
    is applied to an outer join operation and one of the arguments
    is from an inner table of the outer join, the predicate may be
    used to convert the outer join to an inner join. According to
    the SQL standard, only functions that have the RETURNS NULL ON
    NULL INPUT property should behave in that manner. Since MySQL
    does not currently implement this property, stored functions are
    changed such that they no longer implement the RETURNS NULL ON
    NULL INPUT behavior.

Closed.