Bug #86220 Query returns wrong results when using UDF in view
Submitted: 8 May 2017 16:35 Modified: 5 Nov 2018 21:01
Reporter: Oleksii Vynnychenko Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7.18 OS:Any
Assigned to: CPU Architecture:Any

[8 May 2017 16:35] Oleksii Vynnychenko
Description:
There's seem to be a bug in query execution if sub-query is using DB view that in turn uses some UDF.

I've tried it on MySQL 5.7.15, 5.7.17, 5.7.18 - all give the same wrong result on Win 10. Connector is not the issue either - same result in Mysql Workbench 6.3 and DataGrip 2017.1.1 (MySQL connector/J 5.1.40).

It would be very nice if you could fix that so UDFs can be used in views. Not being able to use them (UDFs) makes views very unreadable and hard to maintain, not to mention code reusing and so on...

How to repeat:
Here's a SQL listing that will reproduce the problem - just copy-paste and execute, explanation is at the end.

CREATE DATABASE bug_mysql;
USE bug_mysql;
# -----------------------------------------------------------------------------------------------
# creating tables
DROP TABLE IF EXISTS accounts;
CREATE TABLE IF NOT EXISTS accounts
(
    id       INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(500) DEFAULT '' NOT NULL,
    name     VARCHAR(500)            NULL
);
INSERT INTO accounts (id, username, name) VALUES (1, 'username1', 'Name1');
INSERT INTO accounts (id, username, name) VALUES (2, 'username2', 'Name2');
INSERT INTO accounts (id, username, name) VALUES (3, 'username3', 'Name3');
INSERT INTO accounts (id, username, name) VALUES (4, 'username4', 'Name4');
INSERT INTO accounts (id, username, name) VALUES (5, 'username5', 'Name5');
INSERT INTO accounts (id, username, name) VALUES (6, 'username6', 'Name6');
INSERT INTO accounts (id, username, name) VALUES (7, 'username7', 'Name7');

DROP TABLE IF EXISTS custom_fields;
CREATE TABLE IF NOT EXISTS custom_fields
(
    id          INT AUTO_INCREMENT PRIMARY KEY,
    field_type  VARCHAR(50)   NOT NULL,
    properties  VARCHAR(4000) NULL,
    strict_name VARCHAR(250)  NULL
);
INSERT INTO custom_fields (id, field_type, properties, strict_name) VALUES (1, 'isMemberOfTeam', '{"id": 1}', 'memberOfTeam');

DROP TABLE IF EXISTS custom_field_values;
CREATE TABLE IF NOT EXISTS custom_field_values
(
    id               INT AUTO_INCREMENT PRIMARY KEY,
    custom_fields_id INT           NOT NULL,
    owner_id         INT           NOT NULL,
    value            VARCHAR(4000) NULL
);

DROP TABLE IF EXISTS team_definitions;
CREATE TABLE IF NOT EXISTS team_definitions
(
    id          INT AUTO_INCREMENT PRIMARY KEY,
    strict_name VARCHAR(250) NULL
);
INSERT INTO team_definitions (id, strict_name) VALUES (1, 'team1');

DROP TABLE IF EXISTS team_entries;
CREATE TABLE IF NOT EXISTS team_entries
(
    id                 INT AUTO_INCREMENT PRIMARY KEY,
    type               VARCHAR(50),
    member_id          INT NOT NULL,
    team_definition_id INT NOT NULL
);
INSERT INTO team_entries (id, type, member_id, team_definition_id) VALUES (1, 'account', 1, 1);
INSERT INTO team_entries (id, type, member_id, team_definition_id) VALUES (2, 'account', 2, 1);
INSERT INTO team_entries (id, type, member_id, team_definition_id) VALUES (3, 'account', 3, 1);

# -----------------------------------------------------------------------------------------------
# creating TD view
CREATE OR REPLACE VIEW team_definitions_to_accounts AS
    SELECT
        td2a.id         AS id,
        td2a.account_id AS account_id
    FROM (
        SELECT
            td.id         AS id,
            (
                CASE te.type
                    WHEN 'account' THEN te.member_id
                    WHEN 'exclude_account' THEN te.member_id
                    ELSE NULL
                END
            ) AS account_id,
            if((te.type = 'exclude_account'), 0, 1) AS include_in_result
        FROM team_definitions td
            JOIN team_entries te ON te.team_definition_id = td.id
    ) td2a
    GROUP BY td2a.id, td2a.account_id
    HAVING (bit_and(td2a.include_in_result) <> 0);

# -----------------------------------------------------------------------------------------------
# creating routine
DROP FUNCTION IF EXISTS IsMemberOfTeam;
CREATE FUNCTION IsMemberOfTeam(accountId INT, teamDefinitionId INT)
    RETURNS VARCHAR(4000)
    BEGIN
		IF exists(
			SELECT id
			FROM team_definitions_to_accounts td2a
			WHERE td2a.id = teamDefinitionId AND td2a.account_id = accountId
		)
		THEN RETURN 'true';
		ELSE RETURN 'false';
		END IF ;
	END;

# -----------------------------------------------------------------------------------------------
# creating bugged view
CREATE OR REPLACE VIEW bugged_db_view AS
    SELECT
        a.id           AS a_id,
        a.username     AS username,
        cf.id          AS id,
        cf.strict_name AS strict_name,
        cfv.owner_id   AS value_owner_id,
        (
            CASE cf.field_type
                WHEN 'isMemberOfTeam'
                    THEN IsMemberOfTeam(a.id, cf.properties->>'$.id')
                ELSE cfv.`value`
            END
        ) AS `value`
    FROM accounts a
        JOIN custom_fields cf
        LEFT JOIN custom_field_values cfv ON cf.id = cfv.custom_fields_id AND a.id = cfv.owner_id;

# -----------------------------------------------------------------------------------------------
# creating NOT bugged view
CREATE OR REPLACE VIEW not_bugged_db_view AS
    SELECT
        a.id           AS a_id,
        a.username     AS username,
        cf.id          AS id,
        cf.strict_name AS strict_name,
        cfv.owner_id   AS value_owner_id,
        (
            CASE cf.field_type
                WHEN 'isMemberOfTeam'
                    THEN if(json_valid(cf.properties) AND exists(
                        SELECT id
                        FROM team_definitions_to_accounts td2a
                            WHERE td2a.account_id = a_id
                        ), 'true', 'false'
                    )
                ELSE cfv.`value`
            END
        ) AS `value`
    FROM accounts a
        JOIN custom_fields cf
        LEFT JOIN custom_field_values cfv ON cf.id = cfv.custom_fields_id AND a.id = cfv.owner_id;

# -----------------------------------------------------------------------------------------------
# reproducing bug
SELECT
    *
FROM accounts a
    WHERE id IN (
        SELECT bv.a_id
        FROM bugged_db_view bv
        WHERE bv.strict_name = 'memberOfTeam' AND bv.value = 'true'
    );

# inner sub-query
SELECT bv.a_id
FROM bugged_db_view bv
    WHERE bv.strict_name = 'memberOfTeam' AND bv.value = 'true';

# EXPANATION:
# 1. Execute inner sub-query (SELECT bv.a_id FROM bugged_db_view bv WHERE bv.strict_name = 'memberOfTeam' AND bv.value = 'false');
#   You'll see that it returns accounts (1,2,3) which is right. But if you execute whole query with (SELECT * FROM accounts WHERE id IN sub_query_of_123) - it will return all accounts that exists in a table (accounts 1,2,3,4,5,6,7),#   which is totally wrong and is a bug.
# 2. Interesting thing is that if you use not 'bugged_db_view' for inner sub-query but 'NOT_bugged_db_view' (the only difference is that function IsMemberOfTeam is 'inlined' so to speak) - everything works as it should.
#   I guess calling UDF somehow messes up that query execution.
[10 May 2017 17:01] MySQL Verification Team
Hi!

First of all, IsMemberOfTeam is not UDF, but a STORED FUNCTION.

The reason why you get wrong results is that you have not set its attributes correctly.

Can you please set the stored routine IsMemberOfTeam to be DETERMINISTIC and that it READS SQL DATA and then try again ???

This time you should get the same results.

Please, let us know if this has helped.
[10 May 2017 17:46] Oleksii Vynnychenko
Hi

Right, SF, not UDF, my bad.

It actually helped - specifying as DETERMINISTIC, so far results are as should be. Thank you very much!

Although I don't understand why is it so? In manual it says:

--------------------
Assessment of the nature of a routine is based on the “honesty” of the creator: MySQL does not check that a routine declared DETERMINISTIC is free of statements that produce nondeterministic results. However, misdeclaring a routine might affect results or affect performance. Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices. Declaring a deterministic routine as NONDETERMINISTIC might diminish performance by causing available optimizations not to be used.
--------------------
https://dev.mysql.com/doc/refman/5.7/en/create-procedure.html

From what is written above NOT DETERMINISTIC (default) can only harm performance if function actually is deterministic because less optimization and so on, but result will always be right. Only the other way around can produce false results.
Am I understanding it wrong? Why SF needs to be specified as DETERMINISTIC to return 'right' results?

Specifying READS SQL DATA alone (without DETERMINISTIC) didn't affect the execution and it returned 'wrong' results like without specifying it. Specifying DETERMINISTIC on a routine alone made that query return 'right' results.
[10 May 2017 18:33] Oleksii Vynnychenko
To add to previous one... What if my function was like that (lets imagine that's the only change to listing from the first post):

DROP FUNCTION IF EXISTS IsMemberOfTeam;
CREATE FUNCTION IsMemberOfTeam(accountId INT, teamDefinitionId INT)
    RETURNS VARCHAR(4000)
    NOT DETERMINISTIC
    READS SQL DATA
    BEGIN
		IF exists(
			SELECT id
			FROM team_definitions_to_accounts td2a
			WHERE td2a.id = teamDefinitionId AND td2a.account_id = accountId AND now() < '2018-01-01'
		)
		THEN RETURN 'true';
		ELSE RETURN 'false';
		END IF ;
	END;

I've specified READS SQL DATA and it's NOT DETERMINISTIC because I've added additional condition in WHERE - { now() < '2018-01-01' }, so it's not deterministic because of NOW() function call (am I right?).
And I'm getting same problem results again - sub-query returns 1,2,3 accounts (as expected) and whole query returns everything from table (1,2,3,4,5,6,7) accounts (not what I've expected). Interesting thing is that specifying that function with NOW() call as DETERMINISTIC makes the whole query return 'right' (1,2,3) accounts, but from reading manual it's not right to do that, because in 2018 year and later it won't return anything.
How to make the query work now with this SF?
Can you shed the light on how is it so?

Thank you
[11 May 2017 12:20] MySQL Verification Team
Hi!

I actually agree with you.

This requires additional explanations in the manual.

Verified as a documentation bug.
[11 May 2017 12:34] Oleksii Vynnychenko
Hi

But what about my last comment? What if function uses non-deterministic features, like in my last comment I've added as an example new condition to sub-query { AND now() < '2018-01-01' } ?
Are not deterministic (not marked as DETERMINISTIC) functions should not be used in views at all because it may produce unexpected results at some point? 

Thank you
[11 May 2017 13:54] MySQL Verification Team
It will be explained in the manual. This is some intricate interaction between the stored routine definition and the optimizer. It will take some time to be explained.
[5 Nov 2018 14:34] MySQL Verification Team
Hi,

This has turned out to be a bug in our code. Furthermore, it is a regression bug.

Thank you for your contribution.
[5 Nov 2018 15:31] Paul DuBois
Also: Potential workaround to get correct results:

set optimizer_switch='derived_merge=off';
[5 Nov 2018 21:01] Oleksii Vynnychenko
Thank you for the info, good to know it eventually will be fixed.

Internally we've abandoned the idea of using SF in views because of this and used another approach, but thank you for heads-up on the issue.