Bug #101101 incorrect interpritate sub resuls
Submitted: 9 Oct 2020 10:19 Modified: 18 Oct 2020 11:20
Reporter: Александр Ммммммм Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.21 OS:Any
Assigned to: CPU Architecture:Any

[9 Oct 2020 10:19] Александр Ммммммм
Description:
incorrect interpritate sub resuls, NOT NULL filter doesn't work on sub result

How to repeat:
SELECT *
FROM (
    WITH RECURSIVE p(n) AS (
        SELECT 1 n
        UNION ALL
        SELECT n + 1 n FROM p WHERE n + 1 <= 10
    )
    , pt(n) AS (
        SELECT 1 n
        UNION ALL
        SELECT n + 1 n FROM p WHERE n + 1 <= 4
    )
    SELECT pt.n
    , IF(RAND() < 0.5, RAND() * 10000, NULL) res
    FROM pt
    CROSS JOIN p
) t
WHERE res IS NOT NULL

+------+--------------------+
| n    | res                |
+------+--------------------+
|    1 |  3999.316807835658 |
|    1 | 3966.6492622035084 |
|    1 |               NULL |
|    1 |               NULL |
|    1 |               NULL |
|    2 |               NULL |
|    2 |               NULL |
|    2 |    7220.2597160081 |
|    2 |               NULL |
|    2 | 4074.6836122820932 |
|    3 | 1571.6528348360703 |
|    3 | 8040.1919298248295 |
|    3 |  7371.835957627591 |
|    3 |               NULL |
|    4 |               NULL |
|    4 |               NULL |
|    4 |  9949.812572402705 |
|    4 |               NULL |
+------+--------------------+
18 rows in set (0.00 sec)

some results are NULL, but I want filter only NOT NULL (WHERE res IS NOT NULL)
[9 Oct 2020 10:27] Александр Ммммммм
this code work incorrect too

CREATE TEMPORARY TABLE pt
WITH RECURSIVE p(n) AS (
    SELECT 1 n
    UNION ALL
    SELECT n + 1 n FROM p WHERE n + 1 <= 4
)
SELECT * FROM p
;
SELECT *
FROM (
    SELECT pt.n
    , IF(RAND() < 0.5, RAND() * 10000, NULL) res
    -- , IF(RAND() < 0.5, RAND() * 10000, -1) res
    FROM pt
) t
WHERE res IS NOT NULL
-- WHERE res != -1
-- WHERE res + 0 IS NOT NULL
;
DROP TEMPORARY TABLE IF EXISTS pt
;
[9 Oct 2020 10:30] Александр Ммммммм
Only this code work correct

CREATE TEMPORARY TABLE pt
WITH RECURSIVE p(n) AS (
    SELECT 1 n
    UNION ALL
    SELECT n + 1 n FROM p WHERE n + 1 <= 10
)
SELECT * FROM p
;
CREATE TEMPORARY TABLE res
SELECT pt.n
, IF(RAND() < 0.5, RAND() * 10000, NULL) res
FROM pt
;
SELECT *
FROM res
WHERE res.res IS NOT NULL
;
DROP TEMPORARY TABLE IF EXISTS pt, res
;
[9 Oct 2020 12:39] MySQL Verification Team
Hi Mr. Mmmmmmm,

Thank you for your bug report.

However, this is not a bug.

If you read our Reference Manual, sub-subchapter 13.2.15, you will see couple of examples that show that visibility of the resources in table expressions is tied to those expressions themselves. 

Hence, the only way to do it properly is to write a query, exactly as you did in your last example.

Not a bug.
[9 Oct 2020 15:17] Александр Ммммммм
I did not find a description of this behavior in the documentation, here
https://dev.mysql.com/doc/refman/8.0/en/with.html

behavior is very unexpected, when wrapping query over query, you expect that WHERE the conditions to be met.
By the way, DISTINCT materializes the request and it starts working as expected

With DISTINCT it works correctly, without - no

SELECT *
FROM (
    WITH RECURSIVE p(n) AS (
        SELECT 1 n
        UNION ALL
        SELECT n + 1 n FROM p WHERE n + 1 <= 10
    )
    SELECT DISTINCT p.n
    , IF(RAND() < 0.5, RAND() * 10000, NULL) res
    FROM p
) t
WHERE res IS NOT NULL

-> Filter: (t.res is not null)  (actual time=0.052..0.053 rows=8 loops=1)
    -> Table scan on t  (actual time=0.000..0.001 rows=10 loops=1)
        -> Materialize  (actual time=0.051..0.051 rows=10 loops=1)
            -> Table scan on <temporary>  (actual time=0.000..0.001 rows=10 loops=1)
                -> Temporary table with deduplication  (actual time=0.036..0.037 rows=10 loops=1)
                    -> Table scan on p  (actual time=0.000..0.001 rows=10 loops=1)
                        -> Materialize recursive CTE p  (actual time=0.016..0.017 rows=10 loops=1)
                            -> Rows fetched before execution  (actual time=0.000..0.000 rows=1 loops=1)
                            -> Repeat until convergence
                                -> Filter: ((p.n + 1) <= 10)  (cost=2.73 rows=2) (actual time=0.002..0.003 rows=4 loops=2)
                                    -> Scan new records on p  (cost=2.73 rows=2) (actual time=0.001..0.001 rows=5 loops=2)

SELECT *
FROM (
    WITH RECURSIVE p(n) AS (
        SELECT 1 n
        UNION ALL
        SELECT n + 1 n FROM p WHERE n + 1 <= 10
    )
    SELECT p.n
    , IF(RAND() < 0.5, RAND() * 10000, NULL) res
    FROM p
) t
WHERE res IS NOT NULL

-> Filter: (if((rand() < 0.5),(rand() * 10000),NULL) is not null)  (actual time=0.025..0.027 rows=6 loops=1)
    -> Table scan on p  (actual time=0.000..0.001 rows=10 loops=1)
        -> Materialize recursive CTE p  (actual time=0.020..0.021 rows=10 loops=1)
            -> Rows fetched before execution  (actual time=0.000..0.000 rows=1 loops=1)
            -> Repeat until convergence
                -> Filter: ((p.n + 1) <= 10)  (cost=2.73 rows=2) (actual time=0.001..0.002 rows=4 loops=2)
                    -> Scan new records on p  (cost=2.73 rows=2) (actual time=0.001..0.001 rows=5 loops=2)

Everything became clear! MySQL, rewrites the query, pushes the expression and substitutes it in WHERE
Filter: (if((rand() < 0.5),(rand() * 10000),NULL) is not null)
CTE has nothing to do with it!

May be don't apply such optimization to non-deterministic functions?
[12 Oct 2020 12:05] MySQL Verification Team
HI,

This is, truly, not a bug.

There is an example in the reference that I quoted you, but there are other reasons. Resources from upper queries are available to nested queries, but not the other way around.

Not a bug.
[14 Oct 2020 15:12] Александр Ммммммм
it is an bug anyway if I restrict subquery "res IS NOT NULL" mysql must not select rows where res IS NULL
[15 Oct 2020 11:44] Александр Ммммммм
ISO/IEC 9075:1992
https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
7.6 <where clause>
...
<where clause> ::= WHERE <search condition>
...
1) The <search condition> is applied to each row of T. The result
of the <where clause> is a table of those rows of T for which
the result of the <search condition> is true.
[15 Oct 2020 12:33] MySQL Verification Team
Hi,

This is not a bug.

We repeat one more time that this is a question of visibility, not a question of the syntax.
[18 Oct 2020 11:20] Александр Ммммммм
Is is not a question of visibility and not a question of the syntax. This is incorrect operation of the database. The standards clearly say that I should only get rows where <search condition> is true, but I receive rows where <search condition> is false, this is against the SQL92 standard