Bug #119499 Item_view_ref::used_tables() is wrong, leads to wrong query result
Submitted: 3 Dec 15:20
Reporter: Guilhem Bichot Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:9.5.0,8.0.41 OS:Any
Assigned to: CPU Architecture:Any

[3 Dec 15:20] Guilhem Bichot
Description:
In how-to-repeat, the first SELECT reports

+----+-------------------------+
| c0 | coalesce(subq_1.c1,1)=1 |
+----+-------------------------+
| NULL |                       1 |
| NULL |                       1 |
+----+-------------------------+

so the second SELECT, with WHERE comparing COALESCE to 1, should report the same result but it reports an empty set instead.

The bug is in Item_view_ref::used_tables().

How to repeat:
CREATE TABLE t0 (c1 TINYINT);
INSERT INTO t0 VALUES(0),(1);
CREATE TABLE t98 (c3 DECIMAL(10,0) NOT NULL DEFAULT '1317787993');

SELECT subq_1.c1 AS c0, coalesce(subq_1.c1,1)=1
FROM
(SELECT subq_0.c1 AS c1 FROM t0 ref_0 LEFT JOIN
                          (SELECT PI() AS c1 FROM t98 ref_1) subq_0
                          ON FALSE)
subq_1;

SELECT subq_1.c1 AS c0
FROM
(SELECT subq_0.c1 AS c1 FROM t0 ref_0 LEFT JOIN
                         (SELECT PI() AS c1 FROM t98 ref_1) subq_0
                         ON FALSE)
subq_1
where coalesce(subq_1.c1,1)=1;

Suggested fix:
Consider first this simpler query:
SELECT derived.a FROM t1 LEFT JOIN (SELECT PI() AS a FROM t2) AS derived
                            ON FALSE;
with view merging it becomes:
SELECT Y FROM t1 LEFT JOIN t2 ON FALSE;
 
with Y being an Item_view_ref (=derived.a) wrapping the constant PI(), then
'this' is Y and inner_item is PI(), and this->first_inner_table is t2. In
Item_view_ref::val*() functions, we check has_null_row() which itself
tests t2->has_null_row(), which is important so that the query returns
NULL properly and not PI(). That it why this->first_inner_table (t2) must
be included in Item_view_ref::used_tables(), and it properly is.
However, we can have a more "nested" situation:
 
SELECT derived2.a FROM
      (SELECT derived.a FROM t1 LEFT JOIN (SELECT PI() AS a FROM t2) AS derived
                             ON FALSE) AS derived2;
Here we have two successive merging of 'derived' and of 'derived2',
'derived2' is not part of any left join, and thus has a
first_inner_table equal to nullptr. In this case, Item_view_ref::used_tables() has to reach to first_inner_table of the underlying item of derived2.a, which is
derived.a. But Item_view_ref::used_tables() does not do that, alas.
So derived2.a is _not_ said to depend on t2.

Now if we look at the query in how-to-repeat, we have a similar situation (double merging). We miss the dependency of the WHERE on ref_1, so the WHERE is considered constant, so is evaluated during optimization, which is incorrect. At this moment, tables have not been read, so there has been no null-complementing on ref_1 (its first_inner_table->has_null_row() is false), so Item_view_ref::val_real() just returns PI and the '=' test is false, and the result is empty.

If the implementation of Item_view_ref::used_tables() did like Item_view_ref::val*() does, i.e. called this->ref_item()->used_tables() (crucial difference with
current code: it's about calling used_tables() on the the direct child
this->ref_item() instead of on the grand-grand-...-child inner_item),
our problem of "nested" situation would not exist; and it used to do
such direct-child call in previous implementations, but has been changed
in recent versions of MySQL (by ce1965d431a268bc9ad4c8d455ff335af3c80972 ).