| Bug #101108 | incorrect derived_merge optimization | ||
|---|---|---|---|
| Submitted: | 9 Oct 2020 18:11 | Modified: | 19 Oct 2021 12:01 |
| Reporter: | Александр Ммммммм | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: DML | Severity: | S2 (Serious) |
| Version: | 8.0.27 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[9 Oct 2020 20:15]
MySQL Verification Team
Thank you for the bug report.
[19 Oct 2021 12:01]
Александр Ммммммм
https://github.com/mysql/mysql-server/blob/da7a1870abe131923ed003e0e9747d2de75a41e5/sql/it... bool Item_field::check_column_from_derived_table(uchar *arg) { TABLE_LIST *tl = pointer_cast<TABLE_LIST *>(arg); if (field->table == tl->table) { // If the expression in the derived table for this column has a subquery // or contains parameters or has non-deterministic result, condition is // not pushed down. // Expressions having subqueries need a more complicated replacement // strategy than the one that currently exists when the condition is // moved to derived table. // Expression having parameters when cloned as part of replacement have // problems to locate the original "?" and therefore will not be able to // get the value. TODO: Lift these two limitations. // Any condition with expressions having non-deterministic result in the // underlying derived table should not be pushed. // For ex: // select * from (select rand() as a from t1) where a >0.5; // Here a > 0.5 if pushed down would result in rand() getting evaluated // twice because the query would then be // select * from (select rand() as a from t1 where rand() > 0.5) which // is not correct. See also Item_func::check_column_from_derived_table Item *item = tl->get_derived_expr(field->field_index()); return (item->has_subquery() || (item->used_tables() & (INNER_TABLE_BIT | RAND_TABLE_BIT))); } return true; }

Description: derived_merge optimization applying for not deterministic functions and it gives an unpredictable, erroneous result How to repeat: set session optimizer_switch = 'derived_merge=on'; SELECT * FROM ( SELECT n , IF(RAND() < 0.5, RAND() * 10000, NULL) res FROM ( SELECT 1 n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 ) t ) t WHERE res IS NOT NULL +---+-------------------+ | n | res | +---+-------------------+ | 1 | NULL | | 2 | NULL | | 4 | NULL | | 5 | 2772.243267644423 | +---+-------------------+ 4 rows in set (0.00 sec) all SQL standarts say, if I make condition WHERE res IS NOT NULL, I must not receive NULL values this comes from derived_merge optimization -> Filter: (if((rand() < 0.5),(rand() * 10000),NULL) is not null) (actual time=0.026..0.027 rows=3 loops=1) -> Table scan on t (actual time=0.001..0.001 rows=5 loops=1) -> Union materialize (actual time=0.019..0.020 rows=5 loops=1) -> Rows fetched before execution (actual time=0.000..0.000 rows=1 loops=1) -> Rows fetched before execution (actual time=0.000..0.000 rows=1 loops=1) -> Rows fetched before execution (actual time=0.000..0.000 rows=1 loops=1) -> Rows fetched before execution (actual time=0.000..0.000 rows=1 loops=1) -> Rows fetched before execution (actual time=0.000..0.000 rows=1 loops=1) filtered other result than selected! if derived_merge optimization disabled, result will be expected set session optimizer_switch = 'derived_merge=off' ; SELECT * FROM ( SELECT n , IF(RAND() < 0.5, RAND() * 10000, NULL) res FROM ( SELECT 1 n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 ) t ) t WHERE res IS NOT NULL +---+--------------------+ | n | res | +---+--------------------+ | 3 | 7635.638404298237 | | 4 | 7536.478478029816 | | 5 | 4472.3482099104185 | +---+--------------------+ 3 rows in set (0.00 sec) -> Filter: (t.res is not null) (actual time=0.024..0.025 rows=3 loops=1) -> Table scan on t (actual time=0.000..0.000 rows=5 loops=1) -> Materialize (actual time=0.023..0.023 rows=5 loops=1) -> Table scan on t (actual time=0.001..0.001 rows=5 loops=1) -> Union materialize (actual time=0.006..0.007 rows=5 loops=1) -> Rows fetched before execution (actual time=0.000..0.000 rows=1 loops=1) -> Rows fetched before execution (actual time=0.000..0.000 rows=1 loops=1) -> Rows fetched before execution (actual time=0.000..0.000 rows=1 loops=1) -> Rows fetched before execution (actual time=0.000..0.000 rows=1 loops=1) -> Rows fetched before execution (actual time=0.000..0.000 rows=1 loops=1) Suggested fix: Don't use derived_merge optimization for non-deterministic expressions