| Bug #86624 | Subquery's RAND() column re-evaluated at every reference | ||
|---|---|---|---|
| Submitted: | 8 Jun 2017 12:52 | Modified: | 9 Jun 2017 9:32 |
| Reporter: | Pavel Sharov | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: DML | Severity: | S2 (Serious) |
| Version: | 5.7.18, 8.0.1 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[8 Jun 2017 13:48]
MySQL Verification Team
Thank you for the bug report.
[9 Jun 2017 6:57]
Roy Lyseng
Hi Pavel, the hint you try does not work for derived tables. To make sure the subquery q is materialized and not merged into the outer query, try this: SELECT /*+ no_merge(q) */ ... FROM (<subquery>) AS q;
[9 Jun 2017 8:51]
Roy Lyseng
Sorry, this hint is unfortunately not implemented in 5.7, only in 8.0. The simplest way to force materialization of the derived table in 5.7 is to add LIMIT <a very high number> to the derived table definition.
[9 Jun 2017 9:32]
Pavel Sharov
Hi Roy, Thanks for your advice. Yes, NO_MERGE hint does help with 8.0. Using LIMIT does help with 5.7. Don't you then consider this behavior a bug any more?
[30 Jun 2020 17:57]
Christopher Granahan
I ran into this problem, as well. Here is another way of seeing it: WITH RECURSIVE one_hundred (N) AS ( SELECT 1 UNION ALL SELECT N+1 FROM one_hundred WHERE N < 100 ), randomized (RANDOM_VALUE) AS ( SELECT RAND() AS SORT_ORDER FROM one_hundred ) SELECT COUNT(*) FROM randomized WHERE RANDOM_VALUE <= 0.1; This query should return 10 on average. Instead, it returns 1 on average, apparently because it is applying the WHERE clause to both the originally generated RAND() value and to the additional value generated in the outer query.
[1 Jul 2020 8:10]
Guilhem Bichot
Hello Mr Granahan. I do get 10 on average.
And EXPLAIN looks correct:
mysql> explain format=tree WITH RECURSIVE one_hundred (N) AS ( SELECT 1 UNION ALL SELECT N+1 FROM one_hundred WHERE N < 100 ), randomized (RANDOM_VALUE) AS ( SELECT RAND() AS SORT_ORDER FROM one_hundred ) SELECT COUNT(*) FROM randomized WHERE RANDOM_VALUE <= 0.1;
| -> Aggregate: count(0)
-> Filter: (rand() <= 0.1)
-> Table scan on one_hundred (cost=2.84 rows=3)
-> Materialize recursive CTE one_hundred
-> Rows fetched before execution (cost=0.00 rows=1)
-> Repeat until convergence
-> Filter: (one_hundred.N < 100) (cost=2.73 rows=1)
-> Scan new records on one_hundred (cost=2.73 rows=2)
Could you please post here: your version of MySQL, and the output of EXPLAIN FORMAT=TREE like above, and also run EXPLAIN FORMAT=TRADITIONAL and immediately after, run SHOW WARNINGS and post all output here?
[1 Jul 2020 13:10]
Christopher Granahan
I have confirmed that the specific case that I described does not affect recent versions; please disregard.

Description: There is a subselect which has a column involving random number generation with RAND(). In the main query this column is referenced more than once, no matter whether in different columns of the main query or within a single one. Every time the column involving RAND() is referenced it is re-evaluated to a new random number. With this, presenting for example square root of a generated number in form of r*r will be inaccurate. MySQL 5.6.36, MariaDB 10.2.2 behave as expected, the RAND() value being fixed. I expect MySQL 5.7, MySQL 8.x to behave the same way. How to repeat: CREATE TABLE t ( `id` BIGINT(20) NOT NULL PRIMARY KEY AUTO_INCREMENT ) ENGINE=InnoDB; insert into t values(); insert into t values(); insert into t values(); insert into t values(); insert into t values(); SELECT q.i, q.r, q.r FROM ( SELECT id AS i, (FLOOR(RAND(100) * 4)) AS r FROM t ) q; Expected (MySQL 5.6 behavior, r values are the same): +---+-----+-----+ | i | r | r | +---+-----+-----+ | 1 | 0 | 0 | | 2 | 2 | 2 | | 3 | 3 | 3 | | 4 | 2 | 2 | | 5 | 1 | 1 | +---+-----+-----+ Actual 5.7 behavior (values are different): +---+-----+-----+ | i | r | r | +---+-----+-----+ | 1 | 0 | 2 | | 2 | 3 | 2 | | 3 | 1 | 1 | | 4 | 2 | 1 | | 5 | 2 | 0 | +---+-----+-----+ Note that without the 't' base table in subselect, it works OK: SELECT q.r, q.r FROM ( SELECT (FLOOR(RAND(100) * 4)) AS r ) q; +-----+-----+ | r | r | +-----+-----+ | 0 | 0 | +-----+-----+ (play with the RAND() seed to make sure values are always the same). P.S. /*+ SUBQUERY(MATERIALIZATION) */ does not help.