Bug #86624 Subquery's RAND() column re-evaluated at every reference
Submitted: 8 Jun 2017 12:52 Modified: 9 Jun 2017 9:32
Reporter: Pavel Sharov Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:5.7.18, 8.0.1 OS:Any
Assigned to: CPU Architecture:Any

[8 Jun 2017 12:52] Pavel Sharov
Description:
There is a subselect which has a column involving random number generation with RAND(). In the main query this column is referenced more than once, no matter whether in different columns of the main query or within a single one. Every time the column involving RAND() is referenced it is re-evaluated to a new random number. With this, presenting for example square root of a generated number in form of r*r will be inaccurate. MySQL 5.6.36, MariaDB 10.2.2 behave as expected, the RAND() value being fixed. I expect MySQL 5.7, MySQL 8.x to behave the same way.

How to repeat:
CREATE TABLE t (
`id` BIGINT(20) NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=InnoDB;

insert into t values();
insert into t values();
insert into t values();
insert into t values();
insert into t values();

SELECT q.i, q.r, q.r
FROM (
   SELECT
      id AS i,
      (FLOOR(RAND(100) * 4)) AS r
   FROM t
) q;

Expected (MySQL 5.6 behavior, r values are the same):

+---+-----+-----+
| i |  r  |  r  |
+---+-----+-----+
| 1 |  0  |  0  |
| 2 |  2  |  2  |
| 3 |  3  |  3  |
| 4 |  2  |  2  |
| 5 |  1  |  1  |
+---+-----+-----+

Actual 5.7 behavior (values are different):

+---+-----+-----+
| i |  r  |  r  |
+---+-----+-----+
| 1 |  0  |  2  |
| 2 |  3  |  2  |
| 3 |  1  |  1  |
| 4 |  2  |  1  |
| 5 |  2  |  0  |
+---+-----+-----+

Note that without the 't' base table in subselect, it works OK:

SELECT q.r, q.r
FROM (
   SELECT (FLOOR(RAND(100) * 4)) AS r
) q;

+-----+-----+
|  r  |  r  |
+-----+-----+
|  0  |  0  |
+-----+-----+

(play with the RAND() seed to make sure values are always the same).

P.S. /*+ SUBQUERY(MATERIALIZATION) */ does not help.
[8 Jun 2017 13:48] MySQL Verification Team
Thank you for the bug report.
[9 Jun 2017 6:57] Roy Lyseng
Hi Pavel,

the hint you try does not work for derived tables. To make sure the subquery q is materialized and not merged into the outer query, try this:

SELECT /*+ no_merge(q) */ ... FROM (<subquery>) AS q;
[9 Jun 2017 8:51] Roy Lyseng
Sorry, this hint is unfortunately not implemented in 5.7, only in 8.0.

The simplest way to force materialization of the derived table in 5.7 is to add LIMIT <a very high number> to the derived table definition.
[9 Jun 2017 9:32] Pavel Sharov
Hi Roy,

Thanks for your advice.
Yes, NO_MERGE hint does help with 8.0. 
Using LIMIT does help with 5.7.
Don't you then consider this behavior a bug any more?
[30 Jun 2020 17:57] Christopher Granahan
I ran into this problem, as well. Here is another way of seeing it:

WITH RECURSIVE one_hundred (N) AS (
  SELECT 1
  UNION ALL
  SELECT N+1 FROM one_hundred WHERE N < 100
),
randomized (RANDOM_VALUE) AS (
  SELECT RAND() AS SORT_ORDER
  FROM one_hundred
)
SELECT COUNT(*)
FROM randomized
WHERE RANDOM_VALUE <= 0.1;

This query should return 10 on average. Instead, it returns 1 on average, apparently because it is applying the WHERE clause to both the originally generated RAND() value and to the additional value generated in the outer query.
[1 Jul 2020 8:10] Guilhem Bichot
Hello Mr Granahan. I do get 10 on average.
And EXPLAIN looks correct:
mysql> explain format=tree WITH RECURSIVE one_hundred (N) AS (   SELECT 1   UNION ALL   SELECT N+1 FROM one_hundred WHERE N < 100 ), randomized (RANDOM_VALUE) AS (   SELECT RAND() AS SORT_ORDER   FROM one_hundred ) SELECT COUNT(*) FROM randomized WHERE RANDOM_VALUE <= 0.1;

| -> Aggregate: count(0)
    -> Filter: (rand() <= 0.1)
        -> Table scan on one_hundred  (cost=2.84 rows=3)
            -> Materialize recursive CTE one_hundred
                -> Rows fetched before execution  (cost=0.00 rows=1)
                -> Repeat until convergence
                    -> Filter: (one_hundred.N < 100)  (cost=2.73 rows=1)
                        -> Scan new records on one_hundred  (cost=2.73 rows=2)

Could you please post here: your version of MySQL, and the output of EXPLAIN FORMAT=TREE like above, and also run EXPLAIN FORMAT=TRADITIONAL and immediately after, run SHOW WARNINGS and post all output here?
[1 Jul 2020 13:10] Christopher Granahan
I have confirmed that the specific case that I described does not affect recent versions; please disregard.