Bug #72447 Using order by RAND() inside query in a join produces unexpected results
Submitted: 25 Apr 2014 0:21 Modified: 29 Apr 2014 0:38
Reporter: William Ferrer Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.5/5.6 OS:Other (Debian Wheezy)
Assigned to: CPU Architecture:Any
Tags: join, order by, Rand()

[25 Apr 2014 0:21] William Ferrer
Description:
Hi

I have been working on a query that joins to 1 row in a table based on priority that is multiplied by a RAND() function call.

I am seeing this producing very strange results. The query I have should only be able to return 1 row, however it is currently returning between 0 and 2 rows. Other tests I have done with this also have shown the results of the nulls being returned when they should not be able to.

Please see schema and query below.

Normally I would post something like this on stackoverflow but it seems very much like a bug to me so I am posting it here.

I will be working on a work around for my code involving a stored procedure with 2 queries.

Thank you for making mysql.

All the best.

Will Ferrer

Run the Business LLC

How to repeat:
Here is a very basic set of tables I made to illustrate the issue

delimiter $$

CREATE TABLE `parent` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1$$

INSERT INTO `test`.`parent`
(`id`)
VALUES
(
1
);

delimiter $$

CREATE TABLE `child` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent_id` int(11) DEFAULT NULL,
  `priority` decimal(16,8) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_parent_id_idx` (`parent_id`),
  CONSTRAINT `fk_parent_id` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1$$

INSERT INTO `test`.`child`
(`id`,
`parent_id`,
`priority`)
VALUES
(
1,
1,
1
), (
2,
1,
1
);

Then run this query

SELECT p.id, c.id
	FROM `parent` p
	INNER JOIN `child` c 
		ON c.id = (
			SELECT c2.id
			FROM `child` c2
			WHERE c2.parent_id = p.id
			ORDER BY RAND() * c2.priority
			LIMIT 1
		)
	WHERE p.id = 1

Suggested fix:
The expected behavior of this query would be that with the fixtures I provided it would always return 1 row, randomly selected from the child table.
[25 Apr 2014 0:23] William Ferrer
I should also mention that the query: SELECT c2.id
			FROM `child` c2
			WHERE c2.parent_id = 1
			ORDER BY RAND() * c2.priority
			LIMIT 1
Works perfectly on it's own and if you change the full query in my example to use a static number instead: SELECT p.id, c.id
	FROM `parent` p
	INNER JOIN `child` c 
		ON c.id = (
			SELECT c2.id
			FROM `child` c2
			WHERE c2.parent_id = p.id
			ORDER BY 2 * c2.priority
			LIMIT 1
		)
	WHERE p.id = 1

It also works. The part of the query that creates the issue appears to be the use of RAND().
[25 Apr 2014 9:52] Hartmut Holzgraefe
The secret is in the EXPLAIN plan:

+------+----------------------+-------+-------+---------------+---------+---------+-------+------+---------------------------------+
| id   | select_type          | table | type  | possible_keys | key     | key_len | ref   | rows | Extra                           |
+------+----------------------+-------+-------+---------------+---------+---------+-------+------+---------------------------------+
|    1 | PRIMARY              | p     | const | PRIMARY       | PRIMARY | 4       | const |    1 | Using index                     |
|    1 | PRIMARY              | c     | const | PRIMARY       | PRIMARY | 4       | const |    1 | Using where; Using index        |
|    2 | UNCACHEABLE SUBQUERY | c2    | ALL   | NULL          | NULL    | NULL    | NULL  |    2 | Using temporary; Using filesort |
+------+----------------------+-------+-------+---------------+---------+---------+-------+------+---------------------------------+

As the subquery is uncachable it is executed for every parent/child pair,
and so chances are that when checking pair 1/1 it returns 2 while when checking pair 1/2 it returns 1, so both times filtering out the result and ending up with empty set, or matching both times, so producing two matches
[25 Apr 2014 10:00] Hartmut Holzgraefe
By the way, why the subquery construct at all?

Couldn't you get the same desired result from just this?

SELECT p.id, c.id 
  FROM parent p 
  JOIN child c 
 WHERE p.id = 1 
 ORDER BY RAND()*c.priority 
 LIMIT 1;
[25 Apr 2014 12:37] MySQL Verification Team
C:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --debug-info --prompt="mysql 5.6 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.6.19 Source distribution

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.6 > use test
Database changed
mysql 5.6 > SELECT p.id, c.id
    ->  FROM `parent` p
    ->  INNER JOIN `child` c
    ->          ON c.id = (
    ->                  SELECT c2.id
    ->                  FROM `child` c2
    ->                  WHERE c2.parent_id = p.id
    ->                  ORDER BY RAND() * c2.priority
    ->                  LIMIT 1
    ->          )
    ->  WHERE p.id = 1;
+----+----+
| id | id |
+----+----+
|  1 |  1 |
|  1 |  2 |
+----+----+
2 rows in set (0.00 sec)

mysql 5.6 > SELECT p.id, c.id
    ->  FROM `parent` p
    ->  INNER JOIN `child` c
    ->          ON c.id = (
    ->                  SELECT c2.id
    ->                  FROM `child` c2
    ->                  WHERE c2.parent_id = p.id
    ->                  ORDER BY RAND() * c2.priority
    ->                  LIMIT 1
    ->          )
    ->  WHERE p.id = 1;
Empty set (0.00 sec)

mysql 5.6 > SELECT p.id, c.id
    ->  FROM `parent` p
    ->  INNER JOIN `child` c
    ->          ON c.id = (
    ->                  SELECT c2.id
    ->                  FROM `child` c2
    ->                  WHERE c2.parent_id = p.id
    ->                  ORDER BY RAND() * c2.priority
    ->                  LIMIT 1
    ->          )
    ->  WHERE p.id = 1;
+----+----+
| id | id |
+----+----+
|  1 |  1 |
|  1 |  2 |
+----+----+
2 rows in set (0.00 sec)

mysql 5.6 > SELECT p.id, c.id
    ->  FROM `parent` p
    ->  INNER JOIN `child` c
    ->          ON c.id = (
    ->                  SELECT c2.id
    ->                  FROM `child` c2
    ->                  WHERE c2.parent_id = p.id
    ->                  ORDER BY RAND() * c2.priority
    ->                  LIMIT 1
    ->          )
    ->  WHERE p.id = 1;
Empty set (0.00 sec)

mysql 5.6 > SELECT p.id, c.id
    ->  FROM `parent` p
    ->  INNER JOIN `child` c
    ->          ON c.id = (
    ->                  SELECT c2.id
    ->                  FROM `child` c2
    ->                  WHERE c2.parent_id = p.id
    ->                  ORDER BY RAND() * c2.priority
    ->                  LIMIT 1
    ->          )
    ->  WHERE p.id = 1;
+----+----+
| id | id |
+----+----+
|  1 |  2 |
+----+----+
1 row in set (0.00 sec)

mysql 5.6 >
[25 Apr 2014 13:11] MySQL Verification Team
Thank you for the bug report.
[28 Apr 2014 14:04] Hartmut Holzgraefe
Yes, it is reproducible, but I'd still say "expected behavior, not a bug"
[28 Apr 2014 16:03] MySQL Verification Team
I have to agree here with Hartmut Holzgraefe. 

What we have here is what ANSI SQL99 standard calls "Correlated subquery". To cite from the book written by the SQL standard authors, it also says that " DBMS can't evaluate the subquery without feeding it values from outside of the subquery. Correlation can be messy ....., but sometimes there is no alternative". 

The above rule remained unchanged in ANSI SQL 2011, according to its paragraph IWD 9075-9:2016(E).

That means that the subquery is uncacheable, which is why you get something a result set with 2 rows and sometimes an empty result set.
[28 Apr 2014 22:38] William Ferrer
Hi Hartmut, Godofredo and Sinisa 

Thank you all very very much for your help in understanding the situation.

I had understood that I was using an uncachable subquery but I am still a bit hazy as to why that results in a varying number of rows in the result.

The RAND() is in the ORDER BY for the subquery, but the subquery it's self has a limit of 1. So due to that limit it seems it should only ever return 1 row, whether the subquery is cachable or not. If the subquery did return more than 1 row it would cause an error because the syntax: "ON c.id = (<result of the subquery>)" would error if c.id was being compared using an equals sign against more than one row -- it would need to use the IN syntax if more than 1 row was returned by the subquery.

For instance see this query (with a limit of 2 in the subquery)

SELECT p.id, c.id
	FROM `parent` p
	INNER JOIN `child` c 
		ON c.id = (
			SELECT c2.id
			FROM `child` c2
			WHERE c2.parent_id = p.id
			ORDER BY RAND() * c2.priority
			LIMIT 2
		)
	WHERE p.id = 1

Gives this error:

Error Code: 1242 Subquery returns more than 1 row

Also you mentioned "chances are that when checking pair 1/1 it returns 2 while when checking pair 1/2 it returns 1, so both times filtering out the result and ending up with empty set, or matching both times, so producing two matches" however since the RAND is in the ORDER BY it shouldn't be involved in any filtering, instead it is used in ordering the results of the subquery instead of filtering (which would take place in a where or having). It seems to me as though the subqery should evaluate all the rows it could join to in the child table, order them randomly, and then cull the result of the subquery down to just 1 row (never 2 or 0).

The reason I am using this style of syntax (instead of the style recommended in Hartmut's response where there is no subquery used) is because the actual queries where I encountered this issue are very complex. The example I provided was simplified to illustrate the issue.

I would post my original queries for illustration (and will happily do so if it would be helpful) but they are very long and involved and I wouldn't want to monopolize your time trying to parse it.

For my own purposes my original queries were part of stored procedures used by our telephony services company to select information used in call ratings and lcr. I was able to cut the 2 very long queries I had used (each in it's own stored procedure -- one for inbound calls and one for out bound call) each into 3 queries inside the stored procedures, where many of the values are selected into variables and passed between the 3 queries and this resulted in a usable work around for me.

Much appreciation to you all for your help and advice.

I hope this message finds you well.

All the best

Will Ferrer
[28 Apr 2014 23:12] Hartmut Holzgraefe
You are joining parent and child tables.

For every parent/child combination the ON subquery is evaluated.

So you end up with two parent/child combinations here that need to be checked: parent 1 / child 1 and parent 1 / child 2. And for each of these the subquery is evaluated. The subquery will produce a value of 1 or 2, and is called for every parent/child combination separately.

So you end up with 4 possible cases:

1) parent / child / subquery_result

      1       1        1    -> result row
      1       2        1    -> not a result row

2) parent / child / subquery result

      1       1        1    -> result row
      1       2        2    -> result row

3) parent / child /subquery result

      1       1        2    -> not a result row
      1       2        1    -> not a result row

4) parent / child / subquery result

      1       1        2    -> not a result row
      1       2        2    -> result row

So depending on the actual RAND() results you will end up with 1 result row in cases 1 and 4, with 2 result rows in case 2 an with an empty result set in case 3
[28 Apr 2014 23:21] Hartmut Holzgraefe
Hm ... tested things on PostgreSQL now ...

With the final "WHERE p.id = 1" clause it indeed returns only one row ...

Even more confusing though: when removing the WHERE clause it returns either zero or one rows (but never two) ...

I think the MySQL result is the correct one (and one of the two PostgreSQL result variants is wrong for sure) ... but need to look into this some more tomorrow ...
[29 Apr 2014 0:03] William Ferrer
Hi Hartmut 

Thank you very much for that breakdown of whats going on. That clarified the logic for me nicely.

I had not expected every combination of parent and child to be evaluated. I instead expected to see the subquery run only once. Since there is only 1 parent row being evaluated, I expected the query parsing to work as follows:

1) 1 parent row is fetched
2) subquery runs once associated with the single parent row that was fetched and returns a single id
3) the 1 parent row is joined to a child row matching the id returned from the subquery.

Instead what is going on is more along the lines of this"

1) 1 parent row is fetched
2) query parser gives each and every child row a chance to be joined to the parent by running the subquery for every potentially joinable child row
3) All the applicable child rows are joined to the parent row.

Is there a more appropriate way to join a table to another table using single row randomly selected from a correlated  subquery result?

Very interesting that PostgreSQL has a different behavior.

I look forward to hearing what you find out about PostgreSQL.

I hope this message finds you well and thanks again for the clarification.

All the best.

Will Ferrer
[29 Apr 2014 0:38] William Ferrer
Hi Hartmut 

Just a quick update from my end.

My more complex query I had been working for the telephony service product had always just returned 1 row due to the way I had crafted the joins, after talking with you I realized that fairly simply I could re factor it to return multiple rows and then order them by RAND in the way I wanted and them limit the result to 1 row.

This appears to work just fine for my needs and lets me move the stored procedures back to be being a single query.

Thanks again for the assistance.

All the best.

Will Ferrer
[30 Apr 2014 11:00] MySQL Verification Team
I observed that Oracle always returned 1 row..
[30 Apr 2014 11:19] MySQL Verification Team
SQL> SELECT version FROM V$INSTANCE;

VERSION
-----------------
11.2.0.2.0

SQL> SELECT p.id, c.id FROM parent p INNER JOIN child c ON c.id = ( SELECT * FROM(SELECT c2.id FROM child c2 WHERE c2.parent_id = 1 ORDER BY (dbms_random.value * c2.priority)) WHERE ROWNUM=1 ) WHERE p.id = 1;

        ID         ID
---------- ----------
         1          1

SQL> SELECT p.id, c.id FROM parent p INNER JOIN child c ON c.id = ( SELECT * FROM(SELECT c2.id FROM child c2 WHERE c2.parent_id = 1 ORDER BY (dbms_random.value * c2.priority)) WHERE ROWNUM=1 ) WHERE p.id = 1;

        ID         ID
---------- ----------
         1          1

SQL> SELECT p.id, c.id FROM parent p INNER JOIN child c ON c.id = ( SELECT * FROM(SELECT c2.id FROM child c2 WHERE c2.parent_id = 1 ORDER BY (dbms_random.value * c2.priority)) WHERE ROWNUM=1 ) WHERE p.id = 1;

        ID         ID
---------- ----------
         1          1

SQL> SELECT p.id, c.id FROM parent p INNER JOIN child c ON c.id = ( SELECT * FROM(SELECT c2.id FROM child c2 WHERE c2.parent_id = 1 ORDER BY (dbms_random.value * c2.priority)) WHERE ROWNUM=1 ) WHERE p.id = 1;

        ID         ID
---------- ----------
         1          1

SQL> SELECT p.id, c.id FROM parent p INNER JOIN child c ON c.id = ( SELECT * FROM(SELECT c2.id FROM child c2 WHERE c2.parent_id = 1 ORDER BY (dbms_random.value * c2.priority)) WHERE ROWNUM=1 ) WHERE p.id = 1;

        ID         ID
---------- ----------
         1          2

SQL>
[30 Apr 2014 13:13] Hartmut Holzgraefe
Out of curiosity, as this makes a difference on PostgreSQL:

What happens on Oracle if you remove the "WHERE p.id = 1" part from the end of the query?
[30 Apr 2014 16:48] MySQL Verification Team
SQL> SELECT p.id, c.id FROM parent p INNER JOIN child c ON c.id = (
  2  SELECT * FROM(SELECT c2.id FROM child c2 WHERE c2.parent_id = 1 ORDER
  3  BY (dbms_random.value * c2.priority)) WHERE ROWNUM=1 );

        ID         ID
---------- ----------
         1          2

SQL> SELECT p.id, c.id FROM parent p INNER JOIN child c ON c.id = (
  2  SELECT * FROM(SELECT c2.id FROM child c2 WHERE c2.parent_id = 1 ORDER
  3  BY (dbms_random.value * c2.priority)) WHERE ROWNUM=1 );

        ID         ID
---------- ----------
         1          2

SQL> SELECT p.id, c.id FROM parent p INNER JOIN child c ON c.id = (
  2  SELECT * FROM(SELECT c2.id FROM child c2 WHERE c2.parent_id = 1 ORDER
  3  BY (dbms_random.value * c2.priority)) WHERE ROWNUM=1 );

        ID         ID
---------- ----------
         1          1

SQL> SELECT p.id, c.id FROM parent p INNER JOIN child c ON c.id = (
  2  SELECT * FROM(SELECT c2.id FROM child c2 WHERE c2.parent_id = 1 ORDER
  3  BY (dbms_random.value * c2.priority)) WHERE ROWNUM=1 );

        ID         ID
---------- ----------
         1          1

SQL>