Bug #53587 what are the semantics of the rand function
Submitted: 11 May 2010 22:57 Modified: 17 Aug 2016 14:51
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.0,5.1 OS:Any
Assigned to: Paul DuBois CPU Architecture:Any
Tags: by, FUNCTION, GROUP, rand, UPDATE

[11 May 2010 22:57] Mark Callaghan
Description:
I think that rand() needs more documentation. From http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand there is a warning about using it with order by. But there is not a warning for group by.

See http://bugs.mysql.com/bug.php?id=32249 for the order by problem

See http://bugs.mysql.com/bug.php?id=8652 for the group by problem. Note that Igor's update to this bug is not mentioned in the docs.

I think there is another problem. I get many deadlocks from a simple benchmark that runs this SQL using concurrent sessions. The test uses InnoDB tables. The SQL text below is run in a loop 1000+ times per connection. Deadlocks occur when it is run by 2 connections.

begin;
update t1 set count = count + 1 where id = 1;
update t1 set count = count + 1 where id = round(rand() * 7000) + 1000;
commit;

The problem does not occur when I change it to use this SQL:

begin;
update t1 set count = count + 1 where id = 1;
select round(rand() * 7000) + 1000 into @r;
update t1 set count = count + 1 where id = @r;
commit;

Is rand() in the where clause getting re-evaluated?

I use this SQL to setup the test:
drop table if exists t1;
create table t1 (id int primary key auto_increment, count int) engine= innodb ;
insert into t1 values (null, 0);
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;

analyze table t1;

How to repeat:
This reproduces the group by problem -- duplicate key error on query:

drop table if exists f;
create table f (i int);
insert into f values (1);

insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;

explain select count(*), round(100.1 * 70) + 1000 as a from test.f group by a;
select count(*), round(100.1 * 70) + 1000 as a from test.f group by a;

explain select count(*), round(rand() * 70) + 1000 as a from test.f group by a;
select count(*), round(rand() * 70) + 1000 as a from test.f group by a;
[12 May 2010 3:51] Valeriy Kravchuk
Behavior of rand() in WHERE clause is documented on that page:

"RAND() in a WHERE clause is re-evaluated every time the WHERE is executed."

But I agree that manual should describe RAND() behavior in more details, including GROUP BY clause and even potential deadlocks, as you had correctly pointed out.
[12 May 2010 4:14] Mark Callaghan
For the update case I don't know how it can be correct to evaluate rand() more than once per row.
[17 May 2010 23:51] Donna Harmon
Verified on 5.1.46 the group by problem with test provided
[15 Feb 2016 23:04] Paul DuBois
Added a note to the RAND() description that the ORDER BY problem can also occur for GROUP BY.