MySQL Bugs: #53587: what are the semantics of the rand function

Bug #53587	what are the semantics of the rand function
Submitted:	11 May 2010 22:57	Modified:	17 Aug 2016 14:51
Reporter:	Mark Callaghan	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Documentation	Severity:	S3 (Non-critical)
Version:	5.0,5.1	OS:	Any
Assigned to:	Paul DuBois	CPU Architecture:	Any
Tags:	by, FUNCTION, GROUP, rand, UPDATE

Description:
I think that rand() needs more documentation. From http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand there is a warning about using it with order by. But there is not a warning for group by.

See http://bugs.mysql.com/bug.php?id=32249 for the order by problem

See http://bugs.mysql.com/bug.php?id=8652 for the group by problem. Note that Igor's update to this bug is not mentioned in the docs.

I think there is another problem. I get many deadlocks from a simple benchmark that runs this SQL using concurrent sessions. The test uses InnoDB tables. The SQL text below is run in a loop 1000+ times per connection. Deadlocks occur when it is run by 2 connections.

begin;
update t1 set count = count + 1 where id = 1;
update t1 set count = count + 1 where id = round(rand() * 7000) + 1000;
commit;

The problem does not occur when I change it to use this SQL:

begin;
update t1 set count = count + 1 where id = 1;
select round(rand() * 7000) + 1000 into @r;
update t1 set count = count + 1 where id = @r;
commit;

Is rand() in the where clause getting re-evaluated?

I use this SQL to setup the test:
drop table if exists t1;
create table t1 (id int primary key auto_increment, count int) engine= innodb ;
insert into t1 values (null, 0);
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;
insert into t1 select null, 0 from t1;

analyze table t1;

How to repeat:
This reproduces the group by problem -- duplicate key error on query:

drop table if exists f;
create table f (i int);
insert into f values (1);

insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;
insert into f select * from f;

explain select count(*), round(100.1 * 70) + 1000 as a from test.f group by a;
select count(*), round(100.1 * 70) + 1000 as a from test.f group by a;

explain select count(*), round(rand() * 70) + 1000 as a from test.f group by a;
select count(*), round(rand() * 70) + 1000 as a from test.f group by a;

Behavior of rand() in WHERE clause is documented on that page:

"RAND() in a WHERE clause is re-evaluated every time the WHERE is executed."

But I agree that manual should describe RAND() behavior in more details, including GROUP BY clause and even potential deadlocks, as you had correctly pointed out.

For the update case I don't know how it can be correct to evaluate rand() more than once per row.

Verified on 5.1.46 the group by problem with test provided

Added a note to the RAND() description that the ORDER BY problem can also occur for GROUP BY.