Bug #50332 ORDER BY RAND() with explicit RAND_SEEDs produces different results
Submitted: 14 Jan 2010 14:43 Modified: 4 Feb 2010 5:47
Reporter: Elena Stepanova Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.5.1-m2, Celosia OS:Linux
Assigned to: Zhenxing He CPU Architecture:Any

[14 Jan 2010 14:43] Elena Stepanova
Description:
The provided test performs three types of test cases. 

Test cases A and B are complementary and are meant to show that the server with exactly the same parameters returns different results for ORDER BY RAND() with constant RAND_SEEDs, depending whether replication was or was not started.

Test case C shows the realistic impact of the difference, when master and slave produce different results for ORDER BY RAND() with explicit RAND_SEEDs.

I could reproduce it on 5.5.1-m2, next-mr and trunk, but could not on 5.1.42.
Also could not reproduce on Solaris.

Test case A:

- start master and slave servers;
- do NOT execute CHANGE MASTER / START SLAVE on slave;
- execute a previously stored mysqlbinlog output on master;
- set RAND_SEED1 and RAND_SEED2 and execute a SELECT .. ORDER BY RAND();
- check the result

Test case B:

- start master and slave servers with the same parameters as in test case A;
- execute CHANGE MASTER / START SLAVE on slave;
- execute the same previously stored mysqlbinlog output on master;
- set the same RAND_SEED1 and RAND_SEED2 and execute a SELECT .. ORDER BY RAND();
- check the result (see that the result differs from test case A)

Test case C:

- start master and slave servers with the same parameters as in test case A;
- start replication between the servers;
- execute the same previously stored mysqlbinlog output on master;
- set the same RAND_SEED1 and RAND_SEED2 and execute a SELECT .. ORDER BY RAND() on master;
- set the same RAND_SEED1 and RAND_SEED2 and execute a SELECT .. ORDER BY RAND() on slave;
- compare the results on master and slave

How to repeat:
Unpack the attached archive in your <mysql-basedir>/mysql-test folder;
run as 
perl rand_test/run.pl [options]

options: 
  --client-folder=<folder where mysql and mysqladmin live>, default ../bin
  --test-case=[A|B|C], default C
  --mtr-build-thread, default 300
  --shutdown (whether shutdown servers or not), default yes

Suggested fix:
For SystemQA:
Initially encountered as replication difference on betony_features schema in PB2 system tests.
[14 Jan 2010 14:45] Elena Stepanova
Perl script to execute the described scenarios

Attachment: 50332_rand_test.tar.gz (application/gzip, text), 228.37 KiB.

[4 Feb 2010 5:09] Zhenxing He
If run test case 'A' several times, the result order can be different too.
I think this problem is caused by different records order, we can also have similar issue when using SELECT ... LIMIT n;

See also bug#49222
[4 Feb 2010 12:24] Elena Stepanova
Hi,

I checked out the bug#49222 before submitting this one, and I could not see how it related to the problem described here. The bug#49222 says that RAND() is to be marked unsafe because pre-setting seeds "is not enough in cases where the value of RAND() is inserted into several rows, because the order in which rows are retreived is not specified". There are no such inserts in the provided test case, and the order in which rows are retrieved is specified (albeit, by the very same RAND()). 
I also checked the documentation on rand_seed1 and rand_seed2, and the manual still says:
"The purpose of these variables is to support replication of the RAND() function. For statements that invoke RAND(), the master passes two values to the slave, where they are used to seed the random number generator. The slave uses these values to set the session variables rand_seed1 and rand_seed2 so that RAND() on the slave generates the same value as on the master."
http://dev.mysql.com/doc/refman/5.5/en/server-session-variables.html#sysvar_rand_seed1

I've never got different results while running test case A myself, but if you can see that, I would think that it shows that I was wrong about what triggers the problem (different replication-related settings on the server), but the problem itself still exists, just becomes wider...