Bug #91641 Using RAND() in SELECT query in huge table makes the next query result wrong
Submitted: 15 Jul 2018 14:41 Modified: 18 Jul 2018 13:00
Reporter: jinkli Josin Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.7.21 OS:Any
Assigned to: CPU Architecture:x86 (Mac OSX 10.13.6)

[15 Jul 2018 14:41] jinkli Josin
Description:
Table name 'v_record', which have 100,000,000 rows data, only have a primary key name `id`, when doing the following query:

```sql
SELECT id FROM v_record ORDER BY RAND() LIMIT 100;
```

and then running the following query, the data was wrong:

```sql
SELECT id FROM v_record LIMIT 100
```

The sql 'SELECT id FROM v_record LIMIT 100'  will return the following data as normal:

```
1
2
3
4
5
6
7
8
9
10
...
```

but now, the return result was non-order, i think it was returned from buffering:

sql: ' SELECT id FROM v_record LIMIT 100 '

```
1913120
5697920
9482720
3487666
7272466
3146179
6930979
3342184
7126984
187150
...
```

and when a pick another column, the result was right:

sql: ' SELECT id,vote_id FROM v_record LIMIT 100 '

```
1	216
2	388
3	652
4	177
5	732
6	999
7	978
8	217
9	792
10	186
...
```

How to repeat:
1. i use the procedure to produce much of data, counted as 100,000,000:

```sql
CREATE FUNCTION `rand_string`(n INT) RETURNS varchar(255) CHARSET utf8mb4
BEGIN
	DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXZY0123456789';
	DECLARE return_str VARCHAR(255) DEFAULT '';
	DECLARE i  INT DEFAULT 0;
	WHILE i < n DO
		SET return_str = concat(return_str, SUBSTRING(chars_str, FLOOR(1 + RAND() * 62), 1));
		SET i = i + 1;
	END WHILE;
	RETURN return_str;
END
```

```sql
CREATE PROCEDURE `add_memory`(IN n INT)
BEGIN 
	DECLARE i INT DEFAULT 1;
	START TRANSACTION;
	WHILE ( n > 0 ) DO
		INSERT INTO v_record (user_id, vote_id, group_id, create_time) VALUES (rand_string(20), FLOOR(RAND() * 1000), FLOOR(RAND() * 100), NOW());
		SET i = i + 1;
		SET n = n - 1;
	END WHILE;
	COMMIT;
END
```

then call the procedure:

```
CALL add_memory (10000000);
```

2. running the select query:

```sql
SELECT * FROM v_record ORDER BY RAND() LIMIT 100;
```

or

```sql
SELECT id FROM v_record ORDER BY RAND() LIMIT 100;
```

3. To get the 100 rows of data from 1 ~ 100:

```sql
SELECT id FROM v_record LIMIT 100;
```

the output data was wrong, if you want to get the right data, i must do the following query replace the before one (add some other column to let the engine to scan the table not to fetch from the buffering cache, i guess. ):

```
SELECT id,vote_id FROM v_record LIMIT 100;
```

TEMPORARILY RESOLUTION:

Add another column when you want to get the right data form database's table, not the primary key ( The primary key was `id` in table named: `v_record`):

REPLACE THE SQL:

```sql
SELECT id FROM v_record LIMIT 100;
```
WITH the following:

```sql
SELECT id,vote_id FROM v_record LIMIT 100;
```
[16 Jul 2018 8:16] Hartmut Holzgraefe
While this behavior is weird in an interesting way, it is not really wrong. As you did not specify any ORDER the server is free to return rows in any order. Usually this would be the primary key order in this case, but there's no guarantee.

What would be interesting now would be the EXPLAIN plan for the "wrong" 2nd query after the first one had been run.
[18 Jul 2018 13:00] MySQL Verification Team
Hi,

I have to agree with my friend Hartmut regarding this report.

To quote from the SQL definitions, a relational entity, such as a result set, is an unordered set of tuples, consisting of the unordered set of attributes. This is also addressed in SQL standard from 2011, in the chapter 4.15.

This means that a query:

SELECT * FROM table WHERE ......;

can return rows in any order. If columns are not specified, their sequence is left up to the SQL server.

If you specify SELECT id FROM table, then you will get a result set, only with column `id`, but in any order of rows. 

In order to get rows that are ordered, you have to use ORDER BY clause. That is basic SQL.