Bug #65177 | perromance issue of "order by rand()" | ||
---|---|---|---|
Submitted: | 2 May 2012 9:43 | Modified: | 7 May 2012 3:22 |
Reporter: | xiaobin lin (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.1,5.5 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | Contribution, order by rand, Using temporary |
[2 May 2012 9:43]
xiaobin lin
[2 May 2012 9:44]
xiaobin lin
base on 5.5.22
Attachment: order_by_rand.patch (application/octet-stream, text), 406 bytes.
[2 May 2012 12:48]
Evgeny Potemkin
Thank you for your contribution! We'll review it soon and if we'll find it appropriate will push it into the main code base.
[2 May 2012 14:20]
MySQL Verification Team
IMHO, this is a fine patch that solves a problem with SIMPLE join and ORDER BY rand(). However, this patch should be tested how it works with all currently supported types of JOINs and ORDER BY rand().
[2 May 2012 14:24]
MySQL Verification Team
Thank you for the bug report and contribution.
[2 May 2012 18:39]
Sveta Smirnova
Bug #59253 was marked as duplicate of this one.
[2 May 2012 18:45]
Sveta Smirnova
Bug #33837 was marked as duplicate of this one.
[2 May 2012 18:53]
Sveta Smirnova
xiaobin, have you tested your patch with larger amount of data? I get slower results with patch than without patch for 262144 rows. Without patch: mysql> explain select * from tb order by rand() limit 99999,1; +----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+ | 1 | SIMPLE | tb | ALL | NULL | NULL | NULL | NULL | 262173 | Using temporary; Using filesort | +----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+ 1 row in set (0.03 sec) mysql> select * from tb order by rand() limit 99999,1; +------+------+------+------+ | a | b | c | d | +------+------+------+------+ | a | a | a | a | +------+------+------+------+ 1 row in set (6.31 sec) mysql> select count(*) from tb; +----------+ | count(*) | +----------+ | 262144 | +----------+ 1 row in set (3.33 sec) With patch: mysql> explain select * from tb order by rand() limit 99999,1; +----+-------------+-------+------+---------------+------+---------+------+--------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+--------+----------------+ | 1 | SIMPLE | tb | ALL | NULL | NULL | NULL | NULL | 262173 | Using filesort | +----+-------------+-------+------+---------------+------+---------+------+--------+----------------+ 1 row in set (0.00 sec) mysql> select * from tb order by rand() limit 99999,1; +------+------+------+------+ | a | b | c | d | +------+------+------+------+ | a | a | a | a | +------+------+------+------+ 1 row in set (23.16 sec) mysql> select count(*) from tb; +----------+ | count(*) | +----------+ | 262144 | +----------+ 1 row in set (3.01 sec) For smaller tables it worked better.
[3 May 2012 2:59]
xiaobin lin
Sveta, thank you for test. Yes, I have encountered the same case. I think it is because the buffer pool is not enough, and Innodb should read data from disk repeatly. In this case, "Using temporary" become a better way because the page is read in sequence. The other hand, the patched way should read data page randomly ,which lead more LRU evicting. Can you check the increment of "show global status like 'Innodb_buffer_pool_reads';" in both version when large amount of data? If the patched version has a much bigger increment , it validates.
[3 May 2012 11:26]
Sveta Smirnova
xiaobin, thank you for the feedback. You are absolutely correct. Without patch: mysql> show global status like 'Innodb_buffer_pool_reads'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Innodb_buffer_pool_reads | 188 | +--------------------------+-------+ 1 row in set (0.01 sec) mysql> select * from tb order by rand() limit 99999,1; +------+------+------+------+ | a | b | c | d | +------+------+------+------+ | a | a | a | a | +------+------+------+------+ 1 row in set (3.80 sec) mysql> show global status like 'Innodb_buffer_pool_reads'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Innodb_buffer_pool_reads | 238 | +--------------------------+-------+ 1 row in set (0.00 sec) with patch: mysql> show global status like 'Innodb_buffer_pool_reads'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Innodb_buffer_pool_reads | 171 | +--------------------------+-------+ 1 row in set (0.00 sec) mysql> select * from tb order by rand() limit 99999,1; +------+------+------+------+ | a | b | c | d | +------+------+------+------+ | a | a | a | a | +------+------+------+------+ 1 row in set (22.47 sec) mysql> show global status like 'Innodb_buffer_pool_reads'; +--------------------------+--------+ | Variable_name | Value | +--------------------------+--------+ | Innodb_buffer_pool_reads | 104735 | +--------------------------+--------+ 1 row in set (0.00 sec)
[4 May 2012 3:58]
xiaobin lin
Sveta, thanks, so it validates. But I think the more common case is similar with little data case. There are also queries that have a "where" phrase to order less data, such as "select * from tb where a='xx' order by rand()", In such cases, the InnoDB buffer page will not be evicted immediately, so the same data page will not read from disk repeatly. Hence the procedure of filling data into temporary table will cost a bigger percent of running time, makes "Using where; Using filesort" quicker than "Using where; Using temporary; Using filesort" I test the assumption in the big data environment. 1) alter table tb add index a(a); 2) insert into tb values('b','b','b','b'); //repeat 20 times 3) select * from tb where a='b' order by rand() limit 19,1; repeat 3) 10,000 times and calculate the total time. In my test machine, raw version costs 14.756s and patched version costs 13.041s.
[4 May 2012 12:24]
Evgeny Potemkin
Beside performance, this patch has another problem: create table t1(f1 int); insert into t1 values(1),(2),(3),(4),(5),(6); mysql> explain select @a:=rand() from t1 order by 1; +----+-------------+-------+------+---------------+------+---------+------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+----------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 6 | Using filesort | +----+-------------+-------+------+---------------+------+---------+------+------+----------------+ 1 row in set (0.00 sec) mysql> select @a:=rand() from t1 order by 1; +---------------------+ | @a:=rand() | +---------------------+ | 0.11307570721309232 | | 0.29361815498547456 | | 0.12886368402174087 | | 0.7634639858859256 | | 0.4307289080980503 | | 0.8632526135661198 | +---------------------+ 6 rows in set (0.00 sec)
[7 May 2012 3:22]
xiaobin lin
Evgeny , Sorry for delay response across weekend. Thank you for case, it shows the prior patch is buggy. I find the reason of "disorder" is that when we do not use temporary table, the "rand()" is re-calcute, once for sorting, once for sending data. The action of re-calculating appears in other cases. As "select f1, abs(f1) from t1 order by 2", the "abs" is re-calculated, both in sorting and sending data period. The difference is that the result of "abs" is stable, but "rand" not. Perhaps the simply way to fix as this: if the rand() is shown in the field list, we also "use temporary". The new patch is attached bellow.
[7 May 2012 3:24]
xiaobin lin
based on 5.5.22
Attachment: order_by_rand.patch (application/octet-stream, text), 494 bytes.
[9 May 2012 1:26]
xiaobin lin
Last patch file are generated in error order of patched version and raw version. Modify it
Attachment: order_by_rand.patch (application/octet-stream, text), 494 bytes.
[31 Dec 2016 12:32]
Daniƫl van Eeden
Related: Bug #33837 order by rand() speed