Bug #33837 order by rand() speed
Submitted: 12 Jan 2008 20:48 Modified: 3 May 2012 15:08
Reporter: anzenews asdf Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:any OS:Any
Assigned to: CPU Architecture:Any
Tags: order by rand, speed

[12 Jan 2008 20:48] anzenews asdf
Description:
Hi!

There are numerous articles on the Internet about optimization of this simple query:
SELECT * FROM posts WHERE ... ORDER BY RAND() LIMIT 3

This is a very common SQL query in web development and still there is no fast way to retreive a few random records from a table. Most web developers just leave this query unefficient and some go to great lengths just to solve what should not be a problem in a first place. 

One of the pages that try to solve this is:
http://jan.kneschke.de/projects/mysql/order-by-rand/
http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/

Anyway, it would be nice if "ORDER BY RAND() LIMIT X" was fast - it should be fairly easy to solve this on MyQSL optimizer level (I guess). 

Thanks, keep up the good work,

Anze

How to repeat:
Take a very large table and issue a select:
SELECT * FROM mytable ORDER BY RAND() LIMIT 3

Suggested fix:
"RAND()" in this case should not be treated as an expression, but rather as the way of ordering data (randomly).
[14 Jan 2008 12:58] Susanne Ebrecht
Many thanks for writing a feature request.

This can't be changed.

select * from tab order by something limit 3; 

always means a sequential scan of the table. Because you only can get the first three results, when you know the whole results.
[14 Jan 2008 13:36] anzenews asdf
... which is exactly why this is a special case: if you are looking for _RANDOM_ results there is no need to make a sequential scan. You could make a random scan and stop after X found records... 

I appreciate that you have a deeper understanding of the system and how it could be incorporated but this really is only solveable on MySQL level. All higher-level implementations are cumbersome, limited or even impossible to do. 

Right now "ORDER BY RAND()" is a very common and at the same time very inefficient query...

Thanks for the answer though, I hope you change your mind about this...
[26 Mar 2008 10:24] Susanne Ebrecht
Many thanks for writing a feature request. We will check if we can implement this or not.
[2 May 2012 18:46] Sveta Smirnova
There is verified bug #65177 with patch proposed about very same problem, so closing this one as duplicate of that.
[3 May 2012 15:08] anzenews asdf
I don't use MySQL anymore in my products, so this is not very important to me anymore. 

However, there are a few points I would like to make:
- proposed solution apparently works better only when used on smaller number of rows
- proposed solution only makes a small(ish) speed difference (less than factor 2), at least as shown in poster's example
In other words, the new ticket is more specific and does not solve this problem adequately (IMHO, of course).

If you first order and then limit records there can be no big speedup because you need to re-order elements each time and that takes a lot of space / time. What I was trying to point out is that this is a special case where ordering of records is not needed at all. 
Granted, I do not have a patch, and I am still happy to see something done in this direction. 

Anyway, feel free to close the ticket, I am already on MongoDB. :)