Random rows in MySQL with Reasonable Performance

Almost every answer on Stack Overflow for this is terrible on moderately complex or large real data – https://stackoverflow.com/questions/4329396/mysql-select-10-random-rows-from-600k-rows-fast.

Bad: ORDER BY RAND() LIMIT N

This means executing your query for the entire resultset and then ordering it and then chopping off the number you need. This can have truly dire performance.

Bad: SELECT MAX(id) FROM table
Don’t pick random numbers from 1 to MAX(id) – deleted rows will be null or result in you getting less rows than you want. Who says your id is even sequential/numeric?

OK: SELECT id FROM table

Then pick N at random (removing those already chosen if you don’t want duplicates) from the resultset in your chosen programming language.

If you just want one row from the db you can do this in SQL only as shown here: https://stackoverflow.com/a/31066058/375262. Doing N unique rows this way is left as an exercise for the reader.

If you have a gargantuan resultset even SELECT COUNT(*) might be slow. What would you do then?

Leave a Reply

Your email address will not be published.