Sunday, March 7, 2010

Returning a random set of data

A number of times, I've been requested to provide a random data for various purposes, including testing and sampling. I have found a number of ways of doing it which mostly include creating a multiple layer of sub-query and even creating temporary tables. My usual mindset is that I can not really use the TOP n keyword because, most of the time, it will return the same records every time. I also tried to use the RAND() function that can generate random number. Closer look with this function, however, says repetitive calls of RAND() with the same seed value return the same results (BOL). When I received another project that requires a randomly generated set of data, I started searching the net on how to do this in a simpler way. This time, I leverage an existing function that will always generate a random value. The NEWID() function. This function always returns a unique value of uniqueidentifier data type.

Here's a sample table:
BookId         BookTitle
------      ---------------
1          Yesterday
2          In My Life
3          Hey Junde
4          Fool On the Hill
5          If I Fell
6          Let It Be
7          Till There Was You
8          Yellow Submarine
9          I Should Have Known Better
Running a simple TOP X% query always return the same set of records.
select  TOP 30 PERCENT BookId, BookTitle from books
Using the RAND() function always returns the same set of records:
select  TOP 30 PERCENT BookId, BookTitle from books
order by rand(rand(1))

BookId         BookTitle
------      ---------------
1          Yesterday
2          In My Life
3          Hey Junde
Using the NEWID() function always returns a random set of records
select  TOP 30 PERCENT BookId, BookTitle from books
order by newid()

BookId         BookTitle
------      ---------------
4          Fool On the Hill
5          If I Fell
1          Yesterday
Here's another run
select  TOP 30 PERCENT BookId, BookTitle from books
order by newid()

BookId         BookTitle
------      ---------------
5          If I Fell
8          Yellow Submarine
2          In My Life
So far, I have not seen any issue using this function in a big table.


~CK