Sunday, March 7, 2010

Returning a random set of data

A number of times, I've been requested to provide a random data for various purposes, including testing and sampling. I have found a number of ways of doing it which mostly include creating a multiple layer of sub-query and even creating temporary tables. My usual mindset is that I can not really use the TOP n keyword because, most of the time, it will return the same records every time. I also tried to use the RAND() function that can generate random number. Closer look with this function, however, says repetitive calls of RAND() with the same seed value return the same results (BOL). When I received another project that requires a randomly generated set of data, I started searching the net on how to do this in a simpler way. This time, I leverage an existing function that will always generate a random value. The NEWID() function. This function always returns a unique value of uniqueidentifier data type.

Here's a sample table:
BookId         BookTitle
------      ---------------
1          Yesterday
2          In My Life
3          Hey Junde
4          Fool On the Hill
5          If I Fell
6          Let It Be
7          Till There Was You
8          Yellow Submarine
9          I Should Have Known Better
Running a simple TOP X% query always return the same set of records.
select  TOP 30 PERCENT BookId, BookTitle from books
Using the RAND() function always returns the same set of records:
select  TOP 30 PERCENT BookId, BookTitle from books
order by rand(rand(1))

BookId         BookTitle
------      ---------------
1          Yesterday
2          In My Life
3          Hey Junde
Using the NEWID() function always returns a random set of records
select  TOP 30 PERCENT BookId, BookTitle from books
order by newid()

BookId         BookTitle
------      ---------------
4          Fool On the Hill
5          If I Fell
1          Yesterday
Here's another run
select  TOP 30 PERCENT BookId, BookTitle from books
order by newid()

BookId         BookTitle
------      ---------------
5          If I Fell
8          Yellow Submarine
2          In My Life
So far, I have not seen any issue using this function in a big table.


~CK

Saturday, October 31, 2009

Stored procedure in a SELECT statement

Objective: Execute a stored procedure inside a SELECT statement and return the result as a table. I found a blog that shows how to execute a stored procedure inside a SELECT statement and return the result as a table. Although this can also be achieved using a table-valued function, this is much useful capturing result sets returned by system stored procedure like, sp_who. Although there are a number of ways to do this, I found this very simple technique. There are a number of consideration before this can be used including: user-rights, configuration setting and the size of the result set to be returned. Also, try to consider using function first. What I did is take advantage of the OPENROWSET function that allows user to access remote data from a data source. Here's a sample of retrieving the result set as returned by the system stored procedure sp_who:
SELECT a.*
FROM OPENROWSET('SQLNCLI', 'Server=.;Trusted_Connection=yes;',
'set fmtonly off; exec sp_who') AS a
As this is like a regular SELECT statement, data merge (JOIN, UNION), filter condition (WHERE) and sorting (ORDER BY) can be used. The result may also be inserted into a table using the INSERT INTO...SELECT statement or to create a new table by including a INTO new_table> statement on the SELECT statement. The new_table may also be a temporary table. Here's another sample. Although the following code may be implemented in some other ways, it was created to illustrate how the stored procedure be executed inside a SELECT statement and return the result. Here's the stored procedure:
create proc tproc (@dbname as varchar(50))
as
begin
   declare @sql varchar(500)
   set @sql = 'select * from ' + @dbname + '.dbo.sysobjects'
   exec (@sql)
end

exec dbo.tproc 'model'
exec dbo.tproc 'master'
To call in a SELECT statement and return as a table:
SELECT a.*
FROM OPENROWSET('SQLNCLI', 'Server=.;Trusted_Connection=yes;',
 'set fmtonly off; exec workdb.dbo.tproc ''master''') AS a;

SELECT a.*
FROM OPENROWSET('SQLNCLI', 'Server=.;Trusted_Connection=yes;',
 'set fmtonly off; exec workdb.dbo.tproc ''model''') AS a;
As always, comments are welcome ....


~~CK

Rollback of Temp Table vs Table Variable

Is there a difference between a Rollback of a Temp Table vs Rollback of a Table Variable? I've read a number of articles about the difference between Temp Table and Table Variable. Not until I have this major project that I got to work on one of the biggest difference between these two types of table. I am required to write, through an existing stored procedure, to a log table that contains description of each major steps on my stored procedure, including error messages. My problem is, I implemented a transaction to handle any error that my stored procedure might encounter. These errors are usually due to violations of constraint, integrity and data type implicit conversions. Once the rollback is executed, I loose all the records in log table that got inserted as my stored procedure executes. The solution: table variable. I created a table variable with the same structure as that of the log table. Records got inserted into log table during the execution of the stored procedure. When an error caused the process to fail, I inserted the value from the log table into the table variable before I executed a rollback. After the rollback, all records that got inserted to the log-table were deleted. Then I inserted everything back from the table variable to the log table. Since table variable are not affected by a rollback statement, the data was preserved in the variable. One major consideration is to figure out which records were inserted by my stored procedure so as not to duplicate the entire table.


~~ CK