Wednesday, September 24, 2008

What's the Difference Between SET and SELECT?

Is there really a difference between SET and SELECT when used to assign values to variables in T-SQL? The common misconception is that there is none. I've been writing SQL scripts for some time now and had used the two statements interchangeably, until an unnoticed logical error, caused by using one in place of the other, forced me to look closer. So what are the similarities and differences? Both commands can assign values to variables, but for the most part, their similarities end there. Fundamentally, SET is the ANSI-standard way of assigning a value to a variable, while SELECT is not. Another visible difference is that SET can assign a value to only one variable at a time, while SELECT can assign values to one or more variables in a single statement. Consider these variables:
declare @myVar1 int, @myVar2 int
This code should run successfully:
SELECT @myVar1 = 1, @myVar2 = 2
print '@myVar1'
print @myVar1
print ''
print '@myVar2'
print @myVar2
Here are the results:
@myVar1
1
 
@myVar2
2
This code, however, results in an error:
SET @myVar1 = 11, @myVar2 = 12
print '@myVar1'
print @myVar1
print ''
print '@myVar2'
print @myVar2
Here's the error returned:
Msg 102, Level 15, State 1, Line 11
Incorrect syntax near ','.
(I use the PRINT command to display variable contents, to avoid confusion between the statements that assign values and those that display them.) For this code to run, each variable must be assigned in a separate statement.
SET @myVar1 = 11
SET @myVar2 = 12
print '@myVar1'
print @myVar1
print ''
print '@myVar2'
print @myVar2
Here's the result:
@myVar1
11

@myVar2
12
Standards, however, are not the only consideration when choosing between these two statements. Here's an example of how the choice can significantly affect a SQL script:
declare @rowsaffected int, @errornumber int

select id/0 from sysobjects
set @rowsaffected = @@rowcount
set @errornumber = @@error

print 'rowsaffected'
print @rowsaffected
print ''
print 'errornumber'
print @errornumber
Here are the results:
Msg 8134, Level 16, State 1, Line 4
Divide by zero error encountered.

rowsaffected
0

errornumber
0
The code raised a divide-by-zero error, but the @@error system variable returned zero (0). This is not due to SET or SELECT per se: @@error holds the error number of the last T-SQL statement executed. Here, that last statement was set @rowsaffected = @@rowcount, which succeeded, so @@error was reset to zero. This can lead to unhandled errors. In another scenario, I want to capture the number of rows affected.
declare @rowsaffected int, @errornumber int

select id from sysobjects where id <= 10
set @errornumber = @@error
set @rowsaffected = @@rowcount

print 'rowsaffected'
print @rowsaffected
print ''
print 'errornumber'  
print @errornumber
Here are the results:
id
-----------
4
5
7
8

rowsaffected
1
 
errornumber
0
Notice that the query actually returned four rows, but @@rowcount reported only one. @@rowcount is a system variable that returns the number of rows affected by the previous statement; here, the previous statement was set @errornumber = @@error, which affects exactly one row. Capturing these system variables with two separate SET statements is therefore unreliable, no matter which order they run in: whichever runs second reads a value already overwritten by the first. To capture both values at the same time, use a single SELECT statement.
declare @rowsaffected int, @errornumber int

select id from sysobjects where id <= 10
select @errornumber = @@error,@rowsaffected = @@rowcount

print 'rowsaffected'
print @rowsaffected
print ''
print 'errornumber'  
print @errornumber

id
-----------
4
5
7
8

rowsaffected
4
 
errornumber
0

select id/0 from sysobjects where id <= 10
select @errornumber = @@error,@rowsaffected = @@rowcount

print 'rowsaffected'
print @rowsaffected
print ''
print 'errornumber'  
print @errornumber
 
-----------
Msg 8134, Level 16, State 1, Line 4
Divide by zero error encountered.

rowsaffected
0
 
errornumber
8134
In both cases, @@rowcount and @@error returned the right values. SET and SELECT also differ in the way scalar values are assigned to variables, specifically when the value comes from a query. Consider this sample table:
SET NOCOUNT ON
declare @Test table (i int, j varchar(15))
INSERT INTO @Test (i, j) VALUES (1, 'First Row')
INSERT INTO @Test (i, j) VALUES (2, 'Second Row')

select * from @Test

i           j
----------- ----------
1           First Row
2           Second Row
Using the two commands:
declare @myVarUseSET varchar(15), @myVarUseSELECT varchar(15)

SET @myVarUseSET = (select j from @Test where i = 2)
SELECT @myVarUseSELECT = j from @Test where i = 2

print '@myVarUseSET'
print @myVarUseSET
print ''
print '@myVarUseSELECT'
print @myVarUseSELECT
Here are the results:
@myVarUseSET
Second Row

@myVarUseSELECT
Second Row
The values are accurately assigned to both variables. What happens if the query returns more than one row?
SET @myVarUseSET = (select j from @Test)
SELECT @myVarUseSELECT = j from @Test

print '@myVarUseSET'
print @myVarUseSET
print ''
print '@myVarUseSELECT'
print @myVarUseSELECT
Here are the results:
Msg 512, Level 16, State 1, Line 10
Subquery returned more than 1 value. 
This is not permitted when the subquery follows =, !=, <, <= , >, >= or 
when the subquery is used as an expression.
@myVarUseSET

@myVarUseSELECT
Second Row
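The SELECT above ended up with 'Second Row', the last row of the result. An ORDER BY clause can control which row comes last. Here's a quick sketch, reusing the @Test table variable from above; note that Microsoft does not formally guarantee variable-assignment order in a multi-row SELECT, so treat this as observed behavior rather than a contract:

declare @LastValue varchar(15)
declare @Test table (i int, j varchar(15))
INSERT INTO @Test (i, j) VALUES (1, 'First Row')
INSERT INTO @Test (i, j) VALUES (2, 'Second Row')

-- Sorting descending makes i = 1 the last row processed,
-- so the variable should end up holding 'First Row'
SELECT @LastValue = j from @Test ORDER BY i DESC
print @LastValue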
Notice that the SET statement failed while the SELECT succeeded. Which of the returned values does SELECT store in the variable? It keeps the value from the last row returned; if the query has an ORDER BY clause, that is the last row of the sorted result. (Without an ORDER BY, which row comes last is not guaranteed.) What happens if, instead of returning multiple rows, the query returns no rows at all? To make the behavior visible, I initialize the variables first and then assign from a query that returns no rows.
select @myVarUseSET = 'initial value',
@myVarUseSELECT = 'initial value'

set @myVarUseSET = (select j from @Test where i = 3)
select @myVarUseSELECT = j from @Test where i = 3

print '@myVarUseSET'
print isnull(@myVarUseSET,'NULL')
print ''
print '@myVarUseSELECT'
print isnull(@myVarUseSELECT,'NULL')
Here are the results:
@myVarUseSET
NULL

@myVarUseSELECT
initial value
SET, in this case, sets the variable to NULL. SELECT, on the other hand, leaves the variable untouched: notice that @myVarUseSELECT still contains its initialized value. If not handled properly, this can lead to unnoticed errors that affect the results of a script.

Lastly, speed. With a single variable, there's not much of a speed difference between the two commands: SET @variable = value and SELECT @variable = value run at nearly the same speed. With three or more variables, however, SELECT beat SET by almost 50% in my tests, for the simple reason that SELECT can assign values to multiple variables in one statement. Based on the samples and cases above, though, speed should not be the only consideration. Keep in mind how each command handles the results of the assignment operator.
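If you want to check the speed difference on your own server, here's a rough sketch of the kind of test I ran. The loop count and millisecond timing are arbitrary choices of mine, and the numbers will vary with hardware and SQL Server version:

declare @i int, @a int, @b int, @c int, @start datetime

-- Time 100,000 iterations of three separate SET assignments
set @start = GETDATE()
set @i = 0
while @i < 100000
begin
    set @a = 1
    set @b = 2
    set @c = 3
    set @i = @i + 1
end
print 'SET elapsed (ms):'
print datediff(ms, @start, GETDATE())

-- Time 100,000 iterations of one SELECT assigning all three
set @start = GETDATE()
set @i = 0
while @i < 100000
begin
    select @a = 1, @b = 2, @c = 3
    set @i = @i + 1
end
print 'SELECT elapsed (ms):'
print datediff(ms, @start, GETDATE())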


~~ CK
