MS-SQL Server Notes: update

I found this problem on one of the SQL Server forums that I am regular contributor of.

Given this table:

set nocount on
declare @yourtable table
(recnum int,
Col2 varchar(2),
FKCol3 varchar(15),
Col4 varchar(10))

insert into @yourtable values (1, 'AA', 'key0AA', 'Somedata')
insert into @yourtable values (2, 'CC', NULL, 'Somedata')
insert into @yourtable values (3, 'CC', NULL, 'Somedata')
insert into @yourtable values (4, 'AA', 'key0BB', 'Somedata')
insert into @yourtable values (5, 'BB', NULL, 'Somedata')
insert into @yourtable values (6, 'CC', NULL, 'Somedata')
insert into @yourtable values (7, 'AA', 'key0CC', 'Somedata')
insert into @yourtable values (8, 'CC', NULL, 'Somedata')
insert into @yourtable values (9, 'CC', NULL, 'Somedata')
insert into @yourtable values (10, 'AA', 'key0DD', 'Somedata')
insert into @yourtable values (11, 'AA', 'key0EE', 'Somedata')

select *
from @YourTable 


recnum      Col2 FKCol3          Col4
----------- ---- --------------- ----------
1           AA   key0AA          Somedata
2           CC   NULL            Somedata
3           CC   NULL            Somedata
4           AA   key0BB          Somedata
5           BB   NULL            Somedata
6           CC   NULL            Somedata
7           AA   key0CC          Somedata
8           CC   NULL            Somedata
9           CC   NULL            Somedata
10          AA   key0DD          Somedata
11          AA   key0EE          Somedata

The problem: The user wants to replace those PKCol3 that are NULL by the non-null PKCol3 that precedes it. In this case, records 2 and 3 will have PKCol3 with "key0AA". Records 5 and 6 will have PKCol3 = "key0BB", and so on.This has to be done for almost 2 million+ rows.

Here's the code I suggested:

First, here's a query that can be used for analysis:

select recnum, Col2, FKCol3, Col4,
newCol3 = isnull((
   select top 1 FKCol3 
   from @YourTable b    
   where a.recnum > b.recnum and b.FKCol3 is not null
    and a.FKCol3 is null 
   order by b.recnum desc),a.FKCol3)
from @YourTable a

Here's the result set:

recnum      Col2 FKCol3          Col4       newCol3
----------- ---- --------------- ---------- ---------------
1           AA   key0AA          Somedata   key0AA
2           CC   NULL            Somedata   key0AA
3           CC   NULL            Somedata   key0AA
4           AA   key0BB          Somedata   key0BB
5           BB   NULL            Somedata   key0BB
6           CC   NULL            Somedata   key0BB
7           AA   key0CC          Somedata   key0CC
8           CC   NULL            Somedata   key0CC
9           CC   NULL            Somedata   key0CC
10          AA   key0DD          Somedata   key0DD
11          AA   key0EE          Somedata   key0EE

The intention is that newCol3 will be the new value of PKCol3.

Executing the update:

update @yourtable
set FKCol3 = isnull((
   select top 1 FKCol3
   from @YourTable b 
   where a.recnum > b.recnum and b.FKCol3 is not null
   and a.FKCol3 is null 
    order by b.recnum desc),a.FKCol3)
from @yourtable a

select *
from @YourTable

Here is the updated result set:

recnum      Col2 FKCol3          Col4
----------- ---- --------------- ----------
1           AA   key0AA          Somedata
2           CC   key0AA          Somedata
3           CC   key0AA          Somedata
4           AA   key0BB          Somedata
5           BB   key0BB          Somedata
6           CC   key0BB          Somedata
7           AA   key0CC          Somedata
8           CC   key0CC          Somedata
9           CC   key0CC          Somedata
10          AA   key0DD          Somedata
11          AA   key0EE          Somedata

This script will only work if there's a unique column (in this case recnum) that can be used to identify and sort the rows.

~~ CK

The WHERE clause is one of the most commonly used optional part of a query. Simply put, it filters out rows and narrows the number of rows returned by a query based on the condition included in the clause.

It's a common misconception that the SQL optimizer will always use an index whenever the table has a useful one. This is not always the case. In some cases, index will not be use and a table/index scan will be performed resulting in slow processing.

It's also widely accepted that the arrangement of expressions and the operator used does not matter, since the optimizer will parse and prepare an execution plan for the query anyway. Although this is true most of the time, arranging the logical expression properly can improve processing.

Here are some consideration that I keep in mind whenever I build my WHERE clause.

Avoid using expression that has a function on a column. This will prevent the optimizer from using the index and instead perform a table/index scan.

This query will not use an index:

select *
from AdventureWorksDW..FactInternetSalesReason
WHERE substring(SalesOrderNumber,1,2) = 'SI'

Modified, this one will use an index:

select *
from AdventureWorksDW..FactInternetSalesReason
WHERE SalesOrderNumber like 'SI%'

If the use of a function can not be avoided, use an indexed computed column instead.

If the SQL Server is not configured to be case-sensitive, do not bother using LOWER() or UPPER() functions .

These are three identical queries that will return identical results:

select * 
from AdventureWorksDW..DimGeography 
WHERE CountryRegionCode = 'AU'

select * 
from AdventureWorksDW..DimGeography 
WHERE upper(CountryRegionCode) = 'au'

select * 
from AdventureWorksDW..DimGeography 
WHERE CountryRegionCode = 'au'

The second query, however, will not use an index.

Use the equal (=) operator to compare two strings instead of LIKE.

These two queries will return the same results:

select * from AdventureWorksDW..FactInternetSales
WHERE salesordernumber = 'SO43703'

select * from AdventureWorksDW..FactInternetSales
WHERE  salesordernumber LIKE 'SO43703'

The first query is more efficient than the second one. If the LIKE operator can not be avoided, use as much leading character as much as possible. If the application performs too many LIKE operation, consider SQL Server's full-text search option instead.

select * from AdventureWorksDW..FactInternetSalesReason
WHERE salesordernumber like 'S%1'

The above query will perform faster than the query below:

select * from AdventureWorksDW..FactInternetSalesReason
WHERE salesordernumber like '%1'

Although they are not identical, given the choice, use the former rather than the latter.

Here are the most common operators in WHERE clause arranged based on best performing first:

=
>, <, >=, <= 
LIKE 
<>, NOT

Avoid using NOT operator as much as possible. Although not always the case, WHERE clause that uses the NOT operator does not utilize index.

select * from AdventureWorksDW..FactInternetSales
WHERE not ShipDateKey >= 10

will perform faster as:

select * from AdventureWorksDW..FactInternetSales
where ShipDateKey <10

Given the choice, use EXISTS() instead of IN(). Moreover, IN() have some issues handling with NULL values.

Force the optimizer to utilize an index by using index hint on the query. Use this as the last resort for optimization.

Given the choice, use BETWEEN instead of IN. The BETWEEN operator performs faster than IN.

If the clause have multiple logical expression connected by two or more AND operators, locate the expression that will be LEAST likely to be true. This way, if it's false, the clause will immediately end. If both expressions are equally likely to be false, test the least complex expression first. This way, if it's false, the more complex one need not be tested. Also, consider creating an index for a selective column or a covering index for the query.

If the clause have multiple logical expression connected by two or more OR operators, locate the expression that will be MOST likely to be true. This way, if it's true, the clause will immediately end. If both expressions are likely to be true, test the least complex expression first. This way, if it's true, the more complex one need not be tested.

Remember that IN operator is actually another form of OR, so place the most probable value at the start of the list.

A query will perform a table/index scan if it contains OR operator and if any of the referenced column does not have a useful index.

This query will perform a table/index scan. For this query to utilize an index, there must be an index on all three columns in the clause. Even if two out of the four columns have an index, it will still perform a table/index scan.

select *
from AdventureWorksDW..FactInternetSales
where ShipDateKey = 3
or PromotionKey = 3
or SalesTerritoryKey = 5
or SalesOrderLIneNumber = 3

If creating an index for each of these columns is not an option, rewrite the query to use UNION ALL (not UNION) instead. This way the query with useful index will be utilize, even if it's just the one or two out of the four query. It will still execute more efficiently.

select *
from AdventureWorksDW..FactInternetSales
where ShipDateKey = 3
union all
select *
from AdventureWorksDW..FactInternetSales
where PromotionKey = 3
union all
select *
from AdventureWorksDW..FactInternetSales
where SalesTerritoryKey = 5
union all
select *
from AdventureWorksDW..FactInternetSales
where SalesOrderLIneNumber = 3

The above query will give the same results but will run faster than using multiple ORs. If ShipDateKey and PromotionKey have useful index, it will use their respective index for that part of the query, improving the speed of the entire query.

Given the option, use EXIST or a LEFT JOIN instead of IN to test/compare the relationship of a parent-child tables.

Beware of redundant clause.

This query have a redundant WHERE clause:

select DueDateKey, ShipDateKey,
PromotionKey, CustomerKey
from AdventureWorksDW..FactInternetSales
WHERE
DueDateKey = 8 and ShipDateKey = 8 and
PromotionKey = 1 or ShipDateKey = 8 and PromotionKey = 1

ShipDateKey = 8 and PromotionKey = 1 is a subset of DueDateKey = 8 and ShipDateKey = 8 and PromotionKey = 1. What the optimizer will do is it will return all the rows requested.

Here's the result set:

DueDateKey  ShipDateKey PromotionKey CustomerKey
----------- ----------- ------------ -----------
13          8           1            11003
13          8           1            14501
13          8           1            21768
13          8           1            25863
13          8           1            28389

It may look like the query returned the requested rows efficiently. In reality what happen is, it returned these rows twice then perform a DISTINCT to remove the redundant rows.

Convert frequently running batch to stored procedure, specially, if the queries in the batch use user-defined variables. The optimizer might not take advantage of a useful index to run the queries in the batch. This is because the optimizer does not know the value of these variables when it chooses a way to access the data. If the batch is not frequently executed, consider including an INDEX hit on the query.

As much as possible, include a WHERE clause to the query. This will improve the way SQL Server retrieve the data.

~~CK

MS-SQL Server Notes

Wednesday, October 8, 2008

Updating a Row with Value from Preceding Row

Tuesday, September 30, 2008

Optimizing Your WHERE Clause