MS-SQL Server Notes: October 2008

Sunday, October 26, 2008

How to check for leap year?

Checking leap year is one of the most common checks that I perform in date values. Wikipedia have this pseudo-code to calculate for leap year:

if ((year modulo 4 is 0) and (year modulo 100 is not 0)) or
(year modulo 400 is 0) then 
   leap
else 
   not_leap

I've been using this algorithm all my programming life, and there's really nothing wrong with it. SQL Server, however, can accurately handle date data and have a couple functions that can be used to check for leap year without passing through this complicated algorithm. What I did is just to check if the expression is a valid date expression using the ISDATE() function.

Running the queries below:

select isdate('20070229') as validdate
select isdate('20080229') as validdate

will return the following result sets:

validdate
-----------
0
validdate
-----------
1

So if I have an integer year variable, I just need to build the date string.

set nocount on
declare @YearVar int

set @YearVar = 2007
select
isdate(cast(@YearVar as varchar(4)) + '0229') as validdate

set @YearVar = 2008
select
isdate(cast(@YearVar as varchar(4)) + '0229') as validdate

Here are the result sets:

validdate
-----------
0

validdate
-----------
1

To ensure that the date string will be properly interpreted, I use the unseparated date string format. For more on that, read my old notes.

Here's a sample code that illustrate how this algorithm works:

set nocount on
declare @YearVar int

set @YearVar = 2007
select
@YearVar as YearVar,
 case
when isdate(cast(@YearVar as varchar(4)) + '0229') = 1 then
     'leap year'
   else 'not leap year'
 end as validdate

YearVar     validdate
----------- -------------
2007        not leap year

    
set @YearVar = 2008
select
@YearVar as YearVar,
 case
when isdate(cast(@YearVar as varchar(4)) + '0229') = 1 then
     'leap year'
   else 'not leap year'
 end as validdate
    
YearVar     validdate
----------- -------------
2008        leap year

~~CK

Monday, October 20, 2008

Careful EXISTS

In one of my old notes, I wrote how JOIN, IN and EXISTS are all affected by NULL values in the columns used to join tables, or to check for existence of relationship (parent-child) between tables.

Here are some queries that use EXISTS that will not give any syntax error but could give unexpected results

Consider this sample table

select * from Authors
select * from Books

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland

aid         BookID      BookTitle                      BookPrice
----------- ----------- ------------------------------ -----------
1           1           Yesterday                      5.00
1           2           In My Life                     7.50
1           3           Hey Junde                      4.45
2           4           Fool On the Hill               NULL
2           6           If I Fell                      7.80
4           7           Let It Be                      NULL
4           8           Till There Was You             0.00
2           9           Yellow Submarine               34.65
1           10          I Should Have Known Better     65.33

Now if I execute these queries:

select * 
from authors 
where  exists 
   (select Authorid from Books  
        where BookTitle = 'Yesterday')

select * 
from authors
where exists 
   (select aid from Books 
      where BookTitle = 'Let It Be')

Here are the result sets:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland

The first query did not result any error. This, in spite the column AuthorID does not exist in Books table.

Separately executed, the subquery will return an error:

select Authorid from Books  
  where BookTitle = 'Yesterday'

Msg 207, Level 16, State 1, Line 1
Invalid column name 'Authorid'.

The parser did not return any error even though AuthorID does not exists in Books. This is because it exists in Authors. In this case, the query will use the column in Authors table instead. This could return unexpected results.

To illustrate this query in another way:

declare @myVar int

set @myVar = 1

select @myVar myVar from Books where BookTitle = 'Yesterday'

set @myVar = 2

select @myVar myVar  from Books where BookTitle = 'Yesterday'

Here are the result sets:

myVar
-----------
1

myVar
-----------
2

I also noticed that the queries returned all rows. This is due to the nature of how EXISTS works. It checks if the subquery clause returns at least a single row. As illustrated in another way, the above query always returns a valid row regardless of the value of @myVar. The query itself is incomplete. The filter condition that should link the two table is missing. If I am to use EXISTS to relate the two tables, the query should be written this way:

select * 
from authors a
where  exists 
   (select 1 from Books b 
        where a.AuthorID = b.aid and   BookTitle = 'Yesterday')

Here’s the result set:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco

~~CK

Saturday, October 11, 2008

The Difference Between IN, EXISTS and JOIN

JOIN, IN and EXISTS can all be use to check for any parent-child table relationship. However, it's a common misconception that these three behave the same way. I found out (the hard way) that it's not always the case, specially, if the NOT operator is used to check for orphan records or broken links.

Consider this sample table:

set nocount on

create table Authors
(AuthorID int, AuthorName varchar(30), AuthorCity varchar(15))

create table Books
(aid int, BookID int, BookTitle varchar(30), BookPrice money)

insert into Authors values (1, 'Jhon', 'San Francisco')
insert into Authors values (2, 'Paul', 'Los Angeles')
insert into Authors values (3, 'George', 'San Diego')
insert into Authors values (4, 'Ringo', 'Oakland')

insert into Books values (1, 1, 'Yesterday',5.00)
insert into Books values (1, 2, 'In My Life',7.50)
insert into Books values (1, 3, 'Hey Junde',4.45)
insert into Books values (2, 4, 'Fool On the Hill',NULL)
insert into Books values (2, 6, 'If I Fell',7.80)
insert into Books values (4, 7, 'Let It Be',NULL)
insert into Books values (4, 8, 'Till There Was You',0.00)
insert into Books values (2, 9, 'Yellow Submarine',34.65)
insert into Books values
(1, 10, 'I Should Have Known Better',65.33)

select * from Authors
select * from Books

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland

aid         BookID      BookTitle                      BookPrice
----------- ----------- ------------------------------ ---------------------
1           1           Yesterday                      5.00
1           2           In My Life                     7.50
1           3           Hey Junde                      4.45
2           4           Fool On the Hill               NULL
2           6           If I Fell                      7.80
4           7           Let It Be                      NULL
4           8           Till There Was You             0.00
2           9           Yellow Submarine               34.65
1           10          I Should Have Known Better     65.33

Here's how these three works in checking for existence of parent-child relationship.

select * 
from authors a
where exists (select aid from Books b where authorid = aid)

select *
from authors a
where authorid in (select aid from books)

select distinct a.*
from authors a
 join books b on a.authorid = b.aid
where b.aid is not null

Here are the result sets:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
4           Ringo                          Oakland

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
4           Ringo                          Oakland

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
4           Ringo                          Oakland

In the case of JOIN, I have to use DISTINCT to avoid returning duplicate rows. So far these queries are behaving the way I expected them to be.

Now I'll try and find parent records with no child using the NOT operator.

select * 
from authors a
where not exists (select aid from Books b where authorid = aid)

select *
from authors a
where  authorid not in (select aid from books)

select distinct a.*
from authors a
left join books b on a.authorid = b.aid
where b.aid is null

Here are the result sets:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3           George                         San Diego

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3           George                         San Diego

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3           George                         San Diego

The queries returned the expected results. Now, see what happen if I have a row with AuthorID = NULL on the parent table.

First I'll insert the dummy record.

insert into Authors values (NULL, 'Dummy', 'Daly City')

select * from Authors

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland
NULL        Dummy                          Daly City

Running the queries again:

select * 
from authors a
where not exists (select aid from Books b where authorid = aid)

select *
from authors a
where  authorid not in (select aid from books)  

select distinct a.*
from authors a
left join books b on a.authorid = b.aid
where b.aid is null

Here are the result sets:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3           George                         San Diego
NULL        Dummy                          Daly City

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3           George                         San Diego

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
NULL        Dummy                          Daly City
3           George                         San Diego

Notice that IN did not return the expected result. It excluded the NULL on the parent table. To include that record, I actually have to add an OR condition.

select *
from authors a
where  authorid not in (select aid from books)
or authorid is null

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3           George                         San Diego
NULL        Dummy                          Daly City

Now what if I don't have a row with AuthorID = NULL on the parent table, but instead it's on the child table.

First remove the record from the Authors table :

delete from Authors where AuthorId is null

select * from Authors

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland

then insert the record in Books table:

insert into Books values
(NULL, 5, 'The Long and Winding Road',5.65)

select * from Books

aid         BookID      BookTitle                      BookPrice
----------- ----------- ------------------------------ ---------------------
1           1           Yesterday                      5.00
1           2           In My Life                     7.50
1           3           Hey Junde                      4.45
2           4           Fool On the Hill               NULL
2           6           If I Fell                      7.80
4           7           Let It Be                      NULL
4           8           Till There Was You             0.00
2           9           Yellow Submarine               34.65
1           10          I Should Have Known Better     65.33
NULL        5           The Long and Winding Road      5.65

Running the queries again:

select * 
from authors a
where not exists (select aid from Books b where authorid = aid)

select *
from authors a
where  authorid not in (select aid from books)  

select distinct a.*
from authors a
left join books b on a.authorid = b.aid
where b.aid is null

Here are the result sets:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3           George                         San Diego

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3           George                         San Diego

Notice that IN did not return any result at all. Now what will happen if both the parent and the child tables have AuthorId = NULL and AID = NULL.

First I'll insert back the row on Authors table:

insert into Authors values (NULL, 'Dummy', 'Daly City')

Now the tables look like this:

select * from Authors
select * from Books

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland
NULL        Dummy                          Daly City

aid         BookID      BookTitle                      BookPrice
----------- ----------- ------------------------------ ---------------------
1           1           Yesterday                      5.00
1           2           In My Life                     7.50
1           3           Hey Junde                      4.45
2           4           Fool On the Hill               NULL
2           6           If I Fell                      7.80
4           7           Let It Be                      NULL
4           8           Till There Was You             0.00
2           9           Yellow Submarine               34.65
1           10          I Should Have Known Better     65.33
NULL        5           The Long and Winding Road      5.65

First I'll check for existence of parent-child relationship

select * 
from authors a
where exists (select aid from Books b where authorid = aid)

select *
from authors a
where authorid in (select aid from books)

select distinct a.*
from authors a
 join books b on a.authorid = b.aid
where b.aid is not null

Here are the result sets:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
4           Ringo                          Oakland

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
4           Ringo                          Oakland

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
4           Ringo                          Oakland

So even if there is a row in both tables with AuthorID = NULL and AID = NULL the result set did not return it. This is because SQL Server can not compare two NULLs. So NULL = NULL is neither TRUE nor FALSE. Since the condition is not TRUE, which is the primary requirement for the filter to include the row in the result set, it was excluded.

Now, if I try and see if there are records on the parent table that does not have a child record…

select * 
from authors a
where not exists (select aid from Books b where authorid = aid)

select *
from authors a
where  authorid not in (select aid from books)

select distinct a.*
from authors a
left join books b on a.authorid = b.aid
where b.aid is null

Here are the result sets:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3           George                         San Diego
NULL        Dummy                          Daly City

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
NULL        Dummy                          Daly City
3           George                         San Diego

The query that uses IN did not return any result set. This is because IN will not return any rows if the list contains a NULL value. This is true whether the list is a list of values or a subquery.

select * from Authors

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland
NULL        Dummy                          Daly City

select *
from authors a
where authorid not in (1,4,NULL)

Here's the result set:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------

Based on these samples IN is the most unreliable way among the three in terms of checking parent-child relationship, specially if either or both the parent and child tables have a NULL on value on the column used as join key.

Although some of these situations may be avoided using the Primary Key and Foreign Key Constraint built in SQL Server, these inconsistencies could happen while using these queries during data analysis stage.

~~CK

Wednesday, October 8, 2008

Updating a Row with Value from Preceding Row

I found this problem on one of the SQL Server forums that I am regular contributor of.

Given this table:

set nocount on
declare @yourtable table
(recnum int,
Col2 varchar(2),
FKCol3 varchar(15),
Col4 varchar(10))

insert into @yourtable values (1, 'AA', 'key0AA', 'Somedata')
insert into @yourtable values (2, 'CC', NULL, 'Somedata')
insert into @yourtable values (3, 'CC', NULL, 'Somedata')
insert into @yourtable values (4, 'AA', 'key0BB', 'Somedata')
insert into @yourtable values (5, 'BB', NULL, 'Somedata')
insert into @yourtable values (6, 'CC', NULL, 'Somedata')
insert into @yourtable values (7, 'AA', 'key0CC', 'Somedata')
insert into @yourtable values (8, 'CC', NULL, 'Somedata')
insert into @yourtable values (9, 'CC', NULL, 'Somedata')
insert into @yourtable values (10, 'AA', 'key0DD', 'Somedata')
insert into @yourtable values (11, 'AA', 'key0EE', 'Somedata')

select *
from @YourTable 


recnum      Col2 FKCol3          Col4
----------- ---- --------------- ----------
1           AA   key0AA          Somedata
2           CC   NULL            Somedata
3           CC   NULL            Somedata
4           AA   key0BB          Somedata
5           BB   NULL            Somedata
6           CC   NULL            Somedata
7           AA   key0CC          Somedata
8           CC   NULL            Somedata
9           CC   NULL            Somedata
10          AA   key0DD          Somedata
11          AA   key0EE          Somedata

The problem: The user wants to replace those PKCol3 that are NULL by the non-null PKCol3 that precedes it. In this case, records 2 and 3 will have PKCol3 with "key0AA". Records 5 and 6 will have PKCol3 = "key0BB", and so on.This has to be done for almost 2 million+ rows.

Here's the code I suggested:

First, here's a query that can be used for analysis:

select recnum, Col2, FKCol3, Col4,
newCol3 = isnull((
   select top 1 FKCol3 
   from @YourTable b    
   where a.recnum > b.recnum and b.FKCol3 is not null
    and a.FKCol3 is null 
   order by b.recnum desc),a.FKCol3)
from @YourTable a

Here's the result set:

recnum      Col2 FKCol3          Col4       newCol3
----------- ---- --------------- ---------- ---------------
1           AA   key0AA          Somedata   key0AA
2           CC   NULL            Somedata   key0AA
3           CC   NULL            Somedata   key0AA
4           AA   key0BB          Somedata   key0BB
5           BB   NULL            Somedata   key0BB
6           CC   NULL            Somedata   key0BB
7           AA   key0CC          Somedata   key0CC
8           CC   NULL            Somedata   key0CC
9           CC   NULL            Somedata   key0CC
10          AA   key0DD          Somedata   key0DD
11          AA   key0EE          Somedata   key0EE

The intention is that newCol3 will be the new value of PKCol3.

Executing the update:

update @yourtable
set FKCol3 = isnull((
   select top 1 FKCol3
   from @YourTable b 
   where a.recnum > b.recnum and b.FKCol3 is not null
   and a.FKCol3 is null 
    order by b.recnum desc),a.FKCol3)
from @yourtable a

select *
from @YourTable

Here is the updated result set:

recnum      Col2 FKCol3          Col4
----------- ---- --------------- ----------
1           AA   key0AA          Somedata
2           CC   key0AA          Somedata
3           CC   key0AA          Somedata
4           AA   key0BB          Somedata
5           BB   key0BB          Somedata
6           CC   key0BB          Somedata
7           AA   key0CC          Somedata
8           CC   key0CC          Somedata
9           CC   key0CC          Somedata
10          AA   key0DD          Somedata
11          AA   key0EE          Somedata

This script will only work if there's a unique column (in this case recnum) that can be used to identify and sort the rows.

~~ CK