Sunday, October 26, 2008

How to check for leap year?

Checking leap year is one of the most common checks that I perform in date values. Wikipedia have this pseudo-code to calculate for leap year:
if ((year modulo 4 is 0) and (year modulo 100 is not 0)) or
(year modulo 400 is 0) then

leap
else
not_leap
I've been using this algorithm all my programming life, and there's really nothing wrong with it. SQL Server, however, can accurately handle date data and have a couple functions that can be used to check for leap year without passing through this complicated algorithm. What I did is just to check if the expression is a valid date expression using the ISDATE() function.

Running the queries below:
select isdate('20070229') as validdate
select isdate('20080229') as validdate
will return the following result sets:
validdate
-----------
0
validdate
-----------
1
So if I have an integer year variable, I just need to build the date string.
set nocount on
declare @YearVar int

set @YearVar = 2007
select
isdate(cast(@YearVar as varchar(4)) + '0229') as validdate


set @YearVar = 2008
select
isdate(cast(@YearVar as varchar(4)) + '0229') as validdate
Here are the result sets:
validdate
-----------
0

validdate
-----------
1
To ensure that the date string will be properly interpreted, I use the unseparated date string format. For more on that, read my old notes.

Here's a sample code that illustrate how this algorithm works:
set nocount on
declare @YearVar int

set @YearVar = 2007
select
@YearVar as YearVar,

case
when isdate(cast(@YearVar as varchar(4)) + '0229') = 1 then

'leap year'
else 'not leap year'
end as validdate

YearVar validdate
----------- -------------
2007 not leap year


set @YearVar = 2008
select
@YearVar as YearVar,

case
when isdate(cast(@YearVar as varchar(4)) + '0229') = 1 then

'leap year'
else 'not leap year'
end as validdate

YearVar validdate
----------- -------------
2008 leap year


~~CK

Monday, October 20, 2008

Careful EXISTS

In one of my old notes, I wrote how JOIN, IN and EXISTS are all affected by NULL values in the columns used to join tables, or to check for existence of relationship (parent-child) between tables.

Here are some queries that use EXISTS that will not give any syntax error but could give unexpected results

Consider this sample table
select * from Authors
select * from Books

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
3 George San Diego
4 Ringo Oakland

aid BookID BookTitle BookPrice
----------- ----------- ------------------------------ -----------
1 1 Yesterday 5.00
1 2 In My Life 7.50
1 3 Hey Junde 4.45
2 4 Fool On the Hill NULL
2 6 If I Fell 7.80
4 7 Let It Be NULL
4 8 Till There Was You 0.00
2 9 Yellow Submarine 34.65
1 10 I Should Have Known Better 65.33
Now if I execute these queries:
select * 
from authors
where exists
(select Authorid from Books
where BookTitle = 'Yesterday')

select *
from authors
where exists
(select aid from Books
where BookTitle = 'Let It Be')

Here are the result sets:
AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
3 George San Diego
4 Ringo Oakland

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
3 George San Diego
4 Ringo Oakland

The first query did not result any error. This, in spite the column AuthorID does not exist in Books table.

Separately executed, the subquery will return an error:
select Authorid from Books  
where BookTitle = 'Yesterday'

Msg 207, Level 16, State 1, Line 1
Invalid column name 'Authorid'.
The parser did not return any error even though AuthorID does not exists in Books. This is because it exists in Authors. In this case, the query will use the column in Authors table instead. This could return unexpected results.

To illustrate this query in another way:
declare @myVar int

set @myVar = 1

select @myVar myVar from Books where BookTitle = 'Yesterday'

set @myVar = 2

select @myVar myVar from Books where BookTitle = 'Yesterday'

Here are the result sets:

myVar
-----------
1

myVar
-----------
2
I also noticed that the queries returned all rows. This is due to the nature of how EXISTS works. It checks if the subquery clause returns at least a single row. As illustrated in another way, the above query always returns a valid row regardless of the value of @myVar. The query itself is incomplete. The filter condition that should link the two table is missing. If I am to use EXISTS to relate the two tables, the query should be written this way:
select * 
from authors a
where exists
(select 1 from Books b
where a.AuthorID = b.aid and BookTitle = 'Yesterday')
Here’s the result set:
AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco


~~CK

Saturday, October 11, 2008

The Difference Between IN, EXISTS and JOIN

JOIN, IN and EXISTS can all be use to check for any parent-child table relationship. However, it's a common misconception that these three behave the same way. I found out (the hard way) that it's not always the case, specially, if the NOT operator is used to check for orphan records or broken links.

Consider this sample table:
set nocount on

create table Authors
(AuthorID int, AuthorName varchar(30), AuthorCity varchar(15))

create table Books
(aid int, BookID int, BookTitle varchar(30), BookPrice money)

insert into Authors values (1, 'Jhon', 'San Francisco')
insert into Authors values (2, 'Paul', 'Los Angeles')
insert into Authors values (3, 'George', 'San Diego')
insert into Authors values (4, 'Ringo', 'Oakland')

insert into Books values (1, 1, 'Yesterday',5.00)
insert into Books values (1, 2, 'In My Life',7.50)
insert into Books values (1, 3, 'Hey Junde',4.45)
insert into Books values (2, 4, 'Fool On the Hill',NULL)
insert into Books values (2, 6, 'If I Fell',7.80)
insert into Books values (4, 7, 'Let It Be',NULL)
insert into Books values (4, 8, 'Till There Was You',0.00)
insert into Books values (2, 9, 'Yellow Submarine',34.65)
insert into Books values
(1, 10, 'I Should Have Known Better',65.33)


select * from Authors
select * from Books

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
3 George San Diego
4 Ringo Oakland

aid BookID BookTitle BookPrice
----------- ----------- ------------------------------ ---------------------
1 1 Yesterday 5.00
1 2 In My Life 7.50
1 3 Hey Junde 4.45
2 4 Fool On the Hill NULL
2 6 If I Fell 7.80
4 7 Let It Be NULL
4 8 Till There Was You 0.00
2 9 Yellow Submarine 34.65
1 10 I Should Have Known Better 65.33
Here's how these three works in checking for existence of parent-child relationship.
select * 
from authors a
where exists (select aid from Books b where authorid = aid)

select *
from authors a
where authorid in (select aid from books)

select distinct a.*
from authors a
join books b on a.authorid = b.aid
where b.aid is not null
Here are the result sets:
AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
4 Ringo Oakland

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
4 Ringo Oakland

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
4 Ringo Oakland
In the case of JOIN, I have to use DISTINCT to avoid returning duplicate rows. So far these queries are behaving the way I expected them to be.

Now I'll try and find parent records with no child using the NOT operator.
select * 
from authors a
where not exists (select aid from Books b where authorid = aid)

select *
from authors a
where authorid not in (select aid from books)

select distinct a.*
from authors a
left join books b on a.authorid = b.aid
where b.aid is null
Here are the result sets:
AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3 George San Diego

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
3 George San Diego

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
3 George San Diego
The queries returned the expected results. Now, see what happen if I have a row with AuthorID = NULL on the parent table.

First I'll insert the dummy record.
insert into Authors values (NULL, 'Dummy', 'Daly City')

select * from Authors

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
3 George San Diego
4 Ringo Oakland
NULL Dummy Daly City
Running the queries again:
select * 
from authors a
where not exists (select aid from Books b where authorid = aid)

select *
from authors a
where authorid not in (select aid from books)

select distinct a.*
from authors a
left join books b on a.authorid = b.aid
where b.aid is null
Here are the result sets:
AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3 George San Diego
NULL Dummy Daly City

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
3 George San Diego

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
NULL Dummy Daly City
3 George San Diego
Notice that IN did not return the expected result. It excluded the NULL on the parent table. To include that record, I actually have to add an OR condition.
select *
from authors a
where authorid not in (select aid from books)
or authorid is null


AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
3 George San Diego
NULL Dummy Daly City
Now what if I don't have a row with AuthorID = NULL on the parent table, but instead it's on the child table.

First remove the record from the Authors table :
delete from Authors where AuthorId is null

select * from Authors

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
3 George San Diego
4 Ringo Oakland
then insert the record in Books table:
insert into Books values
(NULL, 5, 'The Long and Winding Road',5.65)


select * from Books

aid BookID BookTitle BookPrice
----------- ----------- ------------------------------ ---------------------
1 1 Yesterday 5.00
1 2 In My Life 7.50
1 3 Hey Junde 4.45
2 4 Fool On the Hill NULL
2 6 If I Fell 7.80
4 7 Let It Be NULL
4 8 Till There Was You 0.00
2 9 Yellow Submarine 34.65
1 10 I Should Have Known Better 65.33
NULL 5 The Long and Winding Road 5.65
Running the queries again:
select * 
from authors a
where not exists (select aid from Books b where authorid = aid)

select *
from authors a
where authorid not in (select aid from books)

select distinct a.*
from authors a
left join books b on a.authorid = b.aid
where b.aid is null
Here are the result sets:
AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3 George San Diego

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
3 George San Diego
Notice that IN did not return any result at all. Now what will happen if both the parent and the child tables have AuthorId = NULL and AID = NULL.

First I'll insert back the row on Authors table:
insert into Authors values (NULL, 'Dummy', 'Daly City')
Now the tables look like this:
select * from Authors
select * from Books

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
3 George San Diego
4 Ringo Oakland
NULL Dummy Daly City

aid BookID BookTitle BookPrice
----------- ----------- ------------------------------ ---------------------
1 1 Yesterday 5.00
1 2 In My Life 7.50
1 3 Hey Junde 4.45
2 4 Fool On the Hill NULL
2 6 If I Fell 7.80
4 7 Let It Be NULL
4 8 Till There Was You 0.00
2 9 Yellow Submarine 34.65
1 10 I Should Have Known Better 65.33
NULL 5 The Long and Winding Road 5.65
First I'll check for existence of parent-child relationship
select * 
from authors a
where exists (select aid from Books b where authorid = aid)

select *
from authors a
where authorid in (select aid from books)

select distinct a.*
from authors a
join books b on a.authorid = b.aid
where b.aid is not null
Here are the result sets:
AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
4 Ringo Oakland

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
4 Ringo Oakland

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
4 Ringo Oakland
So even if there is a row in both tables with AuthorID = NULL and AID = NULL the result set did not return it. This is because SQL Server can not compare two NULLs. So NULL = NULL is neither TRUE nor FALSE. Since the condition is not TRUE, which is the primary requirement for the filter to include the row in the result set, it was excluded.

Now, if I try and see if there are records on the parent table that does not have a child record…
select * 
from authors a
where not exists (select aid from Books b where authorid = aid)

select *
from authors a
where authorid not in (select aid from books)

select distinct a.*
from authors a
left join books b on a.authorid = b.aid
where b.aid is null
Here are the result sets:
AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
3 George San Diego
NULL Dummy Daly City

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
NULL Dummy Daly City
3 George San Diego
The query that uses IN did not return any result set. This is because IN will not return any rows if the list contains a NULL value. This is true whether the list is a list of values or a subquery.
select * from Authors

AuthorID AuthorName AuthorCity
----------- ------------------------------ ---------------
1 Jhon San Francisco
2 Paul Los Angeles
3 George San Diego
4 Ringo Oakland
NULL Dummy Daly City

select *
from authors a
where authorid not in (1,4,NULL)
Here's the result set:
AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
Based on these samples IN is the most unreliable way among the three in terms of checking parent-child relationship, specially if either or both the parent and child tables have a NULL on value on the column used as join key.

Although some of these situations may be avoided using the Primary Key and Foreign Key Constraint built in SQL Server, these inconsistencies could happen while using these queries during data analysis stage.



~~CK

Wednesday, October 8, 2008

Updating a Row with Value from Preceding Row

I found this problem on one of the SQL Server forums that I am regular contributor of.

Given this table:
set nocount on
declare @yourtable table
(recnum int,
Col2 varchar(2),
FKCol3 varchar(15),
Col4 varchar(10))


insert into @yourtable values (1, 'AA', 'key0AA', 'Somedata')
insert into @yourtable values (2, 'CC', NULL, 'Somedata')
insert into @yourtable values (3, 'CC', NULL, 'Somedata')
insert into @yourtable values (4, 'AA', 'key0BB', 'Somedata')
insert into @yourtable values (5, 'BB', NULL, 'Somedata')
insert into @yourtable values (6, 'CC', NULL, 'Somedata')
insert into @yourtable values (7, 'AA', 'key0CC', 'Somedata')
insert into @yourtable values (8, 'CC', NULL, 'Somedata')
insert into @yourtable values (9, 'CC', NULL, 'Somedata')
insert into @yourtable values (10, 'AA', 'key0DD', 'Somedata')
insert into @yourtable values (11, 'AA', 'key0EE', 'Somedata')

select *
from @YourTable


recnum Col2 FKCol3 Col4
----------- ---- --------------- ----------
1 AA key0AA Somedata
2 CC NULL Somedata
3 CC NULL Somedata
4 AA key0BB Somedata
5 BB NULL Somedata
6 CC NULL Somedata
7 AA key0CC Somedata
8 CC NULL Somedata
9 CC NULL Somedata
10 AA key0DD Somedata
11 AA key0EE Somedata
The problem: The user wants to replace those PKCol3 that are NULL by the non-null PKCol3 that precedes it. In this case, records 2 and 3 will have PKCol3 with "key0AA". Records 5 and 6 will have PKCol3 = "key0BB", and so on.This has to be done for almost 2 million+ rows.

Here's the code I suggested:

First, here's a query that can be used for analysis:
select recnum, Col2, FKCol3, Col4,
newCol3 = isnull((
select top 1 FKCol3
from @YourTable b
where a.recnum > b.recnum and b.FKCol3 is not null
and a.FKCol3 is null

order by b.recnum desc),a.FKCol3)
from @YourTable a
Here's the result set:
recnum      Col2 FKCol3          Col4       newCol3
----------- ---- --------------- ---------- ---------------
1 AA key0AA Somedata key0AA
2 CC NULL Somedata key0AA
3 CC NULL Somedata key0AA
4 AA key0BB Somedata key0BB
5 BB NULL Somedata key0BB
6 CC NULL Somedata key0BB
7 AA key0CC Somedata key0CC
8 CC NULL Somedata key0CC
9 CC NULL Somedata key0CC
10 AA key0DD Somedata key0DD
11 AA key0EE Somedata key0EE
The intention is that newCol3 will be the new value of PKCol3.

Executing the update:
update @yourtable
set FKCol3 = isnull((
select top 1 FKCol3
from @YourTable b
where a.recnum > b.recnum and b.FKCol3 is not null
and a.FKCol3 is null

order by b.recnum desc),a.FKCol3)
from @yourtable a

select *
from @YourTable
Here is the updated result set:
recnum      Col2 FKCol3          Col4
----------- ---- --------------- ----------
1 AA key0AA Somedata
2 CC key0AA Somedata
3 CC key0AA Somedata
4 AA key0BB Somedata
5 BB key0BB Somedata
6 CC key0BB Somedata
7 AA key0CC Somedata
8 CC key0CC Somedata
9 CC key0CC Somedata
10 AA key0DD Somedata
11 AA key0EE Somedata
This script will only work if there's a unique column (in this case recnum) that can be used to identify and sort the rows.



~~ CK