Friday, September 19, 2008

How do SQL Server Programmers Search for Their Dates?

Being a user of SQL Serever, I dealt with searching and using dates in logical conditions, in one way or another. Initially, it was quite confusing.

Here are a couple of factors.

First, it's because, as of SQL 2005, there are no data types that holds date only, nor time only data. SQL Server stores both the date part and time part using the same data type. The only difference is the precision of these data types.

Second is because you have to use string literals to handle static date data. Users usually miss the time part of the string literal. SQL Server will default the converted time part to zero (00:00:00.000). This could lead to logical conditions to return unexpected results. (For more on all these, read my old notes).

Try and simulate some logical conditions. Consider this sample table:
declare @myTable table(rownum int, DateData datetime)
set nocount on

insert into @mytable values(1,'2005-10-14 01:36:25.440')
insert into @mytable values(2,'2005-10-14 01:36:25.497')
insert into @mytable values(3,'2005-10-14 01:36:25.570')
insert into @mytable values(4,'2005-10-14 01:36:25.627')
insert into @mytable values(5,'2005-10-14 01:36:25.683')
insert into @mytable values(6,'2005-10-14 01:36:25.740')
insert into @mytable values(7,'2005-10-15 00:00:00.000')
insert into @mytable values(8,'2008-07-24 12:52:42.360')
insert into @mytable values(9,'2008-07-25 00:00:00.000')
insert into @mytable values(10,'2008-07-25 12:38:35.060')
insert into @mytable values(11,'2008-07-25 12:38:35.137')
insert into @mytable values(12,'2008-07-26 00:00:00.000')
insert into @mytable values(13,'2008-08-13 00:00:00.000')

select * from @myTable

rownum datedata
----------- -----------------------
1 2005-10-14 01:36:25.440
2 2005-10-14 01:36:25.497
3 2005-10-14 01:36:25.570
4 2005-10-14 01:36:25.627
5 2005-10-14 01:36:25.683
6 2005-10-14 01:36:25.740
7 2005-10-15 00:00:00.000
8 2008-07-24 12:52:42.360
9 2008-07-25 00:00:00.000
10 2008-07-25 12:38:35.060
11 2008-07-25 12:38:35.137
12 2008-07-26 00:00:00.000
13 2008-08-13 00:00:00.000

Now search for specific records using the date data as filter condition :
select *
from @myTable
where DateData = '20080724'


select *
from @myTable
where DateData = '20080813'
Here are the result sets:
rownum      datedata
----------- -----------------------

rownum datedata
----------- -----------------------
13 2008-08-13 00:00:00.000
Now, why did the first query did not return any rows while the second query returned the 13th row? That's in spite of row 8 being the only one with date July 24, 2008 anyway.

A more detail analysis shows that the WHERE clause of the query compares the DateData column to a date-formatted string. In order to compare two values, they must be of the same data type, otherwise, the data type with the lower precedence will be converted to the data type with the higher precedence. In this case, datetime is higher than any other native string data types. The first query is synonymous to:
select * from @myTable
where datedata = cast('20080724' as datetime)


rownum datedata
----------- -----------------------
Since the function converted a date-literal with no time part, it sets the time to zero (00:00:00.000). Resulting to a false condition (2008-07-24 12:53:00 <> 2008-07-24 00:00:00.000).

So how do SQL Server Programmers search for their dates? Here are some options how to retrieve the 8th record.

It's always possible tp convert the date column into string with no time part and compare two strings. Something like:
select * from @myTable
where convert(varchar(20), datedata, 112) = '20080724'

rownum datedata
----------- -----------------------
8 2008-07-24 12:52:42.360
It did. Technically, there's nothing wrong with this code. It returned the desired result. The only problem that it will encounter later is performance. SQL Server will not use index to optimize our query since we use a function in date column for our WHERE clause. This is OK if the query is processing a small table. If it's reading a large volume of data, this will tremendously affect the execution time.

Another option is to grab all records with dates between 20080724 AND 20080725.
select * from @myTable
where datedata BETWEEN '20080724' and '20080725'

rownum datedata
----------- -----------------------
8 2008-07-24 12:52:42.360
9 2008-07-25 00:00:00.000
Now why did it include row 9? It's the way how BETWEEN operator works. BETWEEN is a logical operator that operates inclusive-ly. The above query is similar to:
select * from @myTable
where datedata >= '20080724' and datedata <= '20080725'
The second condition will follow the “explicit conversion of lower precedent data type with no time part” rule, so the 9th row was included to the result set. So, try and include the time part:
select * from @myTable 
where datedata
BETWEEN '20080724 00:00:00.000' and '20080724 23:59:59.999'


rownum datedata
----------- -----------------------
8 2008-07-24 12:52:42.360
9 2008-07-25 00:00:00.000
Why did it still return 2 rows? The conversion, as it is, has nothing to do with this. The second expression ('20080724 23:59:59.999') will be properly converted to a DateTime expression, complete with time part. However, the DateTime precision is 3.333 seconds (For more about DateTime and SmallDatetime rounding and precision, read my old notes) resulting the second part to be rounded to '2008-07-25 00:00:00.000'. To resolve this rounding issue, try to prevent the time part from rounding up by doing:
select * from @myTable
where datedata
BETWEEN '20080724 00:00:00.000' and '20080724 23:59:59.998'


rownum datedata
----------- -----------------------
8 2008-07-24 12:52:42.360
And if it's SmallDateTime, the code should be :
select * from @myTable 
where datedata
BETWEEN '20080724 00:00:00.000' and '20080724 23:59:00'


rownum datedata
----------- -----------------------
8 2008-07-24 12:53:00
Now there are two ways of doing it depending on which data types the query is dealing with. Although, these are all valid codes, it is not recommended to have two versions of codes to handle these two sibling data types. Also, it's necessary to be always conscious of whether it's dealing with a DateTime or a SmallDateTime expression. To be more flexible, it would be better to create something like this:
select * from @myTable 
where datedata >='20080724 ' and datedata < '20080725'

rownum datedata
----------- -----------------------
8 2008-07-24 12:53:00
The above code is clearer and SQL Server can also use index for query optimization. For range of dates, it can easily be extended and modify the condition of the same code:
select * from @myTable 
where datedata >='20080724 ' and datedata < '20080727'

rownum datedata
----------- -----------------------
8 2008-07-24 12:53:00
9 2008-07-25 00:00:00
10 2008-07-25 12:39:00
11 2008-07-25 12:39:00
12 2008-07-26 00:00:00
As a general rule, use a method that can handle both DateTime and SmallDateTime, specially if you the requirement does need to deal with the time part. Also, consider the performance of the query. Placing a date data type column in a function as part of a filter condition will ignore the index and will perform a table scan. Most importantly, if using a function is unavoidable, be aware that most date and time functions are dependent to the user's default language, SET LANGUAGE and SET DATEFORMAT settings.



~~CK

No comments:

Post a Comment