MS-SQL Server Notes: function

Thursday, November 1, 2018

Finding Specific Dates in SQL Server

I gathered some useful date functions from around the net. They are helpful for finding specific dates. Here are some of them:

declare @BaseDate datetime

set @BaseDate='20120205 21:30:27.380'

SELECT
Today=isnull(@BaseDate,GETDATE()),

DayToday=datename(dw,isnull(@BaseDate,GETDATE())),

TodayMidnight=DATEADD(dd,DATEDIFF(dd,0,isnull(@BaseDate,GETDATE())),0),

TodayEndofDay=dateadd(ms,-2,DATEADD(dd,DATEDIFF(dd,0,DATEADD(d,+1,isnull(@BaseDate,GETDATE()))),0)),


Yesterday=DATEADD(d,-1,isnull(@BaseDate,GETDATE())),

YesterdayMidnight=DATEADD(dd,DATEDIFF(dd,0,DATEADD(d,-1,isnull(@BaseDate,GETDATE()))),0),

YesterdayEndofDay=dateadd(ms,-2,DATEADD(dd,DATEDIFF(dd,0,isnull(@BaseDate,GETDATE())),0)),


Tomorrow=DateAdd(dd,1,isnull(@BaseDate,GETDATE())),

TomorrowMidnight=DATEADD(dd,DATEDIFF(dd,0,DateAdd(dd,1,isnull(@BaseDate,GETDATE()))),0),

TomorrowEndofDay=dateadd(ms,-2,DATEADD(dd,DATEDIFF(dd,0,DATEADD(d,+1,DateAdd(dd,1,isnull(@BaseDate,GETDATE())))),0)),


FirstDayofWeek=dateadd(dd,(datepart(dw,DATEADD(dd,DATEDIFF(dd,0,isnull(@BaseDate,GETDATE())),0))-1)*-1,DATEADD(dd,DATEDIFF(dd,0,isnull(@BaseDate,GETDATE())),0)),

LastDayofWeek=dateadd(ms,-2,dateadd(dd,8-datepart(dw,DATEADD(dd,DATEDIFF(dd,0,isnull(@BaseDate,GETDATE())),0)),DATEADD(dd,DATEDIFF(dd,0,isnull(@BaseDate,GETDATE())),0))),

FirstDayofLastWeek=dateadd(dd,-7,dateadd(dd,(datepart(dw,DATEADD(dd,DATEDIFF(dd,0,isnull(@BaseDate,GETDATE())),0))-1)*-1,DATEADD(dd,DATEDIFF(dd,0,isnull(@BaseDate,GETDATE())),0))),

LastDayofLastWeek=dateadd(dd,-7,dateadd(ms,-2,dateadd(dd,8-datepart(dw,DATEADD(dd,DATEDIFF(dd,0,isnull(@BaseDate,GETDATE())),0)),DATEADD(dd,DATEDIFF(dd,0,isnull(@BaseDate,GETDATE())),0)))),


FirstDayofMonth=DATEADD(mm,DATEDIFF(mm,0,isnull(@BaseDate,GETDATE())),0),

LastDayofMonth=DATEADD(ms,-3,DATEADD(mm,0,DATEADD(mm,DATEDIFF(mm,0,isnull(@BaseDate,GETDATE()))+1,0))),

FirstDayofLastMonth=DATEADD(mm,-1,DATEADD(mm,DATEDIFF(mm,0,isnull(@BaseDate,GETDATE())),0)),

LastDayofLastMonth=DATEADD(ms,-3,DATEADD(mm,0,DATEADD(mm,DATEDIFF(mm,0,isnull(@BaseDate,GETDATE())),0))),


FirstDayLastQuarter=DATEADD(qq, DATEDIFF(qq, 0, isnull(@BaseDate,GETDATE())) - 1, 0),

LastDayLastQuarter=DATEADD(dd, -1, DATEADD(qq, DATEDIFF(qq, 0, isnull(@BaseDate,GETDATE())), 0)),

FirstDayThisQuarter=DATEADD(qq, DATEDIFF(qq, 0, isnull(@BaseDate,GETDATE())), 0),

LastDayThisQuarter=DATEADD (dd, -1, DATEADD(qq, DATEDIFF(qq, 0, isnull(@BaseDate,GETDATE())) +1, 0)),

FirstdayNextQuarter=DATEADD(qq, DATEDIFF(qq, 0, isnull(@BaseDate,GETDATE())) + 1, 0),

LastDayNextQuarter= DATEADD (dd, -1, DATEADD(qq, DATEDIFF(qq, 0, isnull(@BaseDate,GETDATE())) +2, 0)),


FirstDayofYear=DATEADD(yy,DATEDIFF(yy,0,isnull(@BaseDate,GETDATE())),0),

LastDayofYear=DATEADD(ms,-3,DATEADD(yy,0,DATEADD(yy,DATEDIFF(yy,0,isnull(@BaseDate,GETDATE()))+1,0))),

FirstDayofLastYear=DATEADD(yy,-1,DATEADD(yy,DATEDIFF(yy,0,isnull(@BaseDate,GETDATE())),0)),

LastDayofLastYear=DATEADD(ms,-3,DATEADD(yy,0,DATEADD(yy,DATEDIFF(yy,0,isnull(@BaseDate,GETDATE())),0)))

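Here's the result set, split into groups for readability. The week columns depend on the session's DATEFIRST setting; the values shown correspond to SET DATEFIRST 1 (Monday as the first day of the week):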

Today                   DayToday    TodayMidnight           TodayEndofDay           
----------------------- ----------- ----------------------- ----------------------- 
2012-02-05 21:30:27.380 Sunday      2012-02-05 00:00:00.000 2012-02-05 23:59:59.997 


Yesterday               YesterdayMidnight       YesterdayEndofDay       
----------------------- ----------------------- ----------------------- 
2012-02-04 21:30:27.380 2012-02-04 00:00:00.000 2012-02-04 23:59:59.997 


Tomorrow                TomorrowMidnight        TomorrowEndofDay        
----------------------- ----------------------- ----------------------- 
2012-02-06 21:30:27.380 2012-02-06 00:00:00.000 2012-02-06 23:59:59.997 


FirstDayofWeek          LastDayofWeek           
----------------------- ----------------------- 
2012-01-30 00:00:00.000 2012-02-05 23:59:59.997 


FirstDayofLastWeek      LastDayofLastWeek       
----------------------- ----------------------- 
2012-01-23 00:00:00.000 2012-01-29 23:59:59.997 


FirstDayofMonth         LastDayofMonth          
----------------------- ----------------------- 
2012-02-01 00:00:00.000 2012-02-29 23:59:59.997 


FirstDayofLastMonth     LastDayofLastMonth      
----------------------- -----------------------
2012-01-01 00:00:00.000 2012-01-31 23:59:59.997 


FirstDayLastQuarter     LastDayLastQuarter
----------------------- -----------------------
2011-10-01 00:00:00.000 2011-12-31 00:00:00.000


FirstDayThisQuarter     LastDayThisQuarter
----------------------- -----------------------
2012-01-01 00:00:00.000 2012-03-31 00:00:00.000


FirstdayNextQuarter     LastDayNextQuarter
----------------------- -----------------------
2012-04-01 00:00:00.000 2012-06-30 00:00:00.000


FirstDayofYear          LastDayofYear           
----------------------- ----------------------- 
2012-01-01 00:00:00.000 2012-12-31 23:59:59.997 


FirstDayofLastYear      LastDayofLastYear
----------------------- -----------------------
2011-01-01 00:00:00.000 2011-12-31 23:59:59.997

Remember, the variable needs to be of the datetime data type, not smalldatetime, which is accurate only to the minute.
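The week calculations above rely on DATEPART(dw, ...), whose result depends on the session's DATEFIRST setting. The short sketch below is an illustration added to these notes, not part of the original query; it shows how the same Sunday is numbered under two settings.

```sql
declare @BaseDate datetime
set @BaseDate = '20120205 21:30:27.380'      -- a Sunday

set datefirst 7                              -- Sunday is day 1 (the us_english default)
select DayNumber = datepart(dw, @BaseDate)   -- returns 1

set datefirst 1                              -- Monday is day 1
select DayNumber = datepart(dw, @BaseDate)   -- returns 7
```

With DATEFIRST 1 the FirstDayofWeek expression lands on Monday, 2012-01-30; with DATEFIRST 7 it would land on the base date itself.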

~~ CK

Monday, February 9, 2015

Why does IsNumeric() return unexpected results?

IsNumeric() is a handy function that returns 1 when the input expression evaluates to a valid numeric data type; otherwise it returns 0.

Read carefully, though: it can sometimes return an unexpected value. Books Online notes that it returns 1 for some characters that are not numbers, such as plus (+), minus (-), and valid currency symbols such as the dollar sign ($).

There are also some accounting symbols that evaluate to a number, so the function returns 1 for them. The most commonly forgotten reason is scientific notation. The letters "e" and "d" are used in different numeric notations. On their own, these letters do not evaluate to a number, and the function returns 0.

As these samples show:

SELECT Expression = '''d''', IsItANumber = ISNUMERIC('d')
SELECT Expression = '''e''', IsItANumber = ISNUMERIC('e')

Here are the results:

Expression IsItANumber
---------- -----------
'd'        0

Expression IsItANumber
---------- -----------
'e'        0

When they are embedded among digits, however, the function returns a different result:

SELECT Expression = '''097d89''',  IsItANumber = ISNUMERIC('097d89')
SELECT Expression = '''489e239''', IsItANumber = ISNUMERIC('489e239')

Here are the results:

Expression IsItANumber
---------- -----------
'097d89'   1

Expression IsItANumber
---------- -----------
'489e239'  1

Enter regular expressions, or rather the limited pattern syntax that LIKE supports...

SELECT Expression = '''097d89''',  IsItANumber = ISNUMERIC('097d89'),
     RegExIsItANumber = case when '097d89' NOT LIKE '%[^0-9]%' then 1 else 0 end

SELECT Expression = '''489e239''', IsItANumber = ISNUMERIC('489e239'), 
     RegExIsItANumber = case when '489e239' NOT LIKE '%[^0-9]%' then 1 else 0 end

Here are the results:

Expression IsItANumber RegExIsItANumber
---------- ----------- ----------------
'097d89'   1           0

Expression IsItANumber RegExIsItANumber
---------- ----------- ----------------
'489e239'  1           0

The double negative expression looks confusing. Simply put, the condition below returns TRUE if every character in the expression is a digit from 0 through 9.

[CharExpression] NOT LIKE '%[^0-9]%'

Be careful, this pattern only checks for digits. So if the input has a decimal point, it will return FALSE.
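If decimal values also need to pass, the pattern can be extended. The following is a minimal sketch, not a complete numeric validator: it accepts digits with at most one decimal point and ignores signs, currency symbols and exponents. The column aliases and sample values are made up for illustration.

```sql
select Expression = v,
       DigitsOnly      = case when v not like '%[^0-9]%' then 1 else 0 end,
       DigitsOrDecimal = case when v not like '%[^0-9.]%'   -- only digits and dots
                               and v not like '%.%.%'       -- at most one dot
                               and v like '%[0-9]%'         -- at least one digit
                              then 1 else 0 end
from (values ('097d89'), ('12.5'), ('1.2.3'), ('')) as samples(v)

-- '097d89' -> DigitsOnly 0, DigitsOrDecimal 0
-- '12.5'   -> DigitsOnly 0, DigitsOrDecimal 1
-- '1.2.3'  -> DigitsOnly 0, DigitsOrDecimal 0
-- ''       -> DigitsOnly 1, DigitsOrDecimal 0
```

Note the last row: an empty string contains no non-digit character, so the digits-only test alone reports it as numeric. The extra "at least one digit" condition guards against that.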

UPDATE:

Starting with SQL Server 2012, a new function is provided for checking whether a value can be converted to a specific data type. TRY_CONVERT() converts a value to the specified data type and returns NULL if the value cannot be converted.

Here's a sample:

select ValueToBeConverted = '1e35', IsNumericFunction = isnumeric('1e35'), 
       ConvertedToInt = try_convert(int, '1e35'), ConvertedToFloat = try_convert(float, '1e35')

Here's the result set:

ValueToBeConverted IsNumericFunction ConvertedToInt ConvertedToFloat
------------------ ----------------- -------------- ----------------------
1e35               1                 NULL           1E+35

Notice that the fourth column returned a valid value, as the letter "e" is part of a valid float value.

As always, comments are welcome...

~~ CK

Saturday, January 21, 2012

Replace Multiple Spaces with One

Once in a while I encounter a requirement to remove multiple spaces from a string column. My most common solution is to use a loop that replaces multiple spaces with one. The problem starts when the value that needs cleaning is a column in a huge table. There's a possibility that the operation results in multiple writes to the table or in RBAR (Row By Agonizing Row) processing.

Browsing around I saw this function. The best part of this function is that it's a set-based operation which, most of the time, is faster. With permission from its original author, I'm posting it here so I can use it the next time I need it.

CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKII(@S VARCHAR(8000))
RETURNS VARCHAR(8000)
BEGIN
RETURN   replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(@s)),
  '                                 ',' '),
  '                 ',' '),
  '         ',' '),
  '     ',' '),
  '   ',' '),
  '  ',' '),
  '  ',' ')
END

Big thanks to Michael Meierruth.
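As a quick usage sketch, the function can be called on literal values or applied to a whole column in one statement. The table dbo.Customers and its column CustomerName below are hypothetical names used only for illustration.

```sql
-- Literal samples: runs of spaces collapse to a single space
select Cleaned = dbo.fn_CleanUp_MichaelMeierruth_MKII(v)
from (values ('a    b      c'), ('  hello   world  ')) as samples(v)
-- returns 'a b c' and 'hello world'

-- Set-based cleanup of a column (hypothetical table and column)
update dbo.Customers
set CustomerName = dbo.fn_CleanUp_MichaelMeierruth_MKII(CustomerName)
where CustomerName like '%  %'   -- only rows that contain consecutive spaces
```

The WHERE filter keeps the update from rewriting rows that are already clean.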

~~ CK

Friday, April 10, 2009

A Power of Two in a Recursive Function

Here's a table that shows two raised to a sequence of integer powers:

power    2 raised to power
------   -----------------
19           524,288
18           262,144
17           131,072
16            65,536
15            32,768
14            16,384
13             8,192
12             4,096
11             2,048
10             1,024
9                512
8                256
7                128
6                 64
5                 32
4                 16
3                  8
2                  4
1                  2
0                  1

Now, take a number, say 42. Looking at the list, we find the largest power of two that is less than or equal to 42, which is 32, and take the difference (42-32=10). Next we find the largest power of two that is less than or equal to 10, which is 8, and take the difference (10-8=2). We keep going until the difference becomes zero.

Let's say the number is 511. The sequence would be 511-256=255, 255-128=127, 127-64=63, 63-32=31, 31-16=15, 15-8=7, 7-4=3, 3-2=1, 1-1=0.

I created a recursive function that returns these powers of two together with their exponents. So in the first sample, 42 should return the list 32, 8, 2. In the second sample, 511 should return the list 256, 128, 64, 32, 16, 8, 4, 2, 1. Here's the function I created:

create function GetSeries(@intVar int)
returns @t table (seqn int, series int)
as
begin

declare @seqn int, @series int

select top 1 @seqn = seqn, @series = power(2, seqn) from
(select top 20 seqn = (select count(*)
            from sysobjects b where a.id > b.id)
from sysobjects a) seqnkey
where power(2, seqn) <= @intVar order by 1 desc

set @intVar = @intVar - @series

if @intVar > 0
insert into @t (seqn, series)
select @seqn, @series
union
select * from dbo.GetSeries(@intVar)
else
insert into @t (seqn, series)
select isnull(@seqn,0), isnull(@series, power(2, 0))
return
end

Here's the sample output:

select * from GetSeries(42)
order by series desc

Here's the resultset:

seqn     series
----     ------
5         32
3          8
1          2

This may be a simple function, but there are a number of applications that can use it. It is also a good sample of a recursive function.
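The same decomposition can also be obtained without recursion by testing each bit of the number. This is a different technique from the recursive function above, sketched here only for comparison; the inline list of exponents 0 through 19 is an assumption that mirrors the table at the top of this post.

```sql
declare @intVar int
set @intVar = 42

select seqn = n.seqn, series = power(2, n.seqn)
from (values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),
             (10),(11),(12),(13),(14),(15),(16),(17),(18),(19)) as n(seqn)
where (@intVar & power(2, n.seqn)) <> 0   -- keep exponents whose bit is set
order by series desc

-- seqn  series
-- 5     32
-- 3     8
-- 1     2
```

Because 42 is 101010 in binary, bits 5, 3 and 1 are set, which matches the output of GetSeries(42).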

~~ CK

Sunday, October 26, 2008

How to check for leap year?

Checking for a leap year is one of the most common checks that I perform on date values. Wikipedia has this pseudo-code to determine a leap year:

if ((year modulo 4 is 0) and (year modulo 100 is not 0)) or
(year modulo 400 is 0) then 
   leap
else 
   not_leap

I've been using this algorithm all my programming life, and there's really nothing wrong with it. SQL Server, however, can accurately handle date data and has a couple of functions that can be used to check for a leap year without going through this algorithm. What I do is simply check whether February 29 of that year is a valid date, using the ISDATE() function.

Running the queries below:

select isdate('20070229') as validdate
select isdate('20080229') as validdate

will return the following result sets:

validdate
-----------
0

validdate
-----------
1

So if I have an integer year variable, I just need to build the date string.

set nocount on
declare @YearVar int

set @YearVar = 2007
select
isdate(cast(@YearVar as varchar(4)) + '0229') as validdate

set @YearVar = 2008
select
isdate(cast(@YearVar as varchar(4)) + '0229') as validdate

Here are the result sets:

validdate
-----------
0

validdate
-----------
1

To ensure that the date string will be properly interpreted, I use the unseparated date string format. For more on that, read my old notes.

Here's sample code that illustrates how this check works:

set nocount on
declare @YearVar int

set @YearVar = 2007
select
@YearVar as YearVar,
 case
when isdate(cast(@YearVar as varchar(4)) + '0229') = 1 then
     'leap year'
   else 'not leap year'
 end as validdate

YearVar     validdate
----------- -------------
2007        not leap year

    
set @YearVar = 2008
select
@YearVar as YearVar,
 case
when isdate(cast(@YearVar as varchar(4)) + '0229') = 1 then
     'leap year'
   else 'not leap year'
 end as validdate
    
YearVar     validdate
----------- -------------
2008        leap year
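For comparison, here is a sketch that runs the modulo rule from the pseudo-code next to the ISDATE() check for a century year. The column aliases ModuloCheck and IsDateCheck are made up for this illustration.

```sql
set nocount on
declare @YearVar int

set @YearVar = 1900   -- divisible by 100 but not by 400
select
@YearVar as YearVar,
 case
   when (@YearVar % 4 = 0 and @YearVar % 100 <> 0)
        or (@YearVar % 400 = 0) then 'leap year'
   else 'not leap year'
 end as ModuloCheck,
 case
   when isdate(cast(@YearVar as varchar(4)) + '0229') = 1 then 'leap year'
   else 'not leap year'
 end as IsDateCheck
-- both columns return 'not leap year' for 1900
```

One caveat, as far as I know: ISDATE() validates against the datetime range, which starts at 1753, so for earlier years such as 1600 the ISDATE() check reports 0 even though the modulo rule says leap year.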

~~CK

Monday, October 20, 2008

Careful EXISTS

In one of my old notes, I wrote how JOIN, IN and EXISTS are all affected by NULL values in the columns used to join tables, or to check for existence of relationship (parent-child) between tables.

Here are some queries that use EXISTS that will not give any syntax error but could give unexpected results.

Consider these sample tables:

select * from Authors
select * from Books

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland

aid         BookID      BookTitle                      BookPrice
----------- ----------- ------------------------------ -----------
1           1           Yesterday                      5.00
1           2           In My Life                     7.50
1           3           Hey Junde                      4.45
2           4           Fool On the Hill               NULL
2           6           If I Fell                      7.80
4           7           Let It Be                      NULL
4           8           Till There Was You             0.00
2           9           Yellow Submarine               34.65
1           10          I Should Have Known Better     65.33

Now if I execute these queries:

select * 
from authors 
where  exists 
   (select Authorid from Books  
        where BookTitle = 'Yesterday')

select * 
from authors
where exists 
   (select aid from Books 
      where BookTitle = 'Let It Be')

Here are the result sets:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco
2           Paul                           Los Angeles
3           George                         San Diego
4           Ringo                          Oakland

The first query did not raise any error, in spite of the fact that the column AuthorID does not exist in the Books table.

Separately executed, the subquery will return an error:

select Authorid from Books  
  where BookTitle = 'Yesterday'

Msg 207, Level 16, State 1, Line 1
Invalid column name 'Authorid'.

The parser did not return any error even though AuthorID does not exist in Books. This is because it exists in Authors. In this case, the query uses the column from the Authors table instead. This could return unexpected results.

To illustrate this query in another way:

declare @myVar int

set @myVar = 1

select @myVar myVar from Books where BookTitle = 'Yesterday'

set @myVar = 2

select @myVar myVar  from Books where BookTitle = 'Yesterday'

Here are the result sets:

myVar
-----------
1

myVar
-----------
2

I also noticed that the queries returned all rows. This is due to the nature of how EXISTS works: it checks whether the subquery returns at least one row. As the variable example shows, the subquery always returns a row regardless of the value of @myVar. The query itself is incomplete. The filter condition that should link the two tables is missing. If I am to use EXISTS to relate the two tables, the query should be written this way:

select * 
from authors a
where  exists 
   (select 1 from Books b 
        where a.AuthorID = b.aid and   BookTitle = 'Yesterday')

Here’s the result set:

AuthorID    AuthorName                     AuthorCity
----------- ------------------------------ ---------------
1           Jhon                           San Francisco

~~CK

Tuesday, September 30, 2008

Optimizing Your WHERE Clause

The WHERE clause is one of the most commonly used optional parts of a query. Simply put, it filters rows, narrowing the result to those that satisfy the condition in the clause.

It's a common misconception that the SQL optimizer will always use an index whenever the table has a useful one. This is not always the case. In some cases the index will not be used, and a table/index scan will be performed instead, resulting in slow processing.

It's also widely accepted that the arrangement of expressions and the operators used do not matter, since the optimizer will parse the query and prepare an execution plan anyway. Although this is true most of the time, arranging the logical expressions properly can improve processing.

Here are some considerations that I keep in mind whenever I build a WHERE clause.

Avoid expressions that apply a function to a column. This prevents the optimizer from seeking on the index, and it performs a table/index scan instead.

This query will not use an index:

select *
from AdventureWorksDW..FactInternetSalesReason
WHERE substring(SalesOrderNumber,1,2) = 'SI'

Modified, this one will use an index:

select *
from AdventureWorksDW..FactInternetSalesReason
WHERE SalesOrderNumber like 'SI%'

If the use of a function cannot be avoided, use an indexed computed column instead.
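A minimal sketch of that idea follows, assuming you are allowed to alter the table. The column name SalesOrderPrefix and the index name are hypothetical, and indexing a computed column also requires certain session SET options to be in effect.

```sql
use AdventureWorksDW
go

-- Computed column that holds the result of the function (hypothetical name)
alter table dbo.FactInternetSalesReason
  add SalesOrderPrefix as substring(SalesOrderNumber, 1, 2)
go

-- Index on the computed column (hypothetical name)
create index IX_FactInternetSalesReason_SalesOrderPrefix
  on dbo.FactInternetSalesReason (SalesOrderPrefix)
go

-- The filter now compares a plain, indexed column
select *
from dbo.FactInternetSalesReason
WHERE SalesOrderPrefix = 'SI'
```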

If SQL Server is not configured to be case-sensitive, do not bother using the LOWER() or UPPER() functions.

These three queries return identical results:

select * 
from AdventureWorksDW..DimGeography 
WHERE CountryRegionCode = 'AU'

select * 
from AdventureWorksDW..DimGeography 
WHERE upper(CountryRegionCode) = 'au'

select * 
from AdventureWorksDW..DimGeography 
WHERE CountryRegionCode = 'au'

The second query, however, will not use an index.

Use the equal (=) operator to compare two strings instead of LIKE.

These two queries will return the same results:

select * from AdventureWorksDW..FactInternetSales
WHERE salesordernumber = 'SO43703'

select * from AdventureWorksDW..FactInternetSales
WHERE  salesordernumber LIKE 'SO43703'

The first query is more efficient than the second one. If the LIKE operator cannot be avoided, supply as many leading characters as possible. If the application performs too many LIKE operations, consider SQL Server's full-text search option instead.

select * from AdventureWorksDW..FactInternetSalesReason
WHERE salesordernumber like 'S%1'

The above query will perform faster than the query below:

select * from AdventureWorksDW..FactInternetSalesReason
WHERE salesordernumber like '%1'

Although they are not identical, given the choice, use the former rather than the latter.

Here are the most common operators in the WHERE clause, arranged from best performing to worst:

=
>, <, >=, <= 
LIKE 
<>, NOT

Avoid the NOT operator as much as possible. Although it is not always the case, a WHERE clause that uses the NOT operator often does not utilize an index.

select * from AdventureWorksDW..FactInternetSales
WHERE not ShipDateKey >= 10

will perform faster as:

select * from AdventureWorksDW..FactInternetSales
where ShipDateKey <10

Given the choice, use EXISTS() instead of IN(). Moreover, IN() has issues with NULL values: NOT IN returns no rows at all when the list or subquery contains a NULL.
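The NULL problem is easiest to see with NOT IN. The sketch below uses two throwaway table variables (hypothetical names) so it can be run anywhere.

```sql
declare @Parent table (id int)
declare @Child  table (parent_id int null)

insert into @Parent (id) values (1), (2), (3)
insert into @Child (parent_id) values (1), (null)

-- Returns no rows: the NULL in the subquery makes every comparison unknown
select id
from @Parent
where id not in (select parent_id from @Child)

-- Returns 2 and 3, as intended
select p.id
from @Parent p
where not exists (select 1 from @Child c where c.parent_id = p.id)
```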

Force the optimizer to utilize an index by adding an index hint to the query. Use this only as a last resort for optimization.
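An index hint is written as a table hint. In this sketch the index name IX_FactInternetSales_ShipDateKey is hypothetical; substitute an index that actually exists on the table.

```sql
select *
from AdventureWorksDW..FactInternetSales
     with (index(IX_FactInternetSales_ShipDateKey))   -- hypothetical index name
where ShipDateKey < 10
```

If the named index does not exist, the query fails instead of falling back to another plan, which is one more reason to keep hints as a last resort.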

Given the choice, use BETWEEN instead of IN when the values form a contiguous range. The BETWEEN operator performs faster than IN.

If the clause has multiple logical expressions connected by AND operators, place first the expression that is LEAST likely to be true. This way, if it's false, evaluation of the clause can end immediately. If the expressions are equally likely to be false, test the least complex expression first. This way, if it's false, the more complex ones need not be tested. Keep in mind that the optimizer is free to evaluate predicates in a different order than written, so treat this as a hint rather than a guarantee. Also, consider creating an index on a selective column or a covering index for the query.

If the clause has multiple logical expressions connected by OR operators, place first the expression that is MOST likely to be true. This way, if it's true, evaluation of the clause can end immediately. If the expressions are equally likely to be true, test the least complex expression first. This way, if it's true, the more complex ones need not be tested.

Remember that the IN operator is actually another form of OR, so place the most probable value at the start of the list.

A query will perform a table/index scan if it contains an OR operator and any of the referenced columns does not have a useful index.

The query below will perform a table/index scan. For it to utilize an index, there must be an index on all four columns in the clause. Even if two out of the four columns have an index, it will still perform a table/index scan.

select *
from AdventureWorksDW..FactInternetSales
where ShipDateKey = 3
or PromotionKey = 3
or SalesTerritoryKey = 5
or SalesOrderLineNumber = 3

If creating an index for each of these columns is not an option, rewrite the query to use UNION ALL (not UNION) instead. This way the parts of the query that have a useful index will utilize it, even if that is just one or two out of the four queries. It will still execute more efficiently.

select *
from AdventureWorksDW..FactInternetSales
where ShipDateKey = 3
union all
select *
from AdventureWorksDW..FactInternetSales
where PromotionKey = 3
union all
select *
from AdventureWorksDW..FactInternetSales
where SalesTerritoryKey = 5
union all
select *
from AdventureWorksDW..FactInternetSales
where SalesOrderLineNumber = 3

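The above query returns the same rows as the version with multiple ORs and will run faster, with one difference: a row that satisfies more than one of the conditions is returned once per matching branch. If such duplicates matter, use UNION or exclude the earlier conditions in the later branches. If ShipDateKey and PromotionKey have useful indexes, the query will use their respective indexes for those parts, improving the speed of the entire query.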
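The difference in duplicates between OR, UNION ALL and UNION can be checked on a tiny example. The table variable and its columns below are made up for illustration.

```sql
declare @t table (id int, a int, b int)
insert into @t (id, a, b) values (1, 3, 3), (2, 3, 0), (3, 0, 0)

-- OR: each qualifying row once (ids 1 and 2)
select id from @t where a = 3 or b = 3

-- UNION ALL: id 1 matches both branches, so it appears twice (1, 2, 1)
select id from @t where a = 3
union all
select id from @t where b = 3

-- UNION: duplicates removed again (ids 1 and 2)
select id from @t where a = 3
union
select id from @t where b = 3
```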

Given the option, use EXISTS or a LEFT JOIN instead of IN to test or compare the relationship between parent and child tables.

Beware of redundant clauses.

This query has a redundant WHERE clause:

select DueDateKey, ShipDateKey,
PromotionKey, CustomerKey
from AdventureWorksDW..FactInternetSales
WHERE
DueDateKey = 8 and ShipDateKey = 8 and
PromotionKey = 1 or ShipDateKey = 8 and PromotionKey = 1

Every row that satisfies DueDateKey = 8 and ShipDateKey = 8 and PromotionKey = 1 also satisfies ShipDateKey = 8 and PromotionKey = 1, so the first condition is redundant. The optimizer will still return all the rows requested.

Here's the result set:

DueDateKey  ShipDateKey PromotionKey CustomerKey
----------- ----------- ------------ -----------
13          8           1            11003
13          8           1            14501
13          8           1            21768
13          8           1            25863
13          8           1            28389

It may look like the query returned the requested rows efficiently. In reality, it may retrieve these rows twice and then perform a DISTINCT to remove the redundant rows.
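Since AND binds tighter than OR, the clause reads as (A and B) or B, which is logically just B. A sketch of the simplified query, which should return the same rows:

```sql
select DueDateKey, ShipDateKey,
PromotionKey, CustomerKey
from AdventureWorksDW..FactInternetSales
WHERE
ShipDateKey = 8 and PromotionKey = 1   -- the DueDateKey branch added nothing
```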

Convert frequently run batches to stored procedures, especially if the queries in the batch use user-defined variables. The optimizer might not take advantage of a useful index to run the queries in the batch. This is because the optimizer does not know the value of these variables when it chooses a way to access the data. If the batch is not frequently executed, consider including an INDEX hint in the query.
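As an illustration, a batch that filters on a local variable could be wrapped like this. The procedure name usp_GetSalesByShipDate is hypothetical, and the column list is borrowed from the earlier example.

```sql
use AdventureWorksDW
go

create procedure dbo.usp_GetSalesByShipDate   -- hypothetical name
    @ShipDateKey int
as
begin
    set nocount on

    -- The parameter value is available when the plan is compiled
    select DueDateKey, ShipDateKey, PromotionKey, CustomerKey
    from dbo.FactInternetSales
    where ShipDateKey = @ShipDateKey
end
go

exec dbo.usp_GetSalesByShipDate @ShipDateKey = 8
```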

As much as possible, include a WHERE clause in the query. This improves the way SQL Server retrieves the data.

~~CK