Wednesday, August 4, 2010

A Sample Table-to-XML conversion.

A question was posted on the forum that I am a member of to convert a table into an XML data. The content of table look like this:
STUDENT_ID   grade
---------- ---------
 abc123     0.2
 abc123     0.4
 abc123     0.7
 def123     0.1
 def123     0.4
 def123     0.5

The request is to convert the above data into:
<Students StudentID="abc123">
  <StudentGrade Grades="0.2" />
  <StudentGrade Grades="0.4" />
  <StudentGrade Grades="0.7" />
</Students>
<Students StudentID="def123">
  <StudentGrade Grades="0.1" />
  <StudentGrade Grades="0.4" />
  <StudentGrade Grades="0.5" />
</Students>

Here's the code I posted as a reply:
declare @TestData Table
(
STUDENT_ID VARCHAR(6),
grade Decimal(2,1)
)

insert into @TestData (student_id, grade)
select 'abc123', 0.2 UNION
select 'abc123', 0.7 UNION
select 'abc123', 0.4 UNION
select 'def123', 0.1 UNION
select 'def123', 0.5 UNION
select 'def123', 0.4

select * from @TestData

;with tag1
as
(
  select distinct student_id
  from @testdata
)
select
  1 as tag,
  0 as parent,
  student_id as [Students!1!StudentID],
  null as [StudentGrade!2!Grades]
from tag1
union all
select
  2 as tag,
  1 as parent,
  t1.student_id as [Students!1!StudentID],
  grade as [StudentGrade!2!Grades]
from tag1 t1
  left join @testdata t2 on t1.student_id = t2.student_id
order by [Students!1!StudentID], [StudentGrade!2!Grades]
for xml explicit

As always, comments are welcome...


~~CK

Returning Sequence of Numbers

For quite sometime, I use COLORDER column of syscolumns table to return a sequential number. I will usually pick a table with large number of columns enough to return my requirement. Until I come across Common Table Expression (CTE). One of the major difference between a CTE and a derived table or sub-query, is that a CTE can be self-referencing or recursive.

Consider returning a series of numbers 1-31.

Here's the code:
WITH Number_Sequence
AS
(
   SELECT 1 AS sequence
   UNION ALL
   SELECT sequence+1
   FROM Number_Sequence W
      WHERE sequence < 31
)
SELECT *
FROM Number_Sequence

Now that it's a result set, it may be used to join to another table. Since this is a CTE and not a physical table, it's not indexed and could degrade processing. Table size should always be considered before joining this CTE to another table.

Here's another sample code that simulate bit positions in a byte:
with GroupNumbers
as
(
   select 1 as byte, 1 as from_bit, 8 as to_bit
   union all
   select byte + 1, from_bit + 8, to_bit + 7
   from GroupNumbers
   where to_bit < 64
)  select * from GroupNumbers

As always, comments are welcome...


~~CK

Sunday, March 7, 2010

Returning a random set of data

A number of times, I've been requested to provide a random data for various purposes, including testing and sampling. I have found a number of ways of doing it which mostly include creating a multiple layer of sub-query and even creating temporary tables. My usual mindset is that I can not really use the TOP n keyword because, most of the time, it will return the same records every time. I also tried to use the RAND() function that can generate random number. Closer look with this function, however, says repetitive calls of RAND() with the same seed value return the same results (BOL). When I received another project that requires a randomly generated set of data, I started searching the net on how to do this in a simpler way. This time, I leverage an existing function that will always generate a random value. The NEWID() function. This function always returns a unique value of uniqueidentifier data type.

Here's a sample table:
BookId         BookTitle
------      ---------------
1          Yesterday
2          In My Life
3          Hey Junde
4          Fool On the Hill
5          If I Fell
6          Let It Be
7          Till There Was You
8          Yellow Submarine
9          I Should Have Known Better
Running a simple TOP X% query always return the same set of records.
select  TOP 30 PERCENT BookId, BookTitle from books
Using the RAND() function always returns the same set of records:
select  TOP 30 PERCENT BookId, BookTitle from books
order by rand(rand(1))

BookId         BookTitle
------      ---------------
1          Yesterday
2          In My Life
3          Hey Junde
Using the NEWID() function always returns a random set of records
select  TOP 30 PERCENT BookId, BookTitle from books
order by newid()

BookId         BookTitle
------      ---------------
4          Fool On the Hill
5          If I Fell
1          Yesterday
Here's another run
select  TOP 30 PERCENT BookId, BookTitle from books
order by newid()

BookId         BookTitle
------      ---------------
5          If I Fell
8          Yellow Submarine
2          In My Life
So far, I have not seen any issue using this function in a big table.


~CK