Search This Blog

Tuesday 20 May 2008

Storing Hierarchical Data - HeirarchyID

In SQL Server 2000 we were limited by the 32 level recursion limit for TSQL, and storing and querying hierarchical data in the form  of trees was really difficult and inefficient, We used Cursors or temporary tables to write these queries. But simplicity, maintenance or performances were sacrificed. Ofcourse we could bundle a bit of code in the data layer of your application to share the load however this didn't the solve the problem in reporting scenarios where the data processing was to be done on the database server

As we moved on to SQL Server 2005 it improved because of the introduction of CTE's, CTE's where beautiful solutions to solving querying hierarchical data,  I use the word beautiful because it looked nicer on the outset when you used it without knowing the limitations it worked well on development environments For e.g using the AdventureWorks database we could use a CTE to query employee manager data as shown below

WITH UpperHierarchy(EmployeeId, LastName, Manager, HierarchyOrder)
 AS
 (
    SELECT emp.EmployeeId, emp.LoginId, emp.LoginId, 1 AS HierarchyOrder
    FROM HumanResources.Employee AS emp
      WHERE emp.ManagerId isNull
    UNION ALL
    SELECT emp.EmployeeId, emp.LoginId, Parent.LastName, HierarchyOrder + 1
    FROM HumanResources.Employee AS emp
           INNER JOIN UpperHierarchy AS Parent
                 ON emp.ManagerId = parent.EmployeeId
 )
 SELECT *
 From UpperHierarchy

Although this decreased the complexity of writing queries, performance of these queries was still challenged on large databases, the optimisation of the CTE execution plans did improve things but as in any database situation the optimizer is handicapped without indexing capabilities. Indexes reduce the load on the query increasing performance and scalability. In addition in SQL 2005 the underlying storage structure was still something users had to design to suit there requirements. This just got better with the introduction of a new managed SQL CLR data type called HeirarchyID in SQL Server 2008., It is available ready to use in the databarse server now... If you now look back and remember the introduction of the CLR into SQL Server in Yukon you will appreciate this feature even more and how it has been panned out..

This data type does not store the identifier of the parent element but a set of information to locate the element in the hierarchy. This type represents a node in the tree structure. If you look at values contained in a column of HeirarchyID type, you realize that they are binary values.It is extremely compact and supports arbitrary inserts and deletions. As per MS a node in an organizational hierarchy of 100,000 people with an average fanout of 6 levels takes about 38 bits. This is rounded up to 40 bits, or 5 bytes, for storage. Because it stores the elements hierarchy in its entirety it is also indexable now..

As always there are a few limitations

  • It can hold upto 892 bytes but to be honest that should allow you to span out into a really big tree structure under a single node.
  • A query with the FOR XML clause will fail on a table with HeirarchyID unless the column is first converted to a character data type. Use the ToString() method to convert the HeirarchyID value to the logical representation as a nvarchar(4000) data type

We can represent the HeirarchyID type in a string format. This format shows clearly information carried by this type. Indeed, string representation is formatted as is:/<index level 1>/<index level 2>/…/<index level N>. This representation corresponds to atree structure . Note that first child of a node does not have a value of 1 all the time but can have the /1.2/ value. So to play around a bit we first need a table with a column of HeirarchyID type and an index on the hierarchy ID column

CREATE TABLE Organization
(
    EmployeeID heirarchyid NOT NULL,
    EmployeeName nvarchar(50) NOT NULL
)

ALTER TABLE dbo.Organization
ADD HierarchyLevel As EmployeeID.GetLevel()

CREATE INDEX IX_Employee
ON Organization(HierarchyLevel,EmployeeID);

To populate the data table we use the CTE we mentioned earlier to  just modify the SELECT statement like the following

Insert Into dbo.Organization(EmployeeId, EmployeeName)
Select Node, LastName 
 From UpperHierarchy

Now Hierarchical data can be queried using the functions

HeirarchyID data type can be manipulated through a set of functions.· GetAncestor, GetDescendant, GetLevel, GetRoot, ToString, IsDescendant, Parse, Read, Reparent, Write, for details on the functions please refer to the CTP documentation of Katmai but most of them are self explanatory #

For e.g to see how we use these functions, let us insert a node as the last child of an existing node.To do this we first retrieve the sibling node.

--finding sibling node
SELECT @sibling = Max(EmployeeID)
FROM dbo.Organization
WHERE EmployeeId.GetAncestor(1)= @Parent;
--inserting node
INSERT dbo.Organization(EmployeeId, EmployeeName)
VALUES(@Parent.GetDescendant(@sibling,NULL), @Name)

We do not always want to (or can) recover the sibling node to perform insertion. There is perhaps an implied policy to determine node position. For example, let’s say we have an [order] column which position nodes among its siblings. We can compute node path as string: In this example, since the node @Parent is the root, that will give/<order>/. Thanks to the Parse() function, we can use this value to create the new node.

Declare @Parent As HeirarchyID = HeirarchyID::GetRoot() 
Declare @NewPath As varchar(10)= @Parent.ToString()+ CAST([Order] AS varchar(3))+ '/'
INSERT dbo.Organization(EmployeeId, EmployeeName) VALUES(HierarchyId::Parse(@NewPath),'aChild')

You will have note the new syntax of SQL Server 2008 to declare and assign variables in only one line. :: denotes a static method on a SQL CLR type in TSQL. So what am i getting to finally the CTE is not so much of a beauty anymore, just run this query to see what it returns

Select *
From dbo.Organization
Where @BossNode.IsDescendant(EmployeeId)

If  you run this query along side the CTE query and compare the execution plan of these queries you would see why this new feature is being talked about :)

1 comment:

Unknown said...

I am fresher in sql and this site really has some helpful pack of articles! Hierarchy Structure