
Thursday, 22 May 2008

RESEED

Just learnt a new T-SQL term. Although I knew this functionality was achievable in other ways, I hadn't come across this one before. RESEED resets the seed value of an IDENTITY column. However, SQL Server 2000 handles RESEED differently from 2005/2008 on virgin tables (tables that have never contained rows).

To reset the seed value of the identity column on table_name to new_value, DBCC CHECKIDENT (table_name, RESEED, new_value) does the trick.

In SQL Server 2000, RESEED always increments the seed value, but in 2005/2008 a virgin table doesn't increment: it starts with new_value itself.
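A minimal sketch of RESEED in action (the table and values are illustrative, not from a real system):

```sql
-- Illustrative table with an identity column
CREATE TABLE dbo.Widgets
(
    WidgetID int IDENTITY(1,1) NOT NULL,
    WidgetName varchar(50) NOT NULL
);

INSERT INTO dbo.Widgets (WidgetName) VALUES ('A');
INSERT INTO dbo.Widgets (WidgetName) VALUES ('B');   -- WidgetID is now 2

DELETE FROM dbo.Widgets;

-- Reset the seed; on a table that has held rows, the next
-- identity value generated is new_value + 1, hence RESEED to 0
DBCC CHECKIDENT ('dbo.Widgets', RESEED, 0);
INSERT INTO dbo.Widgets (WidgetName) VALUES ('C');   -- WidgetID starts again at 1
```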

Tuesday, 20 May 2008

Cruise Control .Net farms

In the organisation I currently work for, four teams continuously develop code and check it into TFS, and as a consequence we have adopted continuous integration using CruiseControl.NET. Using CruiseControl.NET alongside NAnt for automated build processes is nothing new, but what we have achieved, and are planning to achieve, is hopefully a fairly unique automated build platform. We have about nine different projects, each with its own automated build, and since NAnt cannot scale across multiple CPUs, the resources on the server were underutilised and wait times had increased considerably. As a solution, we have now installed two instances of CruiseControl on the main build server: a "current" build, which developers use to build and test binaries, and a "nightly" build, which runs on a schedule and generates the end-product installers. When someone needs a setup outside the schedule, the nightly build can still be forced manually using the CCTray application.

To achieve this, there were a few bottlenecks we had to work around.

  • CruiseControl.NET does not fully install a second instance; for starters, it does not install the Windows service for it. You have to install the service manually using installutil.exe from the .NET Framework.
  • Since the two instances will build simultaneously, you need to isolate the identities under which the builds run and the physical locations of the build outputs.
  • The CruiseControl.NET web dashboard for each instance has to be configured manually as a separate virtual directory.
  • The CruiseControl.NET manager for each instance has to be configured to use a different port; the default install uses 21234, so something like 21244 works as a series. If you have a firewall, make sure your network admin allows requests to these ports (CCTray uses them to communicate with the server).

That said, we now have a single server with two instances of CruiseControl, with the build outputs copied onto a network share. We have another build server where we are going to replicate this setup and then split the CruiseControl projects across the two physical servers to share the build load. It all looks simple, but thanks to the guy who authored a templated build, I have been able to enhance it to achieve the following. We have been constantly updating the build for the last year and are close to end-to-end automation for the product: when we check code into TFS, the nightly build can dish out a CD ISO image of the product to the network share. We also have switches, exposed as properties in the build scripts, which allow the current builds to create setups in case the nightly build is overloaded. Due to constant development work and support issues we have to squeeze work like this into our non-functional sprint work, but then Scrum treats engineering work as non-functional, and is rightly justified in doing so.

Storing Hierarchical Data - HierarchyID

In SQL Server 2000 we were limited by the 32-level recursion limit in T-SQL, and storing and querying hierarchical data in the form of trees was difficult and inefficient: we used cursors or temporary tables to write these queries, sacrificing simplicity, maintainability or performance. Of course, we could bundle a bit of code into the data layer of the application to share the load, but this didn't solve the problem in reporting scenarios where the processing had to be done on the database server.

As we moved on to SQL Server 2005, things improved with the introduction of CTEs, which were beautiful solutions for querying hierarchical data. I use the word beautiful because, at the outset, they looked nicer when you used them without knowing their limitations, and they worked well in development environments. For example, using the AdventureWorks database, we can use a CTE to query the employee-manager hierarchy as shown below.

WITH UpperHierarchy(EmployeeId, LastName, Manager, HierarchyOrder)
AS
(
    SELECT emp.EmployeeId, emp.LoginId, emp.LoginId, 1 AS HierarchyOrder
    FROM HumanResources.Employee AS emp
    WHERE emp.ManagerId IS NULL
    UNION ALL
    SELECT emp.EmployeeId, emp.LoginId, Parent.LastName, HierarchyOrder + 1
    FROM HumanResources.Employee AS emp
    INNER JOIN UpperHierarchy AS Parent
        ON emp.ManagerId = Parent.EmployeeId
)
SELECT *
FROM UpperHierarchy

Although this decreased the complexity of writing such queries, their performance was still challenged on large databases. The optimisation of CTE execution plans did improve things, but as in any database situation, the optimizer is handicapped without indexing capabilities; indexes reduce the load on a query, increasing performance and scalability. In addition, in SQL 2005 the underlying storage structure was still something users had to design to suit their requirements. This just got better with the introduction of a new managed SQL CLR data type called HierarchyID in SQL Server 2008; it is available ready to use in the database server now. If you look back at the introduction of the CLR into SQL Server in Yukon, you will appreciate this feature even more, and how it has panned out.

This data type does not store the identifier of the parent element but a set of information to locate the element in the hierarchy; it represents a node in the tree structure. If you look at the values contained in a column of HierarchyID type, you realise they are binary values. The type is extremely compact and supports arbitrary inserts and deletions. As per Microsoft, a node in an organizational hierarchy of 100,000 people with an average fanout of 6 levels takes about 38 bits, rounded up to 40 bits, or 5 bytes, for storage. Because it stores the element's position in the hierarchy in its entirety, it is also indexable.

As always there are a few limitations

  • It can hold up to 892 bytes, but to be honest that should allow you to span out into a really big tree structure under a single node.
  • A query with the FOR XML clause will fail on a table with a HierarchyID column unless the column is first converted to a character data type. Use the ToString() method to convert the HierarchyID value to its logical representation as an nvarchar(4000).
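A minimal sketch of that workaround (assuming the dbo.Organization table with an EmployeeID hierarchyid column that is created further below):

```sql
-- FOR XML fails on a raw hierarchyid column, so convert it first
SELECT
    EmployeeID.ToString() AS EmployeePath,  -- e.g. '/1/2/'
    EmployeeName
FROM dbo.Organization
FOR XML AUTO;
```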

We can represent the HierarchyID type in string format, which clearly shows the information carried by the type. The string representation is formatted as /<index level 1>/<index level 2>/…/<index level N>/, and corresponds to a tree structure. Note that the first child of a node does not always have the value 1; it can, for example, have the value /1.2/. So, to play around a bit, we first need a table with a column of HierarchyID type and an index on that column.

CREATE TABLE Organization
(
    EmployeeID hierarchyid NOT NULL,
    EmployeeName nvarchar(50) NOT NULL
)

ALTER TABLE dbo.Organization
ADD HierarchyLevel As EmployeeID.GetLevel()

CREATE INDEX IX_Employee
ON Organization(HierarchyLevel,EmployeeID);

To populate the table we reuse the CTE mentioned earlier, modified so that it also builds a hierarchyid path for each row (aliased as Node below), and just change the final SELECT statement as follows

Insert Into dbo.Organization(EmployeeId, EmployeeName)
Select Node, LastName 
 From UpperHierarchy

Hierarchical data can now be queried using the type's methods.

The HierarchyID data type can be manipulated through a set of methods: GetAncestor, GetDescendant, GetLevel, GetRoot, ToString, IsDescendant, Parse, Read, Reparent and Write. For details on these, please refer to the CTP documentation of Katmai, but most of them are self-explanatory.

For example, to see how we use these methods, let us insert a node as the last child of an existing node. To do this we first retrieve the last sibling node.

-- @Parent, @sibling and @Name are assumed to be declared earlier
-- (hierarchyid, hierarchyid and nvarchar respectively)
--finding the last sibling node
SELECT @sibling = Max(EmployeeID)
FROM dbo.Organization
WHERE EmployeeId.GetAncestor(1) = @Parent;
--inserting the node
INSERT dbo.Organization(EmployeeId, EmployeeName)
VALUES(@Parent.GetDescendant(@sibling, NULL), @Name)

We cannot always (or do not always want to) retrieve the sibling node to perform an insertion. There may be an implied policy that determines node position; for example, say we have an [Order] column which positions a node among its siblings. We can then compute the node path as a string. In this example, since the node @Parent is the root, that gives /<order>/. Thanks to the Parse() method, we can use this value to create the new node.

DECLARE @Parent AS hierarchyid = hierarchyid::GetRoot()
DECLARE @NewPath AS varchar(10) = @Parent.ToString() + CAST([Order] AS varchar(3)) + '/'
INSERT dbo.Organization(EmployeeId, EmployeeName) VALUES(hierarchyid::Parse(@NewPath), 'aChild')

You will have noted the new SQL Server 2008 syntax to declare and assign a variable in a single line; :: denotes a static method on a SQL CLR type in T-SQL. So what am I getting at? Finally, the CTE is not so much of a beauty anymore; just run this query to see what it returns

Select *
From dbo.Organization
Where @BossNode.IsDescendant(EmployeeId)

If you run this query alongside the CTE query and compare the execution plans, you will see why this new feature is being talked about :)

Monday, 19 May 2008

SQL Profiler for 2005 Express Edition

I was trying to see how I can profile SQL Server 2005 Express Edition, as its Management Studio does not include a profiler. I found one at the following location; it is very useful and quite a lightweight tool: http://code.google.com/p/sqlexpressprofiler/downloads/list

Friday, 16 May 2008

NEW INSERT STATEMENT

In SQL 2008 some new row value constructors have been added. We are familiar with the INSERT statement, which has been around for ages in the ANSI form

INSERT INTO Table1 (column1 ,column2 ,... columnN) VALUES (value1,value2,....valueN)

Another way of inserting a single row of data is as follows

INSERT INTO Table1 SELECT value1,value2,....valueN

Similarly for multiple rows of data

INSERT INTO Table1 
SELECT value1,value2,....valueN
UNION SELECT value1,value2,....valueN
UNION SELECT value1,value2,....valueN

Now the new ROW VALUE CONSTRUCTOR allows the following to add multiple rows of data

INSERT INTO Table1(column1 ,column2 ,... columnN) VALUES
(value1 , value2 , ... valueN),
(value1 , value2 , ... valueN),
(value1 , value2 , ... valueN),
(value1 , value2 , ... valueN),
(value1 , value2 , ... valueN)

We normally use INSERT statements in stored procedures, passing the values as parameters to the procedure, so the row value constructor above is not that useful there; using a table-valued parameter to insert multiple rows of data seems the better option in SQL Server 2008.

Databases and Software Development

Database development is a form of software development, and yet all too often the database is treated as a secondary entity when development teams discuss architecture and test plans. Many developers do not seem to believe, understand, or let alone feel the need to understand, that standard software development best practices apply to database development. Virtually every application imaginable requires some form of data store, and many in the development community go beyond simply persisting data, creating applications that are data driven. Given this dependency, data is the central factor that dictates the value any application can bring to its users: without the data, there is no need for the application.

The very reason we use the word legacy for an application as young as three years is, more often than not, the database. As applications grow with new features, the thought developers put into refactoring front-end code or developing new code is not put into database development. So what essentially happens is a situation where your front end is two years ahead of the database, and as time progresses your application's capabilities and features get pulled back by the limitations of database quality and standards. We can all deny this, but in reality it results either in a limitation or in the extra cost of creating workarounds.

The central argument on many a database forum is what to do with that ever-present required logic. Sadly, try as we might, developers have still not figured out how to develop an application without implementing business requirements, and so the debate rages on. Does "business logic" belong in the database? In the application tier? What about the user interface? And what impact do newer application architectures have on this age-old question? In recent times I have been able to look at technologies like Astoria, LINQ, the Entity Framework and Katmai, and I was amazed at how little database code needs to be written by a developer doing UI or business layer development. At the same time, being a SQL Server fan myself, I was worried that my database skills would slowly vaporise into thin air. Hmm, that's not as bad as it sounds. Technologies such as Astoria, LINQ and the Entity Framework are abstractions over the database that allow developing a logical data layer which maps to a physical database, so developers who work primarily at the UI or business layer will slowly stop writing any SQL at all, instead churning out the code that interests them against the logical data layer; what offsets this is the need to learn a new syntax in the form of LINQ. On the other hand, database development will shift towards being a specialist job: the design, development and management activities of the database developer and administrator will emerge as specialist skills, as opposed to generalist skills, in the near future. The future of the database specialist seems bright, but what remains to be seen is how organisations look at this shift in software development ideology. To be fair, this model of specialists and generalists is not new.

Thursday, 15 May 2008

How to consume a .Net ArrayList in classic ASP

I first followed the procedure to register a .Net assembly using regasm on the web server as below (http://weblogs.asp.net/dneimke/archive/2004/01/31/65330.aspx)

  1. Make sure your assembly has a strong name to install it in the GAC
  2. Run regasm /tlb “<path of your assembly>”
  3. Add the assembly to the GAC

This allows your assembly to be accessible in your ASP code to create server side objects

In my example I have a .Net assembly which has a class as below

 

using System;
using System.Collections;

namespace API
{
      /// <summary>
      /// Exposes a list of version strings for consumption from ASP.
      /// </summary>
      public class Versions
      {
            ArrayList versions = new ArrayList();

            public Versions()
            {
                  versions.Add("1");
                  versions.Add("2");
                  versions.Add("3");
            }

            public ArrayList List
            {
                  get { return versions; }
            }
      }
}

I initially tried to have a method in my .NET class return an ArrayList, and then to create an object of type "System.Collections.ArrayList" in ASP and iterate over it, but the ArrayList type definition was not supported when the ASP code executed. So I created the class above, which exposes a public property of type System.Collections.ArrayList.

 

So now in ASP I did the following, and it printed out the array list on my ASP page

<%@ codePage="65001" %>
<html>
      <head>
            <title>Test Page</title>
            <%
                  set versions = Server.CreateObject("Versions")
                  For Each v in versions.List
                        Response.Write(v & "</br>")
                  Next
            %>
      </head>
      <body></body>
</html>

This can be further enhanced in your ASP code to achieve the functionality you need.

 

 

TSQL - GROUP BY and ALL


We have all used GROUP BY in our SQL queries to group data and get result sets with aggregates from SQL Server. But before I explain, let's create a temporary table of order details using the Northwind database in SQL 2000.

I will just join the Orders and [Order Details] tables in Northwind to get the data I need into a temporary table, as shown below

SELECT 
  O.OrderID, 
  ProductID, 
  UnitPrice, 
  Quantity, 
  (UnitPrice*Quantity) AS Amount,
  CustomerID
INTO 
  #tempOrders
FROM 
  Orders O 
INNER JOIN
  [order details] D 
ON
  O.[orderid] = D.[orderid]
ORDER BY
  ProductID

So now I have a table called #tempOrders with the order details I need.

Now suppose I'd like to see the customers that were sold Product #1, along with the total amount they spent. I will use a query with a GROUP BY clause and a WHERE condition to filter records

SELECT 
   CustomerID,
   SUM(Amount) AS TotalAmount
 FROM
   #tempOrders
 WHERE 
   ProductID = 1
 GROUP BY 
   CustomerID


Now, let's say that I'd like to see all customers that have been sold any products, but we still just want to see the "TotalAmount" for ProductID #1. For customers that have never ordered ProductID #1, it should output a "TotalAmount" value of 0. One way to do this is with a CASE expression as shown below

SELECT 
  CustomerID,
  SUM(CASE WHEN ProductID = 1 THEN Amount ELSE 0 END) AS TotalAmount
FROM
  #tempOrders
GROUP BY 
  CustomerID

This returns customers who haven't purchased Product #1 with a total of 0. In situations like these, the SUM(CASE...) expression can be replaced with GROUP BY ALL.

SELECT 
    CustomerID,
    ISNULL(SUM(Amount), 0) AS TotalAmount
  FROM
    #tempOrders
  WHERE 
    ProductID = 1
  GROUP BY ALL CustomerID

Rows excluded from the aggregation by the WHERE clause return NULL for the aggregate; the ISNULL function makes sure all customers who haven't ordered Product #1 get a total of 0 instead of NULL. The ALL option basically says "ignore the WHERE clause when doing the grouping, but still apply it for any aggregate functions". So, in this case, the WHERE clause is not considered when generating the population of CustomerID values, but it is applied when calculating the SUM. This is very much like our first solution, where we removed the WHERE clause completely and used a SUM(CASE...) expression to conditionally calculate the aggregate.

GROUP BY ALL is kind of obscure and neat to know, but not really useful in most situations, since there are usually easier or better ways to get this result. It won't work if we want all customers to be displayed, since a customer must have at least one order to show up in the result. Another limitation is that we cannot use GROUP BY ALL if we want to return a grand total for all orders.
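For a grand total you have to fall back to the SUM(CASE...) form; a minimal sketch using WITH ROLLUP (available since SQL 2000) to append the grand-total row:

```sql
SELECT
  CustomerID,
  SUM(CASE WHEN ProductID = 1 THEN Amount ELSE 0 END) AS TotalAmount
FROM
  #tempOrders
GROUP BY
  CustomerID WITH ROLLUP   -- the final row has CustomerID = NULL and the grand total
```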

Tuesday, 13 May 2008

PIVOT and UNPIVOT

There may be situations when writing SQL where we would like to have totals or aggregates of columns alongside another column, e.g. the number of points scored by a salesperson together with their sales, as well as separate sales by month or year.

So let's first create a table which represents the sales data, as below

CREATE TABLE #tempTest
(
  empName VARCHAR(25),
  noOfSales INT,
  [monthName] VARCHAR(3),
  Points INT
)

Just add a few rows of data to this table

INSERT INTO #tempTest VALUES('James',2,'Jan', 2);
INSERT INTO #tempTest VALUES('Jason',1,'Jan', 1);
INSERT INTO #tempTest VALUES('Mark',3,'Jan', 3);
INSERT INTO #tempTest VALUES('Mark',1,'Jan', 1);
INSERT INTO #tempTest VALUES('James',1,'Feb', 1);
INSERT INTO #tempTest VALUES('James',2,'Feb', 2);
INSERT INTO #tempTest VALUES('Jason',3,'Feb', 3);
INSERT INTO #tempTest VALUES('Jason',2,'Feb', 2);
INSERT INTO #tempTest VALUES('Mark',1,'Mar', 1);
INSERT INTO #tempTest VALUES('James',2,'Mar', 2);
INSERT INTO #tempTest VALUES('Jason',2,'Mar', 2);

Now, to show how many sales each salesperson makes per month, we can use PIVOT to achieve this easily


SELECT
  empName,
  [Jan] AS Total_Sales_Jan,
  [Feb] AS Total_Sales_Feb,
  [Mar] AS Total_Sales_Mar
FROM
  (
    SELECT
      empName, noOfSales, [monthName]
    FROM
      #tempTest
  ) AS Source PIVOT (SUM(noOfSales) for [monthName] IN ([Jan],[Feb],[Mar]) )AS PivotTable

This query produces the following results

Results Pivot 1

Please note that PIVOT can have only one non-pivoted column. The example below shows PIVOT being used incorrectly

SELECT
  empName,
  SUM([Jan]) AS Jan,
  SUM([Feb]) AS Feb,
  SUM([Mar]) AS Mar,
  SUM(Points) AS Points
FROM
  (
    SELECT
      empName,
      noOfSales,
      [monthName],
      Points
    FROM
      #tempTest
  ) AS Source PIVOT (SUM(noOfSales) for [monthName] IN ([Jan],[Feb],[Mar]) )AS PivotTable
GROUP BY
  empName

Results Pivot 2

Note that the results are incorrect for Points: they don't sum up correctly for each salesperson as expected, because you cannot use more than one non-pivoted column.

UNPIVOT is the exact opposite of PIVOT; although its usage is likely to be minimal, there are occasions where it may be useful.
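A minimal sketch of UNPIVOT, turning a pivoted result back into (empName, monthName, noOfSales) rows. The #pivoted table is illustrative: assume it holds the output of the first PIVOT query above.

```sql
-- #pivoted is assumed to have columns: empName, [Jan], [Feb], [Mar]
SELECT
  empName,
  [monthName],
  noOfSales
FROM
  #pivoted UNPIVOT (noOfSales FOR [monthName] IN ([Jan],[Feb],[Mar])) AS Unpivoted
```

Note that UNPIVOT discards NULLs, so it is not always a perfect inverse of PIVOT.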

Friday, 9 May 2008

Deleting Duplicates using CTE's

While writing stored procedures and dealing with temporary data within the scope of a procedure, we sometimes find that our temporary data has duplicate records and we want to get rid of the duplicates. What we would normally do in SQL 2000 is create a second temporary table, select the distinct records into it from the table containing duplicates, and then select the data from the cleaned-up table.

This is evidently not very performant. From SQL 2005 onwards we have been able to do this efficiently with CTEs, as below.

Note that deleting from the CTE actually changes the underlying table, so be careful how you set up your CTE; without the right logic you could have some unintended deletes.

CREATE TABLE #itemtable(
  Item_Code varchar(10),
  Item_Name varchar(100)
)

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('1','Item 1');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('2','Item 2');
  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('3','Item 3');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('3','Item 3');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('4','Item 4');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('5','Item 5');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('6','Item 6');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('6','Item 6');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('7','Item 7');

GO

  SELECT * FROM #itemtable;

  WITH Duplicates AS
  (
    SELECT
      *,
      row_number() OVER(PARTITION BY Item_Code ORDER BY Item_Code) as RowNum
    FROM
      #itemtable
  )

  DELETE FROM Duplicates WHERE RowNum > 1;

SELECT * FROM #itemtable;

Wednesday, 7 May 2008

Table Valued Parameters

In recent times I haven't written any complex T-SQL code which needed to scale, but this is a scenario you may come across while working on a reporting application where most of the processing is done on the database server in T-SQL. In SQL 2005, user-defined types were available, but in Katmai this has been furthered to allow table-valued parameters to be passed to stored procedures; we can think of them as instances of user-defined table types. At the end of the day it boils down to being able to write more efficient and performant code, but the feature does seem to have a limitation at the moment.

The table data type allows variables that are, in effect, tables. It also allows you to return tabular data from functions without doing a SELECT at the end of the function. That's great, but don't temporary tables do the same? Yes and no: while table variables and temporary tables are similar, there are several key differences that determine which option you use.
Consider the sample below in SQL 2005. To add orders to a database table using T-SQL, I need a stored procedure which takes each column of a row as a parameter, and moreover each row needs a round trip to the database server.

CREATE PROC sp_add_order(@order_id int, @customer_id int, @order_total money)
AS
INSERT INTO [Order]
(
    order_id,
    customer_id,
    order_total
)
VALUES
(
    @order_id,
    @customer_id,
    @order_total
)
GO

Now, with SQL 2008, we are able to insert several rows of data with just one round trip to the server by using a user-defined table type, here called OrderType. That is efficient and quite performant.

CREATE TYPE OrderType AS TABLE
(
[order_id] int NOT NULL,
customer_id int NOT NULL,
[OrderTotal] [money] NULL
)
GO

CREATE PROC sp_new_order (@order OrderType READONLY)
AS
INSERT INTO [OrderTest]
(
order_id,
customer_id,
[OrderTotal]
)
SELECT * FROM @order

RETURN
GO

The one limitation to note here is the READONLY option on the parameter. At the moment the CTP only allows us to pass read-only table-valued parameters; we are unable to use OUT parameters of a table value type. It would be great to be able to do this, as we then wouldn't have to make nested calls to stored procedures in reporting scenarios. It remains to be seen how this pans out in the final release; although the feature doesn't look big, the sheer benefit of table-valued parameters comes as a blessing for many T-SQL developers.
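For completeness, a minimal sketch of calling the procedure above from T-SQL (the order values are illustrative):

```sql
-- Declare a variable of the table type, fill it, and pass it in one round trip
DECLARE @orders OrderType;

INSERT INTO @orders (order_id, customer_id, OrderTotal)
VALUES (1, 100, 25.00),
       (2, 100, 15.50),
       (3, 101, 99.99);

EXEC sp_new_order @order = @orders;
```

Note that the multi-row INSERT here uses the new SQL 2008 row value constructor mentioned in an earlier post.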

Tuesday, 6 May 2008

Spatial Data - Delivering Location based Intelligence


I have been blogging on and off recently due to my part-time course at the uni, which has kept me busy. By the way, I did look through most of the new features in Katmai and actually did a small lunchtime session on it at work; one of the topics I spoke about in detail was spatial data support in Katmai.

SQL Server 2008 provides the geography (for geodetic spatial data) and the geometry (for planar spatial data) data types. Both are .NET CLR types, just like the HierarchyID data type, which I will post about in another blog soon. The presence of the CLR in SQL Server provides an avenue for some really new and exciting features. When the CLR was introduced on the server in Yukon, I did think it was a bit too much to have in a database server, but now that some of the new features in Katmai have panned out as a result, it all seems to make sense.

The geography data type provides a storage structure for spatial data which is ellipsoidal in nature, and is represented by latitude and longitude coordinates. Typical uses of this kind of data include defining roads, buildings, or geographical features as vector data that can be overlaid onto a raster-based map, basically any round earth data.

The geometry data type provides a storage structure for spatial data which is planar or in other words Euclidean in nature. This kind of data is commonly used for maps and interior floor plans where the curvature of the Earth does not need to be taken into account.

To create data of type geometry or geography, we create instances from input in the Well-Known Text (WKT) or Well-Known Binary (WKB) format. WKB is a format specified by the Open Geospatial Consortium (OGC) that permits geometry data to be exchanged between the client and SQL Server. The geometry data type provides properties and methods that are aligned with the OGC Simple Features Specification for SQL, enabling operations on geometric data that produce industry-standard behaviour.

SQL Server 2008 increases the maximum size for CLR types in the database from the 8000 bytes limit that was imposed in SQL Server 2005 to two gigabytes (2 GB), which makes it possible to store extremely complex spatial data elements, such as polygons, which are defined by a large number of points. By storing spatial data in relational tables, SQL Server 2008 makes it possible to combine spatial data with any other kind of business data; this removes the need to maintain a separate, dedicated spatial data store and enables high performance queries that do not need to combine data from multiple external sources. Performance of queries against spatial data is further enhanced by the inclusion of spatial index support in SQL Server 2008.

Both spatial data types in SQL Server 2008 provide a comprehensive set of instance and static methods that you can use to perform queries and operations on spatial data. If we explore these two data types we find that Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection are all instances of spatial data types recognized by both geometry and geography data types.

To construct instances of a Point, MultiPoint, etc. from WKT format, Katmai provides the following functions:

Point instance from WKT input: use STPointFromText
MultiPoint instance from WKT input: use STMPointFromText
LineString instance from WKT input: use STLineFromText
MultiLineString instance from WKT input: use STMLineFromText
Polygon instance from WKT input: use STPolyFromText
MultiPolygon instance from WKT input: use STMPolyFromText
GeometryCollection instance from WKT input: use STGeomCollFromText


To construct instances from WKB format Katmai provides the following functions

Point instance from WKB input: use STPointFromWKB
MultiPoint instance from WKB input: use STMPointFromWKB
LineString instance from WKB input: use STLineFromWKB
MultiLineString instance from WKB input: use STMLineFromWKB
Polygon instance from WKB input: use STPolyFromWKB
MultiPolygon instance from WKB input: use STMPolyFromWKB
GeometryCollection instance from WKB input: use STGeomCollFromWKB

There are also methods which allow developers to compare or determine the relationship between geometrical instances such as the ones listed below

STEquals determines if two instances comprise the same point set
STDisjoint determines if two instances are disjoint
STIntersects determines if two instances intersect
STTouches determines if two instances touch
STOverlaps determines if two instances overlap
STCrosses determines if two instances cross
STWithin determines if one instance is within another
STContains determines if one instance contains another
STRelate determines if two instances are spatially related
STDistance determines the shortest distance between points in two geometries

An equivalent set of functions and methods is available for the geography type as well.

To create and add spatial data

USE Adventureworks
GO

IF OBJECT_ID ( 'dbo.SpatialTable', 'U' ) IS NOT NULL DROP TABLE dbo.SpatialTable;
GO

CREATE TABLE SpatialTable
( id int IDENTITY (1,1),
GeomCol1 geometry,
GeomCol2 AS GeomCol1.STAsText() );
GO


INSERT INTO SpatialTable (GeomCol1)
VALUES (geometry::STGeomFromText('LINESTRING (100 100, 20 180, 180 180)', 0));

INSERT INTO SpatialTable (GeomCol1)
VALUES (geometry::STGeomFromText('POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))', 0));
GO


To create a Geography instance from WKT format

DECLARE @g geography;
SET @g = geography::STGeomFromText('LINESTRING(47.656 -122.360, 47.656 -122.343)', 4326);
SELECT @g.ToString();

To create a Geometry instance from WKB format

DECLARE @g geometry;
SET @g = geometry::STGeomFromWKB(0x010200000003000000000000000000594000000000000059400000000000003440000000000080664000000000008066400000000000806640, 0);
SELECT @g.STAsText();


-- Point from WKB
SET @g = geometry::STPointFromWKB(0x010100000000000000000059400000000000005940, 0);
SELECT @g.STAsText();

To create a Geometry instance from WKT format

-- Creating an instance with WKT
DECLARE @g geometry;
SET @g = geometry::STGeomFromText('LINESTRING (100 100, 20 180, 180 180)', 0);
SELECT @g.ToString();

-- Parse is also useful to convert data from WKT
SET @g = geometry::Parse('LINESTRING (100 100, 20 180, 180 180)');
SELECT @g.ToString();

-- Point instance from WKT
SET @g = geometry::STPointFromText('POINT (100 100)', 0);
SELECT @g.ToString();

Determining if the instances intersect

USE AdventureWorks
GO

DECLARE @geom1 geometry;
DECLARE @geom2 geometry;
DECLARE @result geometry;

SELECT @geom1 = GeomCol1 FROM SpatialTable WHERE id = 1;
SELECT @geom2 = GeomCol1 FROM SpatialTable WHERE id = 2;
SELECT @result = @geom1.STIntersection(@geom2);
SELECT @result.STAsText();



As the integration of geospatial information into applications becomes more prevalent, we will require database systems that can store and manipulate spatial data. With the introduction of the spatial types, SQL Server 2008 provides a data storage solution for spatial data, and enables organizations of any scale to integrate geospatial features into their applications and services.