Search This Blog

Friday, 9 May 2008

Deleting Duplicates using CTE's

While writing stored procedures and dealing with temporary data within the scope of a stored procedure we come across a situation where our temporary data has duplicate records and we want to get rid of duplicated. What we do normally in SQL 2000 would be to create a new temporary table and select distinct records from the temporary table which has duplicates. Then we select the data from the new cleaned up table.

This is not very performant as is evident, SQL 2005 onwards we have been able to do this with CTE's efficiently as below

Deleting from the CTE actually changes the underlying table. Be careful how you setup your CTE. You could have some unintended deletes without the right logic.

CREATE TABLE #itemtable(
  Item_Code varchar(10),
  Item_Name varchar(100)
)

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('1','Item 1');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('2','Item 2');
  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('3','Item 3');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('3','Item 3');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('4','Item 4');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('5','Item 5');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('6','Item 6');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('6','Item 6');

  INSERT INTO #itemtable(Item_Code, Item_Name)
  VALUES ('7','Item 7');

GO

  SELECT * FROM #itemtable;

  WITH Duplicates AS
  (
    SELECT
      *,
      row_number() OVER(PARTITION BY Item_Code ORDER BY Item_Code) as RowNum
    FROM
      #itemtable
  )

  DELETE FROM Duplicates WHERE RowNum > 1; --

SELECT * FROM #itemtable;

1 comment:

Anonymous said...

excelent. thats great. it does work and very efficient