Delete Duplicate

  • 8/3/2019 Delete Duplicate

    SQL SERVER - 2005 - 2008 - Delete Duplicate Rows

    June 23, 2009 by pinaldave

    I had previously written two popular snippets, one on deleting duplicate rows and one on counting duplicate rows. Today, we will examine another very quick code snippet that deletes duplicate rows using a CTE and the ROW_NUMBER() function of SQL Server 2005 and SQL Server 2008.

    This method improves on the earlier one: it not only uses a CTE and ROW_NUMBER(), but also demonstrates the power of running a DELETE statement against a CTE. We will discuss this in more detail later in this article. For now, let us first create a sample table from which we will delete records.

    /* Create Table with 7 entries - 3 are duplicate entries */
    CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)
    INSERT INTO DuplicateRcordTable
    SELECT 1, 1
    UNION ALL
    SELECT 1, 1 --duplicate
    UNION ALL
    SELECT 1, 1 --duplicate
    UNION ALL
    SELECT 1, 2
    UNION ALL
    SELECT 1, 2 --duplicate
    UNION ALL
    SELECT 1, 3
    UNION ALL
    SELECT 1, 4
    GO

    The table above has 7 records in total, 3 of which are duplicates. Once the duplicates are removed, only 4 records will be left.

    /* It should give you 7 rows */
    SELECT *
    FROM DuplicateRcordTable
    GO
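    To confirm the arithmetic above (7 rows, 3 duplicates, 4 left) without a SQL Server instance, here is a minimal sketch that rebuilds the same table in an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in so the example runs anywhere, and build_sample_table is a hypothetical helper name. Each group of identical (Col1, Col2) rows contributes COUNT(*) - 1 surplus copies.

```python
import sqlite3


def build_sample_table() -> sqlite3.Connection:
    """Hypothetical helper: recreate the article's DuplicateRcordTable in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)")
    rows = [(1, 1), (1, 1), (1, 1), (1, 2), (1, 2), (1, 3), (1, 4)]
    conn.executemany("INSERT INTO DuplicateRcordTable VALUES (?, ?)", rows)
    return conn


conn = build_sample_table()
total = conn.execute("SELECT COUNT(*) FROM DuplicateRcordTable").fetchone()[0]

# Each group of identical (Col1, Col2) rows contributes COUNT(*) - 1 surplus copies.
surplus = conn.execute(
    """
    SELECT COALESCE(SUM(cnt - 1), 0)
    FROM (SELECT COUNT(*) AS cnt
          FROM DuplicateRcordTable
          GROUP BY Col1, Col2
          HAVING COUNT(*) > 1)
    """
).fetchone()[0]

print(total, surplus, total - surplus)  # 7 3 4
```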

    The most interesting part is yet to come. We will use a CTE that regenerates the same table with an additional column: a row number. In our case, the combination of Col1 and Col2 defines a duplicate, so ROW_NUMBER() is partitioned by both columns and restarts at 1 for every distinct (Col1, Col2) pair; in other tables, a different set of columns may define what counts as a duplicate. Another point to note is that once the CTE is defined, a DELETE statement can be run against it. Our condition keeps the first row of each group and deletes every row whose row number is greater than 1. When the DELETE command is executed against the CTE, it actually deletes from the base table the CTE is built on.
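    To see the extra row-number column before anything is deleted, this sketch runs the same ROW_NUMBER() expression as a plain SELECT. It assumes Python's standard sqlite3 module backed by SQLite 3.25 or newer (the first release with window functions); SQLite stands in for SQL Server only so the example is self-contained.

```python
import sqlite3

# In-memory copy of the article's sample table (SQLite stand-in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)")
conn.executemany(
    "INSERT INTO DuplicateRcordTable VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 1), (1, 2), (1, 2), (1, 3), (1, 4)],
)

# Same numbering as the article's CTE: the counter restarts at 1 for each (Col1, Col2) pair.
numbered = conn.execute(
    """
    SELECT Col1, Col2,
           ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY Col1) AS DuplicateCount
    FROM DuplicateRcordTable
    ORDER BY Col1, Col2, DuplicateCount
    """
).fetchall()

for row in numbered:
    print(row)
```

    Rows 2 and 3 of the (1, 1) group and row 2 of the (1, 2) group are the three duplicates that a "DuplicateCount > 1" filter selects.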

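    /* Delete Duplicate records */
    WITH CTE (Col1, Col2, DuplicateCount)
    AS
    (
        -- Number the rows within each (Col1, Col2) group; ROW_NUMBER() requires an ORDER BY,
        -- and any order works here because the rows in a group are identical
        SELECT Col1, Col2,
        ROW_NUMBER() OVER(PARTITION BY Col1, Col2 ORDER BY Col1) AS DuplicateCount
        FROM DuplicateRcordTable
    )
    -- Keep row 1 of each group; deleting through the CTE removes rows from DuplicateRcordTable
    DELETE
    FROM CTE
    WHERE DuplicateCount > 1
    GO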
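    SQL Server lets the DELETE target the CTE directly. SQLite does not, so a portable equivalent, sketched here under the same assumptions (Python's sqlite3, SQLite 3.25 or newer), uses the CTE only to find the implicit rowid of every row numbered above 1 and deletes those rows from the base table. This is a swapped-in variant of the technique, not the article's exact statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)")
conn.executemany(
    "INSERT INTO DuplicateRcordTable VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 1), (1, 2), (1, 2), (1, 3), (1, 4)],
)

# SQLite cannot DELETE through a CTE, so the CTE only identifies surplus rows
# by their implicit rowid, and the DELETE targets the base table directly.
conn.execute(
    """
    WITH CTE AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY rowid) AS DuplicateCount
        FROM DuplicateRcordTable
    )
    DELETE FROM DuplicateRcordTable
    WHERE rowid IN (SELECT rid FROM CTE WHERE DuplicateCount > 1)
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT Col1, Col2 FROM DuplicateRcordTable ORDER BY Col1, Col2"
).fetchall()
print(remaining)  # [(1, 1), (1, 2), (1, 3), (1, 4)]
```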

    ---------------------------------------------

    The script below defines my CTE. I am using the ranking function DENSE_RANK to group the records by the Product, SaleDate, and SalePrice fields and to number the rows within each group in random order. Because NEWID() returns a different value for every row, no two rows in a group tie on the sort key, so DENSE_RANK behaves just like ROW_NUMBER here. This means that if I have two records with exactly the same Product, SaleDate, and SalePrice values, the first record will be ranked 1, the second 2, and so on.

    ;WITH SalesCTE(Product, SaleDate, SalePrice, Ranking)
    AS
    (
        SELECT Product, SaleDate, SalePrice,
        Ranking = DENSE_RANK() OVER(PARTITION BY Product, SaleDate, SalePrice ORDER BY NEWID() ASC)
        FROM SalesHistory
    )
    DELETE FROM SalesCTE
    WHERE Ranking > 1

    Because a CTE acts as a virtual table, I am able to run data modification statements against it, and the underlying table is affected (SQL Server allows this as long as the modification affects only one base table, here SalesHistory). In this case, I am removing every record in SalesCTE that is ranked higher than 1, which removes all of my duplicate records while keeping one copy of each.
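    The point about NEWID() giving every row a distinct sort key can be checked with a small sketch. It assumes Python's standard sqlite3 module with SQLite 3.25 or newer; SQLite has no NEWID(), so its random() function stands in as the per-row random sort key, and the SalesHistory rows and the dedupe helper are hypothetical. SQLite also cannot delete through a CTE, so the CTE only picks out the rowid of each surplus row. Ordering by a column that ties within the group (SalePrice) leaves every duplicate ranked 1, so nothing is deleted; ordering by random() removes the extra copies.

```python
import sqlite3

# Hypothetical sample: one row present three times, and a product sold at two prices.
SAMPLE = [
    ("Widget", "2009-06-01", 9.99),
    ("Widget", "2009-06-01", 9.99),
    ("Widget", "2009-06-01", 9.99),
    ("Gadget", "2009-06-02", 19.99),
    ("Gadget", "2009-06-02", 24.99),
]


def dedupe(order_key: str) -> list:
    """Hypothetical helper: delete rows ranked > 1 and return what is left.

    order_key is interpolated into the SQL, so only pass fixed, trusted strings.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesHistory (Product TEXT, SaleDate TEXT, SalePrice REAL)")
    conn.executemany("INSERT INTO SalesHistory VALUES (?, ?, ?)", SAMPLE)
    conn.execute(
        f"""
        WITH SalesCTE AS (
            SELECT rowid AS rid,
                   DENSE_RANK() OVER (
                       PARTITION BY Product, SaleDate, SalePrice
                       ORDER BY {order_key}) AS Ranking
            FROM SalesHistory
        )
        DELETE FROM SalesHistory
        WHERE rowid IN (SELECT rid FROM SalesCTE WHERE Ranking > 1)
        """
    )
    rows = conn.execute(
        "SELECT Product, SaleDate, SalePrice FROM SalesHistory ORDER BY Product, SalePrice"
    ).fetchall()
    conn.close()
    return rows


tied = dedupe("SalePrice")     # every row in a group ties, so all rank 1: nothing deleted
shuffled = dedupe("random()")  # distinct keys, like NEWID(): ranks 1, 2, 3, ...
print(len(tied), len(shuffled))  # 5 3
```

    Rows that differ in any partitioned column (the two Gadget prices) are not duplicates and both survive.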