Hierarchical Data Deletion

Applicability:

SQL Server 2000: Tested

SQL Server 2005: Tested

SQL Server 2008: Tested

SQL Server 2008R2: Tested

SQL Server 2012: Not Tested

Credits:

Author: Unknown (multiple)

Date: 2008/2009

Description

Often, there is a need to remove data from some or all tables in a relational database, or even to remove tables completely. The challenge here is navigating the complexities of the PK/FK relationships between tables, especially in an unfamiliar system.

A greater challenge can be creating a robust scheduled process (for data archiving or refresh) as the data structures are subject to change.

There are 3 techniques that I have used in the past to achieve this:

    1. Trial and Error - initially time consuming and an ongoing pain in the rear end.

    2. Disable all constraints, delete and enable all constraints

    3. Create dynamic deletion statements with the ordering determined using system metadata

Technique #1 is the quickest an easiest, especially with the ability of Management Studio GUI to continue on error and to resubmit the same list repeatedly until all tables are gone. Not the most elegant or supportable in the long term though and very much a brute force method to be used only when the rioting crowd around your desk demands immediate action

I will concentrate here on the last 2 techniques

NOTE: For ease of typing, I will refer to Deletion, but the technique can be equally applied to Archiving as this just involves an additional step of transferring the data somewhere else before deletion.

Delete Hierarchically - Disable Constraint Method

This is still a somewhat brute force method, as all constraints on the tables are disabled, data is deleted followed by a reinstatement of the constraints. This can be dangerous if only partial data is to be deleted - the deletion code must be coded carefully to ensure that constraints remain viable when enabled again. This effectively means the constraints are duplicated in code and is not very maintainable in the long term.

The code provided generates 3 sets of results for each invocation. They are lists of TSQL commands to:

  1. Disable constraints on all tables

    1. Delete data from all tables (this is where customisation would be necessary for archiving purposes)

    2. Enable constraints on all tables

Delete Hierarcichally - Ordered Delete Method

This is a more challenging method to design and code, but ultimately produces a more robust and maintainable process. Full or partial data deletions are carried out in an order determined by the PK/FK relationships. This ensures that each delete operation only succeeds if the constraint rules are not broken.

The code provided here also generates 3 sets of results, but only one at a time dependent on parameters passed in. The results are also not TSQL commands but are lists of objects to be consumed by custom deletion code. The main point here is that the complex work of producing the hierarchically ordered list is done for you:

    1. List of all PK/FK relationships, with parent and child table names

    2. List of tables in data deletion order (bottom up - from child to parent)

    3. List of tables in data insertion order (top down - from parent to child)

NOTE: Only tables with a PK or FK defined will appear in the lists.

There is a function-based implementation of the ordered table list which also produces a hirearchy level indicator which can also be used to drive custom INSERT/DELETE code.

Code

Delete Hierarchichally - Disable Constraint Method

DROP PROCEDURE [dbo].[usp_DeleteHirearchically_ConstraintMethod]

GO

CREATE PROCEDURE [dbo].[usp_DeleteHirearchically_ConstraintMethod]

AS

-- generate list of commands to disable all constraints

SELECT

'alter table '

+ QUOTENAME(table_schema)

+ '.'

+ QUOTENAME(table_name)

+ ' nocheck constraint all;'

FROM information_schema.tables

WHERE OBJECTPROPERTY(OBJECT_ID(QUOTENAME(table_schema)

+ '.'

+ QUOTENAME(table_name)), 'ismsshipped') = 0

AND table_type = 'base table'

--AND table_schema = 'dbo'

-- generate list of commands to delete data from all tables

SELECT

'delete from '

+ QUOTENAME(table_schema)

+ '.'

+ QUOTENAME(table_name)

+ ' ;'

FROM information_schema.tables

WHERE OBJECTPROPERTY(OBJECT_ID(QUOTENAME(table_schema)

+ '.'

+ QUOTENAME(table_name)), 'ismsshipped') = 0

AND table_type = 'base table'

--AND table_schema = 'dbo'

-- generate list of commands to enable all constraints

SELECT

'alter table '

+ QUOTENAME(table_schema)

+ '.'

+ QUOTENAME(table_name)

+ ' check constraint all;'

FROM information_schema.tables

WHERE OBJECTPROPERTY(OBJECT_ID(QUOTENAME(table_schema)

+ '.'

+ QUOTENAME(table_name)), 'ismsshipped') = 0

AND table_type = 'base table'

--AND table_schema = 'dbo'

Delete Hierarchichally - Ordered Delete Method

/*

-- To Display hierarchical list of the tables with foreign keys and the tables those foreign keys are referencing

EXEC usp_DeleteHirearchically_OrderedDeletionMethod

-- To Display the list of tables to delete in bottom-up order

EXEC usp_DeleteHirearchically_OrderedDeletionMethod 'DELETE'

-- To Display the list of tables to insert in top-down order

EXEC usp_DeleteHirearchically_OrderedDeletionMethod 'INSERT'

*/

DROP PROCEDURE [usp_DeleteHirearchically_OrderedDeletionMethod]

GO

CREATE PROCEDURE [dbo].[usp_DeleteHirearchically_OrderedDeletionMethod](@Operation VARCHAR(6)=NULL)

AS

SET NOCOUNT ON

-- From SysReferences Table: Get the Foreign Key and Reference Key table list

-- From Sys.Objects Table : Get the schema name of the tables

-- Self Join SysReference Table again to remove the conflicted tables (2 tables involved in circular reference)

DECLARE

@rc INT

SELECT SCHEMA_NAME(SOF.SCHEMA_ID) AS FSC_NAME,

T1.FKEYID AS FKEYID,

SCHEMA_NAME(SOR.SCHEMA_ID) AS RSC_NAME,

T1.RKEYID AS RKEYID

INTO #sysref

FROM SYSREFERENCES T1

LEFT OUTER JOIN sys.objects SOF

ON SOF.OBJECT_ID = T1.FKEYID

LEFT OUTER JOIN sys.objects SOR

ON SOR.OBJECT_ID = T1.RKEYID

LEFT OUTER JOIN SYSREFERENCES T2

ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)

WHERE T2.FKEYID IS NULL

--- Below Query is same as above , but it will give the list of conflicted tables (2 tables in a circular reference)

-- You need to manually delete the data from these tables, if listed.

SELECT SCHEMA_NAME(SOF.SCHEMA_ID)+'.'+OBJECT_NAME(T1.FKEYID) AS '--',

SCHEMA_NAME(SOR.SCHEMA_ID)+'.'+OBJECT_NAME(T1.RKEYID) AS '--'

FROM SYSREFERENCES T1

LEFT OUTER JOIN sys.objects SOF

ON SOF.OBJECT_ID = T1.FKEYID

LEFT OUTER JOIN sys.objects SOR

ON SOR.OBJECT_ID = T1.RKEYID

JOIN SYSREFERENCES T2

ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)

AND T1.FKEYID <> T1.RKEYID

SET @rc =@@ROWCOUNT

IF @rc > 0

BEGIN

PRINT '--Above ' + CAST(@rc AS VARCHAR(90))+' tables are referenced each other and can''t delete with delete statement'

END

--Recursive CTE to return the Hierarchial Data

;WITH CTE(FKEYID,LEVEL)

AS

(

SELECT

FKEYID,

0

FROM SYSREFERENCES

UNION ALL

SELECT

T1.RKEYID ,

LEVEL+1

FROM #sysref T1

JOIN CTE T2 ON T1.FKEYID = T2.FKEYID

)

--Dump The Hierarchial Data Into A Temp Table With Identity Column.

--Remove The Duplicate Table Names From The List (comes when one table refers to 2 tables or more) Through Using The Max Function

SELECT

IDENTITY(INT,1,1) AS iid,

FSC_NAME AS FSC_NAME,

T1.FKEYID AS FKEYID,

RSC_NAME AS RSC_NAME,

T1.RKEYID AS RKEYID,

MAX(LEVEL) AS LEVEL

INTO #T

FROM #sysref T1

INNER JOIN CTE T2

ON T1.RKEYID = T2.FKEYID

WHERE LEVEL >0

GROUP BY

T1.FSC_NAME,

T1.FKEYID,

T1.RSC_NAME,

T1.RKEYID

ORDER BY MAX(LEVEL) ASC

-- Display hierarchical list of the tables with foreign keys and the tables those foreign keys are referencing

IF @Operation IS NULL

BEGIN

SELECT

RSC_NAME AS PK_SCHEMA_NAME,

OBJECT_NAME(RKEYID) AS PRIMARY_KEY_TABLE_NAME,

FSC_NAME AS FK_SCHEMA_NAME,

OBJECT_NAME(FKEYID)AS FOREIGN_KEY_TABLE_NAME

FROM #T

ORDER BY 2,4

END

-- Below Block will give 2 Data Sets To Delete The Tables In Hierarchial Manner

-- #1 Is tables involved in child relationships (may be multiple levels)

-- #2 Is top level parent tables

ELSE IF @Operation ='DELETE'

BEGIN

SELECT 'DELETE FROM ' + FSC_NAME +'.'+ FKEYID + ';' AS 'SEQUENTIAL DELETE TABLE LIST (Child Tables)'

FROM

(

SELECT

TOP 10000 MIN( IID) AS IID ,

FSC_NAME,OBJECT_NAAME(FKEYID)AS FKEYID

FROM #t

GROUP BY FSC_NAME,FKEYID

ORDER BY 1

) X;

SELECT 'DELETE FROM ' + SCHEMA_NAME(SCHEMA_ID)+'.'+NAME AS 'SEQUENTIAL DELETE TABLE LIST (Parent Tables)'

FROM SYS.OBJECTS

WHERE TYPE = 'U'

AND NAME <> 'sysdiagrams'

AND OBJECT_ID NOT IN

(

SELECT FKEYID FROM #t

)

AND OBJECT_ID NOT IN

(

SELECT (T1.FKEYID) AS FKEYID

FROM SYSREFERENCES T1

JOIN SYSREFERENCES T2

ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)

WHERE T1.FKEYID <> T1.RKEYID

);

END

-- Below Block will give 2 Data Sets To Insert The Tables In Hierarchial Manner

-- #1 Is top level parent tables

-- #2 Is tables involved in child relationships (may be multiple levels)

ELSE IF @Operation ='INSERT'

BEGIN

--TO INSERT

SELECT

NAME AS 'SEQUENTIAL INSERT TABLE LIST (Parent Tables)'

FROM SYSOBJECTS

WHERE XTYPE = 'U'

AND NAME <> 'sysdiagrams'

AND ID NOT IN

(

SELECT FKEYID FROM #t

)

AND ID NOT IN

(

SELECT (T1.FKEYID

) AS FKEYID

FROM SYSREFERENCES T1

JOIN SYSREFERENCES T2

ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)

WHERE T1.FKEYID <> T1.RKEYID)

SELECT FKEYID AS 'SEQUENTIAL INSERT TABLE LIST (Child Tables)'

FROM

(

SELECT

TOP 10000 MIN(IID) AS IID,

OBJECT_NAME(FKEYID) AS FKEYID

FROM #t

GROUP BY FKEYID

ORDER BY MIN(IID) DESC

) X

END