Hierarchical Data Deletion
Applicability:
SQL Server 2000: Tested
SQL Server 2005: Tested
SQL Server 2008: Tested
SQL Server 2008R2: Tested
SQL Server 2012: Not Tested
Credits:
Author: Unknown (multiple)
Date: 2008/2009
Description
Often, there is a need to remove data from some or all tables in a relational database, or even to remove tables completely. The challenge here is navigating the complexities of the PK/FK relationships between tables, especially in an unfamiliar system.
A greater challenge can be creating a robust scheduled process (for data archiving or refresh) as the data structures are subject to change.
There are 3 techniques that I have used in the past to achieve this:
Trial and Error - initially time consuming and an ongoing pain in the rear end.
Disable all constraints, delete and enable all constraints
Create dynamic deletion statements with the ordering determined using system metadata
Technique #1 is the quickest an easiest, especially with the ability of Management Studio GUI to continue on error and to resubmit the same list repeatedly until all tables are gone. Not the most elegant or supportable in the long term though and very much a brute force method to be used only when the rioting crowd around your desk demands immediate action
I will concentrate here on the last 2 techniques
NOTE: For ease of typing, I will refer to Deletion, but the technique can be equally applied to Archiving as this just involves an additional step of transferring the data somewhere else before deletion.
Delete Hierarchically - Disable Constraint Method
This is still a somewhat brute force method, as all constraints on the tables are disabled, data is deleted followed by a reinstatement of the constraints. This can be dangerous if only partial data is to be deleted - the deletion code must be coded carefully to ensure that constraints remain viable when enabled again. This effectively means the constraints are duplicated in code and is not very maintainable in the long term.
The code provided generates 3 sets of results for each invocation. They are lists of TSQL commands to:
Disable constraints on all tables
Delete data from all tables (this is where customisation would be necessary for archiving purposes)
Enable constraints on all tables
Delete Hierarcichally - Ordered Delete Method
This is a more challenging method to design and code, but ultimately produces a more robust and maintainable process. Full or partial data deletions are carried out in an order determined by the PK/FK relationships. This ensures that each delete operation only succeeds if the constraint rules are not broken.
The code provided here also generates 3 sets of results, but only one at a time dependent on parameters passed in. The results are also not TSQL commands but are lists of objects to be consumed by custom deletion code. The main point here is that the complex work of producing the hierarchically ordered list is done for you:
List of all PK/FK relationships, with parent and child table names
List of tables in data deletion order (bottom up - from child to parent)
List of tables in data insertion order (top down - from parent to child)
NOTE: Only tables with a PK or FK defined will appear in the lists.
There is a function-based implementation of the ordered table list which also produces a hirearchy level indicator which can also be used to drive custom INSERT/DELETE code.
Code
Delete Hierarchichally - Disable Constraint Method
DROP PROCEDURE [dbo].[usp_DeleteHirearchically_ConstraintMethod]
GO
CREATE PROCEDURE [dbo].[usp_DeleteHirearchically_ConstraintMethod]
AS
-- generate list of commands to disable all constraints
SELECT
'alter table '
+ QUOTENAME(table_schema)
+ '.'
+ QUOTENAME(table_name)
+ ' nocheck constraint all;'
FROM information_schema.tables
WHERE OBJECTPROPERTY(OBJECT_ID(QUOTENAME(table_schema)
+ '.'
+ QUOTENAME(table_name)), 'ismsshipped') = 0
AND table_type = 'base table'
--AND table_schema = 'dbo'
-- generate list of commands to delete data from all tables
SELECT
'delete from '
+ QUOTENAME(table_schema)
+ '.'
+ QUOTENAME(table_name)
+ ' ;'
FROM information_schema.tables
WHERE OBJECTPROPERTY(OBJECT_ID(QUOTENAME(table_schema)
+ '.'
+ QUOTENAME(table_name)), 'ismsshipped') = 0
AND table_type = 'base table'
--AND table_schema = 'dbo'
-- generate list of commands to enable all constraints
SELECT
'alter table '
+ QUOTENAME(table_schema)
+ '.'
+ QUOTENAME(table_name)
+ ' check constraint all;'
FROM information_schema.tables
WHERE OBJECTPROPERTY(OBJECT_ID(QUOTENAME(table_schema)
+ '.'
+ QUOTENAME(table_name)), 'ismsshipped') = 0
AND table_type = 'base table'
--AND table_schema = 'dbo'
Delete Hierarchichally - Ordered Delete Method
/*
-- To Display hierarchical list of the tables with foreign keys and the tables those foreign keys are referencing
EXEC usp_DeleteHirearchically_OrderedDeletionMethod
-- To Display the list of tables to delete in bottom-up order
EXEC usp_DeleteHirearchically_OrderedDeletionMethod 'DELETE'
-- To Display the list of tables to insert in top-down order
EXEC usp_DeleteHirearchically_OrderedDeletionMethod 'INSERT'
*/
DROP PROCEDURE [usp_DeleteHirearchically_OrderedDeletionMethod]
GO
CREATE PROCEDURE [dbo].[usp_DeleteHirearchically_OrderedDeletionMethod](@Operation VARCHAR(6)=NULL)
AS
SET NOCOUNT ON
-- From SysReferences Table: Get the Foreign Key and Reference Key table list
-- From Sys.Objects Table : Get the schema name of the tables
-- Self Join SysReference Table again to remove the conflicted tables (2 tables involved in circular reference)
DECLARE
@rc INT
SELECT SCHEMA_NAME(SOF.SCHEMA_ID) AS FSC_NAME,
T1.FKEYID AS FKEYID,
SCHEMA_NAME(SOR.SCHEMA_ID) AS RSC_NAME,
T1.RKEYID AS RKEYID
INTO #sysref
FROM SYSREFERENCES T1
LEFT OUTER JOIN sys.objects SOF
ON SOF.OBJECT_ID = T1.FKEYID
LEFT OUTER JOIN sys.objects SOR
ON SOR.OBJECT_ID = T1.RKEYID
LEFT OUTER JOIN SYSREFERENCES T2
ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)
WHERE T2.FKEYID IS NULL
--- Below Query is same as above , but it will give the list of conflicted tables (2 tables in a circular reference)
-- You need to manually delete the data from these tables, if listed.
SELECT SCHEMA_NAME(SOF.SCHEMA_ID)+'.'+OBJECT_NAME(T1.FKEYID) AS '--',
SCHEMA_NAME(SOR.SCHEMA_ID)+'.'+OBJECT_NAME(T1.RKEYID) AS '--'
FROM SYSREFERENCES T1
LEFT OUTER JOIN sys.objects SOF
ON SOF.OBJECT_ID = T1.FKEYID
LEFT OUTER JOIN sys.objects SOR
ON SOR.OBJECT_ID = T1.RKEYID
JOIN SYSREFERENCES T2
ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)
AND T1.FKEYID <> T1.RKEYID
SET @rc =@@ROWCOUNT
IF @rc > 0
BEGIN
PRINT '--Above ' + CAST(@rc AS VARCHAR(90))+' tables are referenced each other and can''t delete with delete statement'
END
--Recursive CTE to return the Hierarchial Data
;WITH CTE(FKEYID,LEVEL)
AS
(
SELECT
FKEYID,
0
FROM SYSREFERENCES
UNION ALL
SELECT
T1.RKEYID ,
LEVEL+1
FROM #sysref T1
JOIN CTE T2 ON T1.FKEYID = T2.FKEYID
)
--Dump The Hierarchial Data Into A Temp Table With Identity Column.
--Remove The Duplicate Table Names From The List (comes when one table refers to 2 tables or more) Through Using The Max Function
SELECT
IDENTITY(INT,1,1) AS iid,
FSC_NAME AS FSC_NAME,
T1.FKEYID AS FKEYID,
RSC_NAME AS RSC_NAME,
T1.RKEYID AS RKEYID,
MAX(LEVEL) AS LEVEL
INTO #T
FROM #sysref T1
INNER JOIN CTE T2
ON T1.RKEYID = T2.FKEYID
WHERE LEVEL >0
GROUP BY
T1.FSC_NAME,
T1.FKEYID,
T1.RSC_NAME,
T1.RKEYID
ORDER BY MAX(LEVEL) ASC
-- Display hierarchical list of the tables with foreign keys and the tables those foreign keys are referencing
IF @Operation IS NULL
BEGIN
SELECT
RSC_NAME AS PK_SCHEMA_NAME,
OBJECT_NAME(RKEYID) AS PRIMARY_KEY_TABLE_NAME,
FSC_NAME AS FK_SCHEMA_NAME,
OBJECT_NAME(FKEYID)AS FOREIGN_KEY_TABLE_NAME
FROM #T
ORDER BY 2,4
END
-- Below Block will give 2 Data Sets To Delete The Tables In Hierarchial Manner
-- #1 Is tables involved in child relationships (may be multiple levels)
-- #2 Is top level parent tables
ELSE IF @Operation ='DELETE'
BEGIN
SELECT 'DELETE FROM ' + FSC_NAME +'.'+ FKEYID + ';' AS 'SEQUENTIAL DELETE TABLE LIST (Child Tables)'
FROM
(
SELECT
TOP 10000 MIN( IID) AS IID ,
FSC_NAME,OBJECT_NAAME(FKEYID)AS FKEYID
FROM #t
GROUP BY FSC_NAME,FKEYID
ORDER BY 1
) X;
SELECT 'DELETE FROM ' + SCHEMA_NAME(SCHEMA_ID)+'.'+NAME AS 'SEQUENTIAL DELETE TABLE LIST (Parent Tables)'
FROM SYS.OBJECTS
WHERE TYPE = 'U'
AND NAME <> 'sysdiagrams'
AND OBJECT_ID NOT IN
(
SELECT FKEYID FROM #t
)
AND OBJECT_ID NOT IN
(
SELECT (T1.FKEYID) AS FKEYID
FROM SYSREFERENCES T1
JOIN SYSREFERENCES T2
ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)
WHERE T1.FKEYID <> T1.RKEYID
);
END
-- Below Block will give 2 Data Sets To Insert The Tables In Hierarchial Manner
-- #1 Is top level parent tables
-- #2 Is tables involved in child relationships (may be multiple levels)
ELSE IF @Operation ='INSERT'
BEGIN
--TO INSERT
SELECT
NAME AS 'SEQUENTIAL INSERT TABLE LIST (Parent Tables)'
FROM SYSOBJECTS
WHERE XTYPE = 'U'
AND NAME <> 'sysdiagrams'
AND ID NOT IN
(
SELECT FKEYID FROM #t
)
AND ID NOT IN
(
SELECT (T1.FKEYID
) AS FKEYID
FROM SYSREFERENCES T1
JOIN SYSREFERENCES T2
ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)
WHERE T1.FKEYID <> T1.RKEYID)
SELECT FKEYID AS 'SEQUENTIAL INSERT TABLE LIST (Child Tables)'
FROM
(
SELECT
TOP 10000 MIN(IID) AS IID,
OBJECT_NAME(FKEYID) AS FKEYID
FROM #t
GROUP BY FKEYID
ORDER BY MIN(IID) DESC
) X
END