Data Archiving - Hierarchical Data Deletion

Applicability:

                 SQL Server 2000:        Tested
                 SQL Server 2005:        Tested
                 SQL Server 2008:        Tested
                 SQL Server 2008R2:    Tested
                 SQL Server 2012:        Not Tested        

Credits:

Author:    Unknown (multiple)
Date:        2008/2009

Description

Often, there is a need to remove data from some or all tables in a relational database, or even to remove tables completely.  The challenge here is navigating the complexities of the PK/FK relationships between tables, especially in an unfamiliar system.
A greater challenge can be creating a robust scheduled process (for data archiving or refresh) as the data structures are subject to change.
 
There are 3 techniques that I have used in the past to achieve this:
  1. Trial and Error - initially time consuming and an ongoing pain in the rear end.
  2. Disable all constraints, delete and enable all constraints
  3. Create dynamic deletion statements with the ordering determined using system metadata
Technique #1 is the quickest an easiest, especially with the ability of Management Studio GUI to continue on error and to resubmit the same list repeatedly until all tables are gone.  Not the most elegant or supportable in the long term though and very much a brute force method to be used only when the rioting crowd around your desk demands immediate action
 
I will concentrate here on the last 2 techniques
 
NOTE:  For ease of typing, I will refer to Deletion, but the technique can be equally applied to Archiving as this just involves an additional step of transferring the data somewhere else before deletion.
 

Delete Hierarchically - Disable Constraint Method

This is still a somewhat brute force method, as all constraints on the tables are disabled, data is deleted followed by a reinstatement of the constraints.  This can be dangerous if only partial data is to be deleted - the deletion code must be coded carefully to ensure that constraints remain viable when enabled again.  This effectively means the constraints are duplicated in code and is not very maintainable in the long term.
 
The code provided generates 3 sets of results for each invocation.  They are lists of TSQL commands to:
  1. Disable constraints on all tables
  2. Delete data from all tables (this is where customisation would be necessary for archiving purposes)
  3. Enable constraints on all tables

 

Delete Hierarcichally - Ordered Delete Method

This is a more challenging method to design and code, but ultimately produces a more robust and maintainable process.   Full or partial data deletions are carried out in an order determined by the PK/FK relationships.  This ensures that each delete operation only succeeds if the constraint rules are not broken. 
 
The code provided here also generates 3 sets of results, but only one at a time dependent on parameters passed in.  The results are also not TSQL commands but are lists of objects to be consumed by custom deletion code.  The main point here is that the complex work of producing the hierarchically ordered list is done for you:
  1. List of all PK/FK relationships, with parent and child table names
  2. List of tables in data deletion order  (bottom up - from child to parent)
  3. List of tables in data insertion order (top down - from parent to child)
NOTE:  Only tables with a PK or FK defined will appear in the lists.
 
There is a function-based implementation of the ordered table list which also produces a hirearchy level indicator which can also be used to drive custom INSERT/DELETE code.

Code 

Delete Hierarchichally - Disable Constraint Method

DROP PROCEDURE [dbo].[usp_DeleteHirearchically_ConstraintMethod]
GO

CREATE PROCEDURE [dbo].[usp_DeleteHirearchically_ConstraintMethod]
AS

-- generate list of commands to disable all constraints
SELECT
    
'alter table '
  
+ QUOTENAME(table_schema)
   +
'.'
  
+ QUOTENAME(table_name)
   +
' nocheck constraint all;'
FROM information_schema.tables
WHERE  OBJECTPROPERTY(OBJECT_ID(QUOTENAME(table_schema)
           +
'.'
          
+ QUOTENAME(table_name)), 'ismsshipped') = 0
AND table_type = 'base table'
--AND table_schema = 'dbo'


-- generate list of commands to delete data from all tables
SELECT
    
'delete from '
  
+ QUOTENAME(table_schema)
   +
'.'
  
+ QUOTENAME(table_name)
   +
' ;'
FROM information_schema.tables
WHERE  OBJECTPROPERTY(OBJECT_ID(QUOTENAME(table_schema)
           +
'.'
          
+ QUOTENAME(table_name)), 'ismsshipped') = 0
AND table_type = 'base table'
--AND table_schema = 'dbo'


-- generate list of commands to enable all constraints
SELECT
    
'alter table '
  
+ QUOTENAME(table_schema)
   +
'.'
  
+ QUOTENAME(table_name)
   +
' check constraint all;'
FROM information_schema.tables
WHERE  OBJECTPROPERTY(OBJECT_ID(QUOTENAME(table_schema)
           +
'.'
          
+ QUOTENAME(table_name)), 'ismsshipped') = 0
AND table_type = 'base table'
--AND table_schema = 'dbo'


Delete Hierarchichally - Ordered Delete Method 

/*
-- To Display hierarchical list of the tables with foreign keys and the tables those foreign keys are referencing
   EXEC usp_DeleteHirearchically_OrderedDeletionMethod

-- To Display the list of tables to delete in bottom-up order
   EXEC usp_DeleteHirearchically_OrderedDeletionMethod 'DELETE'

-- To Display the list of tables to insert in top-down order
   EXEC usp_DeleteHirearchically_OrderedDeletionMethod 'INSERT'

*/

DROP PROCEDURE [usp_DeleteHirearchically_OrderedDeletionMethod]
GO


CREATE PROCEDURE [dbo].[usp_DeleteHirearchically_OrderedDeletionMethod](@Operation VARCHAR(6)=NULL)
AS

SET NOCOUNT ON

-- From SysReferences Table: Get the Foreign Key and Reference Key table list
-- From Sys.Objects Table : Get the schema name of the tables
-- Self Join SysReference Table again to remove the conflicted tables (2 tables involved in circular reference)

DECLARE
  
@rc INT

SELECT
SCHEMA_NAME(SOF.SCHEMA_ID)           AS FSC_NAME,
      
T1.FKEYID                          AS FKEYID,
      
SCHEMA_NAME(SOR.SCHEMA_ID)         AS RSC_NAME,
      
T1.RKEYID                          AS RKEYID
INTO   #sysref
FROM   SYSREFERENCES T1
LEFT OUTER JOIN sys.objects SOF
  
ON SOF.OBJECT_ID = T1.FKEYID
LEFT OUTER JOIN sys.objects SOR
  
ON SOR.OBJECT_ID = T1.RKEYID
LEFT OUTER JOIN SYSREFERENCES T2
  
ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)
WHERE  T2.FKEYID IS NULL

--- Below Query is same as above , but it will give the list of conflicted tables (2 tables in a circular reference)
--  You need to manually delete the data from these tables, if listed.
SELECT SCHEMA_NAME(SOF.SCHEMA_ID)+'.'+OBJECT_NAME(T1.FKEYID) AS '--',
      
SCHEMA_NAME(SOR.SCHEMA_ID)+'.'+OBJECT_NAME(T1.RKEYID) AS '--'
FROM SYSREFERENCES T1
LEFT OUTER JOIN sys.objects SOF
  
ON SOF.OBJECT_ID = T1.FKEYID
LEFT OUTER JOIN sys.objects SOR
  
ON SOR.OBJECT_ID = T1.RKEYID
JOIN SYSREFERENCES T2    
  
ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)
    AND
T1.FKEYID <> T1.RKEYID

SET @rc =@@ROWCOUNT

IF @rc > 0
BEGIN
   PRINT
'--Above ' + CAST(@rc AS VARCHAR(90))+' tables are referenced each other and can''t delete with delete statement'
END


--Recursive CTE to return the Hierarchial Data
;WITH CTE(FKEYID,LEVEL)
AS
  
(
      
SELECT
          
FKEYID,
          
0
      
FROM SYSREFERENCES

      
UNION ALL
      
      
SELECT
          
T1.RKEYID ,
          
LEVEL+1
      
FROM #sysref T1
      
JOIN CTE T2 ON T1.FKEYID = T2.FKEYID
  
)

--Dump The Hierarchial Data Into A Temp Table With Identity Column.
--Remove The Duplicate Table Names From The List (comes when one table refers to 2 tables or more) Through Using The Max Function
SELECT
  
IDENTITY(INT,1,1)  AS iid,
  
FSC_NAME               AS FSC_NAME,
  
T1.FKEYID          AS FKEYID,
  
RSC_NAME               AS RSC_NAME,
  
T1.RKEYID          AS RKEYID,
  
MAX(LEVEL)         AS LEVEL
INTO
#T
FROM #sysref T1
INNER JOIN CTE T2
  
ON T1.RKEYID = T2.FKEYID
WHERE LEVEL >0
GROUP BY
  
T1.FSC_NAME,
  
T1.FKEYID,
  
T1.RSC_NAME,
  
T1.RKEYID
ORDER BY MAX(LEVEL) ASC


-- Display hierarchical list of the tables with foreign keys and the tables those foreign keys are referencing
IF @Operation IS NULL
BEGIN
   SELECT
      
RSC_NAME AS PK_SCHEMA_NAME,
      
OBJECT_NAME(RKEYID) AS PRIMARY_KEY_TABLE_NAME,
      
FSC_NAME AS FK_SCHEMA_NAME,
      
OBJECT_NAME(FKEYID)AS FOREIGN_KEY_TABLE_NAME
  
FROM #T
  
ORDER BY 2,4
END




-- Below Block will give 2 Data Sets To Delete The Tables In Hierarchial Manner
-- #1 Is tables involved in child relationships (may be multiple levels)
-- #2 Is top level parent tables
ELSE IF @Operation ='DELETE'
BEGIN

   SELECT
'DELETE FROM ' + FSC_NAME +'.'+ FKEYID + ';' AS 'SEQUENTIAL DELETE TABLE LIST (Child Tables)'
  
FROM
      
(
          
SELECT
               TOP
10000 MIN( IID) AS IID ,
              
FSC_NAME,OBJECT_NAAME(FKEYID)AS FKEYID
              
FROM #t
              
GROUP BY FSC_NAME,FKEYID
              
ORDER BY 1
      
) X;

  
SELECT 'DELETE FROM ' + SCHEMA_NAME(SCHEMA_ID)+'.'+NAME  AS 'SEQUENTIAL DELETE TABLE LIST (Parent Tables)'
  
FROM SYS.OBJECTS
  
WHERE TYPE = 'U'
  
AND NAME <> 'sysdiagrams'
  
AND OBJECT_ID NOT IN
      
(
          
SELECT FKEYID FROM #t
      
)
   AND
OBJECT_ID NOT IN
      
(
          
SELECT (T1.FKEYID) AS FKEYID
          
FROM SYSREFERENCES T1
          
JOIN SYSREFERENCES T2
              
ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)
          
WHERE T1.FKEYID <> T1.RKEYID
      
);
END


-- Below Block will give 2 Data Sets To Insert The Tables In Hierarchial Manner
-- #1 Is top level parent tables
-- #2 Is tables involved in child relationships (may be multiple levels)
ELSE IF @Operation ='INSERT'
BEGIN

--TO INSERT

  
SELECT
      
NAME AS 'SEQUENTIAL INSERT TABLE LIST (Parent Tables)'
      
FROM SYSOBJECTS
      
WHERE XTYPE = 'U'
      
AND NAME <> 'sysdiagrams'
      
AND ID NOT IN
          
(
              
SELECT FKEYID FROM #t
          
)
       AND
ID NOT IN
          
(
              
SELECT (T1.FKEYID
          
) AS FKEYID
  
FROM SYSREFERENCES T1
  
JOIN SYSREFERENCES T2
      
ON OBJECT_NAME(T1.FKEYID)+OBJECT_NAME(T1.RKEYID) = OBJECT_NAME(T2.RKEYID)+OBJECT_NAME(T2.FKEYID)
  
WHERE T1.FKEYID <> T1.RKEYID)


  
SELECT FKEYID AS 'SEQUENTIAL INSERT TABLE LIST (Child Tables)'
  
FROM
      
(
          
SELECT
               TOP
10000 MIN(IID)   AS IID,
              
OBJECT_NAME(FKEYID)    AS FKEYID
          
FROM #t
          
GROUP BY FKEYID
          
ORDER BY MIN(IID) DESC
      
) X
END


ċ
Hierachical Deletion - Disabled Constraint Method.sql
(1k)
Andy Hughes,
Jun 14, 2012, 10:13 AM
ċ
Hierachical Deletion - Ordered Deletion Method.sql
(6k)
Andy Hughes,
Jun 14, 2012, 10:13 AM
Comments