Move Data from Table to File and Replace with URI
SQL Server 2000: Tested
SQL Server 2005: Tested
SQL Server 2008: Not Tested
SQL Server 2008R2: Not Tested
SQL Server 2012: Not Tested
Date: 29 Jul 2007
Several years ago, I was presented with a mass data relocation challenge that needed to be performed on a 24*7 system.
The company concerned had, in its early days, made a data design decision that was now coming back to bite them hard. They had decided to store their bulk mailing information in the OLTP database, and to retain full mailing history for CRM purposes. A valid decision when the customer base was small and shipping volumes were low, but with daily emails approaching 30,000, the system was starting to creak. To compound the problem, not only was the text of the email stored, but the full 15KB HTML (including static images) was also streamed to another column in the table.
The challenge was to extract and store the email text and html components into a network file share, and to replace the data with a URI location string.
The task was broken down into several discrete steps:
Identify the appropriate method for extracting and saving the emails into files.
Change the application to put new emails in the same file based storage and to insert a URI in the previous columns.
Extract all emails up to point #2 to file
Backfill all table entries up to point #2 with URI data
Recover any free space in the table
Step #1 - Identify Method
I did consider using DTS/SSIS to loop through all files and perform the extraction, but this seemed to be a bit like using a sledgehammer to crack a nut. This task were going to have to be batched and monitored closely to avoid interfering with normal production activities, and it seemed like the BCP command line suited this better.
I chose to invest time in producing a parameterised script that would produce BCP batch scripts as a results set (batch script #1 below).
These batches could then be run from the command line by and administrator without the need to use xp_cmdshell or worry about the complexities of DTS/SSIS proxy account, parameter and logging configuration.
A specific file name generation algorithm was developed (function definitions below) that would be used by both the migration process and the ongoing storage by the application. This assumed a new folder for each day, and that file names would include the PK and the date that the email was queued for sending.
Step #2 - Change Application
This one was purely down to the development team.
Once the application change had been rolled out, no more emails would be stored in the table and an endpoint for the export and backfill process would be known.
Step #3 - Extract Emails to file
The batch generator was run to generate batches to extract 1 months worth of emails each.
The scope was from day #1 up to the point that step #2 was implemented.
The batches were then manually run in sequence.
Step #4 - Replace Email with URI in table
Once all files were confirmed as extracted, a modified version of the batch generator was used to overwrite the table version of the email with a URI that the application could use to access the email (batch script #2 below)
Step #5 - Recover unused space in table
An index rebuild was tried, but this still left unused space in the table rows, so ultimately the table was recreated from scratch and data copied across. This was encapsulated in a transaction and run during a time of lower impact (when shipping was closed)
Points to note:
This code is written with a specific table and column set in mind, but can easily be adapted for any table, or even just to archive data to offline files in batches. The basic table structure is:
DateQueued DATETIME (date the email was added to the queue)
Textmessage VARCHAR(1000) (text of the message body)
HTMLMessage TEXT (entire HTML email)
I had originally saved this code in its working state, but obviously there needed to be some parameterisation and sanitizing of the code to keep anonymity of the company, so I created a test table with some data which is included as the last code block below, but the actual implementation code only contains <Tags> where values should be inserted as your server and folder names will be different.
Function #1 - Construct File Path:
EmailQueueID INT (the PK for the table
UserID INT (the PK for the customer to whom the message applies)
Function #2 - Construct Record Identifier:
Batch Script #1 - Create data export batch files:
Batch Script #2 - Update columns to URI values:
Test Script - Create test table and data: