Cool Tips‎ > ‎Data Formatting‎ > ‎

Data Formatting - Strip HTML Tags from a String

Applicability:

                 SQL Server 2000:        Tested **
                 SQL Server 2005:        Tested
                 SQL Server 2008:        Tested
                 SQL Server 2008R2:    Tested
                 SQL Server 2012:        Not Tested        

Credits:

Author:    Unknown
Date:        28 Jun 2011

Description

A function to remove all HTML tags from a string.
This is a fairly basic process that merely looks for  '<'  '>' pairs.  This will therefore strip a not equals sign from an equation or code, but the function is really intended to work on text.
It will also not strip out any ASCII codes or non tag HTML codes such as &nbsp
 
It may not be automation-worthy, but it will still save a lot of manual editing work.
 
** NOTE:  For this to work in SQL 2000, just substitute the VARCHAR(MAX) variables with VARCHAR(8000).

Code

 

DROP FUNCTION [dbo].[udf_StripHTML]
GO

CREATE FUNCTION [dbo].[udf_StripHTML]
(
  
@HTMLText VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
   DECLARE
      
@Start INT,
      
@End INT,
      
@Length INT
  
-- get the start and end points in the string of the first tag
  
SET @Start = CHARINDEX('<',@HTMLText)
  
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
  
SET @Length = (@End - @Start) + 1

  
-- process all tag in the string in a loop
  
WHILE ((@Start > 0) AND (@End > 0) AND (@Length > 0))
  
BEGIN
      
-- replace all occurrences of the tag with an empty string
       -- this is more efficient than just iterating through the string one tag at a time
      
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
      
      
-- get the next tag location
      
SET @Start = CHARINDEX('<',@HTMLText)
      
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
      
SET @Length = (@End - @Start) + 1
  
END
  
  
-- return the stripped string
  
RETURN LTRIM(RTRIM(@HTMLText))
END
GO

 
 Testing:

DECLARE @InputString    VARCHAR(MAX),
      
@OutputString   VARCHAR(MAX)

-- an HTML extract from a code section one of the web pages in this site
SELECT @InputString = '</SPAN><SPAN style="COLOR: blue">END<BR>
<BR>
<BR>
</SPAN><SPAN style="COLOR: green">/*<BR>
select [dbo].[udf_padstring] (''Hello World'',''x'',1,0)<BR>
select [dbo].[udf_padstring] (''Hello World'',''x'',1,1)<BR>
select [dbo].[udf_padstring] (''Hello World'',''x'',1,2)<BR>
select [dbo].[udf_padstring] (''Hello World'',''x'',1,3)<BR>
<BR>
*/<BR>
</SPAN></CODE></P>'

SELECT @OutputString = [dbo].[udf_StripHTML](@InputString)

SELECT @OutputString


ċ
Strip all HTML tags from Text String.sql
(2k)
Andy Hughes,
Jun 7, 2012, 12:20 PM
Comments