Strip HTML Tags from a String

Applicability:

SQL Server 2000: Tested **

SQL Server 2005: Tested

SQL Server 2008: Tested

SQL Server 2008R2: Tested

SQL Server 2012: Not Tested

Credits:

Author: Unknown

Date: 28 Jun 2011

Description

A function to remove all HTML tags from a string.

This is a fairly basic process that merely looks for '<' '>' pairs. This will therefore strip a not equals sign from an equation or code, but the function is really intended to work on text.

It will also not strip out any ASCII codes or non tag HTML codes such as &nbsp

It may not be automation-worthy, but it will still save a lot of manual editing work.

** NOTE: For this to work in SQL 2000, just substitute the VARCHAR(MAX) variables with VARCHAR(8000).

Code

DROP FUNCTION [dbo].[udf_StripHTML]

GO

CREATE FUNCTION [dbo].[udf_StripHTML]

(

@HTMLText VARCHAR(MAX)

)

RETURNS VARCHAR(MAX)

AS

BEGIN

DECLARE

@Start INT,

@End INT,

@Length INT

-- get the start and end points in the string of the first tag

SET @Start = CHARINDEX('<',@HTMLText)

SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))

SET @Length = (@End - @Start) + 1

-- process all tag in the string in a loop

WHILE ((@Start > 0) AND (@End > 0) AND (@Length > 0))

BEGIN

-- replace all occurrences of the tag with an empty string

-- this is more efficient than just iterating through the string one tag at a time

SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')

-- get the next tag location

SET @Start = CHARINDEX('<',@HTMLText)

SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))

SET @Length = (@End - @Start) + 1

END

-- return the stripped string

RETURN LTRIM(RTRIM(@HTMLText))

END

GO

Testing:

DECLARE @InputString VARCHAR(MAX),

@OutputString VARCHAR(MAX)

-- an HTML extract from a code section one of the web pages in this site

SELECT @InputString = '</SPAN><SPAN style="COLOR: blue">END<BR>

<BR>

<BR>

</SPAN><SPAN style="COLOR: green">/*<BR>

select [dbo].[udf_padstring] (''Hello World'',''x'',1,0)<BR>

select [dbo].[udf_padstring] (''Hello World'',''x'',1,1)<BR>

select [dbo].[udf_padstring] (''Hello World'',''x'',1,2)<BR>

select [dbo].[udf_padstring] (''Hello World'',''x'',1,3)<BR>

<BR>

*/<BR>

</SPAN></CODE></P>'

SELECT @OutputString = [dbo].[udf_StripHTML](@InputString)

SELECT @OutputString