Managing SQL Agent Jobs in an Availability Group

Applicability:

SQL Server 2000: N/A

SQL Server 2005: N/A

SQL Server 2008: N/A

SQL Server 2008R2: N/A

SQL Server 2012: Tested

SQL Server 2014: Tested

SQL Server 2016: Tested

SQL Server 2017: Tested

Credits:

Author: ChillyDBA

Date: 7 Jun 2020

Description

Occasionally, as a DBA, you will come across a problem that just causes you to say 'Ummmmmm....!?'

On a recent upgrade project, one of the application databases was due to be upgraded from SQL 2008 to SQL 2016 and placed in an Availability Group.

Simple right? Nothing I haven't done before. Right!

Well, this database had 2 additional features implemented

    • Change Data Capture - with SQL Agent jobs to control the service involved

    • SQL Audting - with SQL Agent jobs to harvest changes with an ETL job

Somebody had a real 'lets play with new toys' day when they built this one.

The problem here is that neither of these tools (and SQL Agent) are really cluster aware, which means that the associated SQL Agent jobs must be manually Stopped/Started/Disabled on failover, and I could find no easy way of doing this as part of the failover event. With legacy SQL Clustering this was simpler as the services/jobs could only run on the active node, but with SQL AGs, all nodes are effectively active, so the services or jobs could run on all nodes without immediate error. The problems would only surface after a failover and failback cycle, when CDC and SQLAudit data would be stored in 2 places

I therefore constructed a SQL Agent job that runs continuously on each AG node and which will take care of reacting to a failover event having happened. It's not perfect, but the delay is only seconds, during which the AG failover will mostly still be happening, so the potential loss of data is mitigated.

It can also be extended to manage any other SQL Agent jobs such as ETL or batch processing jobs.

The code below will create the SQL Agent job, but it must be altered to reflect the specific jobs that are in scope before you create it:

    • #AgentJobList table - to manage CDC and ETL/Batch SQL Agent Jobs

      • AgentJobName - name of the SQL Agent job to be managed

      • AgentJobRunContinuous - does this job run continuously but have a schedule to ensure it starts? 1 = Yes, 0 = No

      • AgentJobExists - a flag that tracks whether the specified SQL Agent job exists - populated at run time

      • AgentJobRunning - a flag that tracks whether the specified SQL Agent job is running - populated at run time

    • #AuditJobList table - To manage SQL Audit SQL Agent Jobs

      • AuditJobName - name of the SQL Agent job to be managed

      • AuditJobExists - a flag that tracks whether the specified SQL Agent job exists - populated at run time

      • AuditJobRunning - a flag that tracks whether the specified SQL Agent job is running - populated at run time

    • If there are no SQL Audit jobs, leave the table empty

The SQL Agent job that is created is set to run continuously

The actions performed are basically to enable/disable/start/stop the required jobs depending on AG status (PRIMARY/SECONDARY) so that the jobs will only be active/running on the PRIMARY.

This has been tested with several AGs, but I only have the one type of AG configuration, so it may not be absolutely bullet-proof.

The code prints out confirmation of every action, so the results can be examined using SQLAgent job history dialog.

However, before letting it loose in a PROD environment, the code should be extracted from the only job step and can be run manually on each node to check that it having the desired effect.

Have fun.

Code

USE [msdb]

GO


/****** Object: Job [AAAA - Set all AAAA SQL Agent Jobs to corect state] Script Date: 12/19/2019 10:09:42 AM ******/

BEGIN TRANSACTION

DECLARE @ReturnCode INT

SELECT @ReturnCode = 0

/****** Object: JobCategory [[Uncategorized (Local)]] Script Date: 12/19/2019 10:09:42 AM ******/

IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)

BEGIN

EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback


END


DECLARE @jobId BINARY(16)

EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'AVAILABILITY GROUP - Set all scheduled Jobs to corect state',

@enabled=1,

@notify_level_eventlog=0,

@notify_level_email=0,

@notify_level_netsend=0,

@notify_level_page=0,

@delete_level=0,

@description=N'This job will examine the AG node state (Primary or Secondary) and ENABLE/DISABLE all CDC, SQL Audit and SQLAgent jobs that are involved in the maitneance of these 2 data sets to the correct state


ie. All enabled/running on primary and disabled/stopped on secodnary',

@category_name=N'[Uncategorized (Local)]',

@owner_login_name=N'sa', @job_id = @jobId OUTPUT

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

/****** Object: Step [Perform chexcks and change job startes] Script Date: 12/19/2019 10:09:42 AM ******/

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Perform chexcks and change job startes',

@step_id=1,

@cmdexec_success_code=0,

@on_success_action=1,

@on_success_step_id=0,

@on_fail_action=2,

@on_fail_step_id=0,

@retry_attempts=0,

@retry_interval=0,

@os_run_priority=0, @subsystem=N'TSQL',

@command=N'SET NOCOUNT ON


IF (SELECT OBJECT_ID(''tempdb..#AgentJobList'') ) IS NOT NULL

DROP TABLE #AgentJobList


CREATE TABLE #AgentJobList

(

AgentJobName SYSNAME

,AgentJobRunContinuous INT -- does this job run continuously but have a schedule to ensure it starts?

,AgentJobExists INT -- a flag that tracks whether the specified SQL Agent job exists - populated at run time

,AgentJobRunning INT -- a flag that tracks whether the specified SQL Agent job is running - populated at run time

)


IF (SELECT OBJECT_ID(''tempdb..#AuditJobList'') ) IS NOT NULL

DROP TABLE #AuditJobList


CREATE TABLE #AuditJobList

(

AuditJobName SYSNAME

,AuditJobExists INT

,AuditJobRunning INT

)


DECLARE

@AgentJobRunning INT

,@AgentJobName SYSNAME

,@IsPrimary INT

,@AgentJobRunContinuous INT

,@AuditJobRunning INT

,@AuditJobName SYSNAME

,@Command NVARCHAR(MAX)



-- populate the table that determines the scope of the checks for SQL Agent Jobs

-- To exclude this functionality, just comment out the code below that loads the temp table. No other changes are required

INSERT #AgentJobList (AgentJobName, AgentJobRunContinuous, AgentJobExists, AgentJobRunning)

VALUES

(''cdc.AAAA_capture'', 1, 0, 0)

,(''cdc.AAAA_cleanup'', 0, 0, 0)

,(''CHECK THAT AAAA AUDIT IS RUNNING'', 0, 0, 0)

,(''LOAD AAAA AUDIT RECORD'', 0, 0, 0)

,(''LOAD AAAA CDC ARCHIVE RECORDS LSN'', 0, 0, 0)



-- populate the table that determines the SQL Audit jobs to check

-- SQL Audit jobs are a core DBMS function controlled by an API that is separate from SQLAgent

-- To exclude this functionality, just comment out the code below that loads the temp table. No other changes are required

INSERT #AuditJobList (AuditJobName, AuditJobExists, AuditJobRunning)

VALUES

(''AAAA ACCESS AUDIT'', 0, 0)



--determine whether the current node is primary

--this assumes that there is only one AG on the server

IF EXISTS

(

SELECT

sd.name AS DatabaseName --return the DB name in case this needs to be adapted for multiple AGs

,hars.role_desc

FROM sys.databases sd

INNER JOIN sys.dm_hadr_availability_replica_states hars

ON sd.replica_id = hars.replica_id

WHERE sd.replica_id IS NOT NULL

AND hars.role_desc = ''PRIMARY''

--AND sd.name = @DatabaseName --filter the DB name in case this needs to be adapted for multiple AGs

)


SELECT @IsPrimary = 1

ELSE

SELECT @IsPrimary = 0


--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- BEGIN - Check and alter status of SQL Agent Jobs

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


--check which agent jobs exist

UPDATE jl

SET jl.AgentJobExists = CASE

WHEN sj.name IS NULL THEN 0

ELSE 1

END

FROM #AgentJobList jl

LEFT OUTER JOIN msdb..sysjobs sj

ON jl.AgentJobName = sj.name


--check which agent jobs are running

UPDATE jl

SET jl.AgentJobRunning = CASE

WHEN activity.run_Requested_date IS NOT NULL AND activity.stop_execution_date IS NULL THEN 1

ELSE 0

END

--select *

FROM #AgentJobList jl

LEFT OUTER JOIN msdb.dbo.sysjobs_view job

ON job.name = jl.AgentJobName

INNER JOIN msdb.dbo.sysjobactivity activity

ON job.job_id = activity.job_id


INNER JOIN msdb.dbo.syssessions sess

ON sess.session_id = activity.session_id

INNER JOIN

(

SELECT

MAX( agent_start_date ) AS max_agent_start_date

FROM msdb.dbo.syssessions

) sess_max

ON sess.agent_start_date = sess_max.max_agent_start_date

WHERE run_requested_date IS NOT NULL

AND stop_execution_date IS NULL


--now iterate through existing agent jobs and change the state appropriately

SELECT @AgentJobName = MIN(AgentJobName)

FROM #AgentJobList

WHERE AgentJobExists = 1


WHILE @AgentJobName IS NOT NULL

BEGIN

SELECT

@AgentJobRunning = AgentJobRunning

,@AgentJobRunContinuous = AgentJobRunContinuous

FROM #AgentJobList

WHERE AgentJobName = @AgentJobName


IF @IsPrimary = 1

BEGIN

IF @AgentJobRunContinuous = 1

BEGIN

IF @AgentJobRunning = 1

BEGIN

EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1

--no run action

PRINT ''Job: ['' + @AgentJobName + ''] ENABLED''

PRINT ''Job: ['' + @AgentJobName + ''] No Run State Change''

END

ELSE IF @AgentJobRunning = 0

BEGIN

EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1

EXEC msdb..sp_start_job @job_name = @AgentJobName

PRINT ''Job: ['' + @AgentJobName + ''] ENABLED''

PRINT ''Job: ['' + @AgentJobName + ''] STARTED''

END

END

ELSE IF @AgentJobRunContinuous = 0

BEGIN

IF @AgentJobRunning = 1

BEGIN

EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1

--no run action

PRINT ''Job: ['' + @AgentJobName + ''] ENABLED''

PRINT ''Job: ['' + @AgentJobName + ''] STARTED''

END

ELSE IF @AgentJobRunning = 0

BEGIN

EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1

--no run action

PRINT ''Job: ['' + @AgentJobName + ''] ENABLED''

PRINT ''Job: ['' + @AgentJobName + ''] STARTED''

END

END

END

ELSE IF @IsPrimary = 0

BEGIN

IF @AgentJobRunContinuous = 1

BEGIN

IF @AgentJobRunning = 1

BEGIN

EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0

EXEC msdb..sp_stop_job @job_name = @AgentJobName

PRINT ''Job: ['' + @AgentJobName + ''] DISABLED''

PRINT ''Job: ['' + @AgentJobName + ''] STOPPED''

END

ELSE IF @AgentJobRunning = 0

BEGIN

EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0

--no run action

PRINT ''Job: ['' + @AgentJobName + ''] DISABLED''

PRINT ''Job: ['' + @AgentJobName + ''] No Run State Change''

END

END

ELSE IF @AgentJobRunContinuous = 0

BEGIN

IF @AgentJobRunning = 1

BEGIN

EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0

EXEC msdb..sp_stop_job @job_name = @AgentJobName

PRINT ''Job: ['' + @AgentJobName + ''] DISABLED''

PRINT ''Job: ['' + @AgentJobName + ''] STOPPED''

END

ELSE IF @AgentJobRunning = 0

BEGIN

EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0

--no run action

PRINT ''Job: ['' + @AgentJobName + ''] DISABLED''

PRINT ''Job: ['' + @AgentJobName + ''] No Run State Change''

END

END

END

SELECT

@AgentJobName = MIN(AgentJobName)

FROM #AgentJobList

WHERE AgentJobExists = 1

AND AgentJobName > @AgentJobName

END

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- END - Check and alter status of SQL Agent Jobs

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- BEGIN - Check and alter status of SQL Audit Jobs

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


--check which audit jobs exist

UPDATE jl

SET jl.AuditJobExists = CASE

WHEN astat.name IS NULL THEN 0

ELSE 1

END

FROM #AuditJobList jl

LEFT OUTER JOIN sys.dm_server_audit_status astat

ON jl.AuditJobName = astat.name



--check which audit jobs are running

UPDATE jl

SET jl.AuditJobRunning = CASE

WHEN astat.status_desc = ''STARTED'' THEN 1

ELSE 0

END

FROM #AuditJobList jl

LEFT OUTER JOIN sys.dm_server_audit_status astat

ON jl.AuditJobName = astat.name


--now iterate through existing agent jobs and change the state appropriately


SELECT @AuditJobName = MIN(AuditJobName)

FROM #AuditJobList

WHERE AuditJobExists = 1


WHILE @AuditJobName IS NOT NULL

BEGIN

SELECT

@AuditJobRunning = AuditJobRunning

FROM #AuditJobList

WHERE AuditJobName = @AuditJobName


IF @IsPrimary = 1

IF @AuditJobRunning = 1

BEGIN

--no run action

PRINT ''Job: ['' + @AuditJobName + ''] STARTED''

END

ELSE IF @AuditJobRunning = 0

BEGIN

SELECT @Command = ''ALTER SERVER AUDIT '' + @AuditJobName + '' AUDIT WITH (STATE = ON)''

EXEC(@Command)

PRINT ''Job: ['' + @AuditJobName + ''] STARTED''

END


ELSE IF @IsPrimary = 0

IF @AuditJobRunning = 1

BEGIN

SELECT @Command = ''ALTER SERVER AUDIT '' + @AuditJobName + '' AUDIT WITH (STATE = OFF)''

EXEC(@Command)

PRINT ''Job: ['' + @AuditJobName + ''] STOPPED''


END

ELSE IF @AuditJobRunning = 0

BEGIN

--no run action

PRINT ''Job: ['' + @AuditJobName + ''] STOPPED''

END


SELECT @AuditJobName = MIN(AuditJobName)

FROM #AuditJobList

WHERE AuditJobExists = 1

AND AuditJobName > @AuditJobName

END


--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- END - Check and alter status of SQL Audit Jobs

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


--select * from #AgentJobList

--select * from #AuditJobList



SET NOCOUNT OFF



',

@database_name=N'master',

@flags=8

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every minute',

@enabled=1,

@freq_type=4,

@freq_interval=1,

@freq_subday_type=4,

@freq_subday_interval=1,

@freq_relative_interval=0,

@freq_recurrence_factor=0,

@active_start_date=20191219,

@active_end_date=99991231,

@active_start_time=0,

@active_end_time=235959,

@schedule_uid=N'fdd9f0a2-807d-423a-82b0-c5cb4f30ba92'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

COMMIT TRANSACTION

GOTO EndSave

QuitWithRollback:

IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION

EndSave:


GO