Managing SQL Agent Jobs in an Availability Group
Applicability:
SQL Server 2000: N/A
SQL Server 2005: N/A
SQL Server 2008: N/A
SQL Server 2008R2: N/A
SQL Server 2012: Tested
SQL Server 2014: Tested
SQL Server 2016: Tested
SQL Server 2017: Tested
Credits:
Author: ChillyDBA
Date: 7 Jun 2020
Description
Occasionally, as a DBA, you will come across a problem that just causes you to say 'Ummmmmm....!?'
On a recent upgrade project, one of the application databases was due to be upgraded from SQL 2008 to SQL 2016 and placed in an Availability Group.
Simple right? Nothing I haven't done before. Right!
Well, this database had 2 additional features implemented
Change Data Capture - with SQL Agent jobs to control the service involved
SQL Audting - with SQL Agent jobs to harvest changes with an ETL job
Somebody had a real 'lets play with new toys' day when they built this one.
The problem here is that neither of these tools (and SQL Agent) are really cluster aware, which means that the associated SQL Agent jobs must be manually Stopped/Started/Disabled on failover, and I could find no easy way of doing this as part of the failover event. With legacy SQL Clustering this was simpler as the services/jobs could only run on the active node, but with SQL AGs, all nodes are effectively active, so the services or jobs could run on all nodes without immediate error. The problems would only surface after a failover and failback cycle, when CDC and SQLAudit data would be stored in 2 places
I therefore constructed a SQL Agent job that runs continuously on each AG node and which will take care of reacting to a failover event having happened. It's not perfect, but the delay is only seconds, during which the AG failover will mostly still be happening, so the potential loss of data is mitigated.
It can also be extended to manage any other SQL Agent jobs such as ETL or batch processing jobs.
The code below will create the SQL Agent job, but it must be altered to reflect the specific jobs that are in scope before you create it:
#AgentJobList table - to manage CDC and ETL/Batch SQL Agent Jobs
AgentJobName - name of the SQL Agent job to be managed
AgentJobRunContinuous - does this job run continuously but have a schedule to ensure it starts? 1 = Yes, 0 = No
AgentJobExists - a flag that tracks whether the specified SQL Agent job exists - populated at run time
AgentJobRunning - a flag that tracks whether the specified SQL Agent job is running - populated at run time
#AuditJobList table - To manage SQL Audit SQL Agent Jobs
AuditJobName - name of the SQL Agent job to be managed
AuditJobExists - a flag that tracks whether the specified SQL Agent job exists - populated at run time
AuditJobRunning - a flag that tracks whether the specified SQL Agent job is running - populated at run time
If there are no SQL Audit jobs, leave the table empty
The SQL Agent job that is created is set to run continuously
The actions performed are basically to enable/disable/start/stop the required jobs depending on AG status (PRIMARY/SECONDARY) so that the jobs will only be active/running on the PRIMARY.
This has been tested with several AGs, but I only have the one type of AG configuration, so it may not be absolutely bullet-proof.
The code prints out confirmation of every action, so the results can be examined using SQLAgent job history dialog.
However, before letting it loose in a PROD environment, the code should be extracted from the only job step and can be run manually on each node to check that it having the desired effect.
Have fun.
Code
USE [msdb]
GO
/****** Object: Job [AAAA - Set all AAAA SQL Agent Jobs to corect state] Script Date: 12/19/2019 10:09:42 AM ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [[Uncategorized (Local)]] Script Date: 12/19/2019 10:09:42 AM ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'AVAILABILITY GROUP - Set all scheduled Jobs to corect state',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'This job will examine the AG node state (Primary or Secondary) and ENABLE/DISABLE all CDC, SQL Audit and SQLAgent jobs that are involved in the maitneance of these 2 data sets to the correct state
ie. All enabled/running on primary and disabled/stopped on secodnary',
@category_name=N'[Uncategorized (Local)]',
@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Perform chexcks and change job startes] Script Date: 12/19/2019 10:09:42 AM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Perform chexcks and change job startes',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'SET NOCOUNT ON
IF (SELECT OBJECT_ID(''tempdb..#AgentJobList'') ) IS NOT NULL
DROP TABLE #AgentJobList
CREATE TABLE #AgentJobList
(
AgentJobName SYSNAME
,AgentJobRunContinuous INT -- does this job run continuously but have a schedule to ensure it starts?
,AgentJobExists INT -- a flag that tracks whether the specified SQL Agent job exists - populated at run time
,AgentJobRunning INT -- a flag that tracks whether the specified SQL Agent job is running - populated at run time
)
IF (SELECT OBJECT_ID(''tempdb..#AuditJobList'') ) IS NOT NULL
DROP TABLE #AuditJobList
CREATE TABLE #AuditJobList
(
AuditJobName SYSNAME
,AuditJobExists INT
,AuditJobRunning INT
)
DECLARE
@AgentJobRunning INT
,@AgentJobName SYSNAME
,@IsPrimary INT
,@AgentJobRunContinuous INT
,@AuditJobRunning INT
,@AuditJobName SYSNAME
,@Command NVARCHAR(MAX)
-- populate the table that determines the scope of the checks for SQL Agent Jobs
-- To exclude this functionality, just comment out the code below that loads the temp table. No other changes are required
INSERT #AgentJobList (AgentJobName, AgentJobRunContinuous, AgentJobExists, AgentJobRunning)
VALUES
(''cdc.AAAA_capture'', 1, 0, 0)
,(''cdc.AAAA_cleanup'', 0, 0, 0)
,(''CHECK THAT AAAA AUDIT IS RUNNING'', 0, 0, 0)
,(''LOAD AAAA AUDIT RECORD'', 0, 0, 0)
,(''LOAD AAAA CDC ARCHIVE RECORDS LSN'', 0, 0, 0)
-- populate the table that determines the SQL Audit jobs to check
-- SQL Audit jobs are a core DBMS function controlled by an API that is separate from SQLAgent
-- To exclude this functionality, just comment out the code below that loads the temp table. No other changes are required
INSERT #AuditJobList (AuditJobName, AuditJobExists, AuditJobRunning)
VALUES
(''AAAA ACCESS AUDIT'', 0, 0)
--determine whether the current node is primary
--this assumes that there is only one AG on the server
IF EXISTS
(
SELECT
sd.name AS DatabaseName --return the DB name in case this needs to be adapted for multiple AGs
,hars.role_desc
FROM sys.databases sd
INNER JOIN sys.dm_hadr_availability_replica_states hars
ON sd.replica_id = hars.replica_id
WHERE sd.replica_id IS NOT NULL
AND hars.role_desc = ''PRIMARY''
--AND sd.name = @DatabaseName --filter the DB name in case this needs to be adapted for multiple AGs
)
SELECT @IsPrimary = 1
ELSE
SELECT @IsPrimary = 0
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- BEGIN - Check and alter status of SQL Agent Jobs
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--check which agent jobs exist
UPDATE jl
SET jl.AgentJobExists = CASE
WHEN sj.name IS NULL THEN 0
ELSE 1
END
FROM #AgentJobList jl
LEFT OUTER JOIN msdb..sysjobs sj
ON jl.AgentJobName = sj.name
--check which agent jobs are running
UPDATE jl
SET jl.AgentJobRunning = CASE
WHEN activity.run_Requested_date IS NOT NULL AND activity.stop_execution_date IS NULL THEN 1
ELSE 0
END
--select *
FROM #AgentJobList jl
LEFT OUTER JOIN msdb.dbo.sysjobs_view job
ON job.name = jl.AgentJobName
INNER JOIN msdb.dbo.sysjobactivity activity
ON job.job_id = activity.job_id
INNER JOIN msdb.dbo.syssessions sess
ON sess.session_id = activity.session_id
INNER JOIN
(
SELECT
MAX( agent_start_date ) AS max_agent_start_date
FROM msdb.dbo.syssessions
) sess_max
ON sess.agent_start_date = sess_max.max_agent_start_date
WHERE run_requested_date IS NOT NULL
AND stop_execution_date IS NULL
--now iterate through existing agent jobs and change the state appropriately
SELECT @AgentJobName = MIN(AgentJobName)
FROM #AgentJobList
WHERE AgentJobExists = 1
WHILE @AgentJobName IS NOT NULL
BEGIN
SELECT
@AgentJobRunning = AgentJobRunning
,@AgentJobRunContinuous = AgentJobRunContinuous
FROM #AgentJobList
WHERE AgentJobName = @AgentJobName
IF @IsPrimary = 1
BEGIN
IF @AgentJobRunContinuous = 1
BEGIN
IF @AgentJobRunning = 1
BEGIN
EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1
--no run action
PRINT ''Job: ['' + @AgentJobName + ''] ENABLED''
PRINT ''Job: ['' + @AgentJobName + ''] No Run State Change''
END
ELSE IF @AgentJobRunning = 0
BEGIN
EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1
EXEC msdb..sp_start_job @job_name = @AgentJobName
PRINT ''Job: ['' + @AgentJobName + ''] ENABLED''
PRINT ''Job: ['' + @AgentJobName + ''] STARTED''
END
END
ELSE IF @AgentJobRunContinuous = 0
BEGIN
IF @AgentJobRunning = 1
BEGIN
EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1
--no run action
PRINT ''Job: ['' + @AgentJobName + ''] ENABLED''
PRINT ''Job: ['' + @AgentJobName + ''] STARTED''
END
ELSE IF @AgentJobRunning = 0
BEGIN
EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1
--no run action
PRINT ''Job: ['' + @AgentJobName + ''] ENABLED''
PRINT ''Job: ['' + @AgentJobName + ''] STARTED''
END
END
END
ELSE IF @IsPrimary = 0
BEGIN
IF @AgentJobRunContinuous = 1
BEGIN
IF @AgentJobRunning = 1
BEGIN
EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0
EXEC msdb..sp_stop_job @job_name = @AgentJobName
PRINT ''Job: ['' + @AgentJobName + ''] DISABLED''
PRINT ''Job: ['' + @AgentJobName + ''] STOPPED''
END
ELSE IF @AgentJobRunning = 0
BEGIN
EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0
--no run action
PRINT ''Job: ['' + @AgentJobName + ''] DISABLED''
PRINT ''Job: ['' + @AgentJobName + ''] No Run State Change''
END
END
ELSE IF @AgentJobRunContinuous = 0
BEGIN
IF @AgentJobRunning = 1
BEGIN
EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0
EXEC msdb..sp_stop_job @job_name = @AgentJobName
PRINT ''Job: ['' + @AgentJobName + ''] DISABLED''
PRINT ''Job: ['' + @AgentJobName + ''] STOPPED''
END
ELSE IF @AgentJobRunning = 0
BEGIN
EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0
--no run action
PRINT ''Job: ['' + @AgentJobName + ''] DISABLED''
PRINT ''Job: ['' + @AgentJobName + ''] No Run State Change''
END
END
END
SELECT
@AgentJobName = MIN(AgentJobName)
FROM #AgentJobList
WHERE AgentJobExists = 1
AND AgentJobName > @AgentJobName
END
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- END - Check and alter status of SQL Agent Jobs
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- BEGIN - Check and alter status of SQL Audit Jobs
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--check which audit jobs exist
UPDATE jl
SET jl.AuditJobExists = CASE
WHEN astat.name IS NULL THEN 0
ELSE 1
END
FROM #AuditJobList jl
LEFT OUTER JOIN sys.dm_server_audit_status astat
ON jl.AuditJobName = astat.name
--check which audit jobs are running
UPDATE jl
SET jl.AuditJobRunning = CASE
WHEN astat.status_desc = ''STARTED'' THEN 1
ELSE 0
END
FROM #AuditJobList jl
LEFT OUTER JOIN sys.dm_server_audit_status astat
ON jl.AuditJobName = astat.name
--now iterate through existing agent jobs and change the state appropriately
SELECT @AuditJobName = MIN(AuditJobName)
FROM #AuditJobList
WHERE AuditJobExists = 1
WHILE @AuditJobName IS NOT NULL
BEGIN
SELECT
@AuditJobRunning = AuditJobRunning
FROM #AuditJobList
WHERE AuditJobName = @AuditJobName
IF @IsPrimary = 1
IF @AuditJobRunning = 1
BEGIN
--no run action
PRINT ''Job: ['' + @AuditJobName + ''] STARTED''
END
ELSE IF @AuditJobRunning = 0
BEGIN
SELECT @Command = ''ALTER SERVER AUDIT '' + @AuditJobName + '' AUDIT WITH (STATE = ON)''
EXEC(@Command)
PRINT ''Job: ['' + @AuditJobName + ''] STARTED''
END
ELSE IF @IsPrimary = 0
IF @AuditJobRunning = 1
BEGIN
SELECT @Command = ''ALTER SERVER AUDIT '' + @AuditJobName + '' AUDIT WITH (STATE = OFF)''
EXEC(@Command)
PRINT ''Job: ['' + @AuditJobName + ''] STOPPED''
END
ELSE IF @AuditJobRunning = 0
BEGIN
--no run action
PRINT ''Job: ['' + @AuditJobName + ''] STOPPED''
END
SELECT @AuditJobName = MIN(AuditJobName)
FROM #AuditJobList
WHERE AuditJobExists = 1
AND AuditJobName > @AuditJobName
END
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- END - Check and alter status of SQL Audit Jobs
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--select * from #AgentJobList
--select * from #AuditJobList
SET NOCOUNT OFF
',
@database_name=N'master',
@flags=8
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every minute',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=4,
@freq_subday_interval=1,
@freq_relative_interval=0,
@freq_recurrence_factor=0,
@active_start_date=20191219,
@active_end_date=99991231,
@active_start_time=0,
@active_end_time=235959,
@schedule_uid=N'fdd9f0a2-807d-423a-82b0-c5cb4f30ba92'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO