Cool Tips‎ > ‎Clustering‎ > ‎

Managing SQL Agent Jobs in an Availability Group

Applicability:

                 SQL Server 2000:        N/A
                 SQL Server 2005:        N/A
                 SQL Server 2008:        N/A
                 SQL Server 2008R2:    N/A
                 SQL Server 2012:        Tested
                 SQL Server 2014:        Tested
                 SQL Server 2016:        Tested
                 SQL Server 2017:        Tested

Credits:

Author: ChillyDBA
Date:    7 Jun 2020

Description

Occasionally, as a DBA, you will come across a problem that just causes you to say 'Ummmmmm....!?'

On a recent upgrade project, one of the application databases was due to be upgraded from SQL 2008 to SQL 2016 and placed in an Availability Group.
Simple right?  Nothing I haven't done before.  Right!

Well, this database had 2 additional features implemented
  • Change Data Capture - with SQL Agent jobs to control the service involved
  • SQL Audting - with SQL Agent jobs to harvest changes with an ETL job
Somebody had a real 'lets play with new toys' day when they built this one.

The problem here is that neither of these tools (and SQL Agent) are really cluster aware, which means that the associated SQL Agent jobs must be manually Stopped/Started/Disabled on failover, and I could find no easy way of doing this as part of the failover event.  With legacy SQL Clustering this was simpler as the services/jobs could only run on the active node, but with SQL AGs, all nodes are effectively active, so the services or jobs could run on all nodes without immediate error.  The problems would only surface after a failover and failback cycle, when CDC and SQLAudit data would be stored in 2 places

I therefore constructed a SQL Agent job that runs continuously on each AG node and which will take care of reacting to a failover event having happened.  It's not perfect, but the delay is only seconds, during which the AG failover will mostly still be happening, so the potential loss of data is mitigated.
It can also be extended to manage any other SQL Agent jobs such as ETL or batch processing jobs.

The code below will create the SQL Agent job, but it must be altered to reflect the specific jobs that are in scope before you create it:
  • #AgentJobList table - to manage CDC and ETL/Batch SQL Agent Jobs
    • AgentJobName - name of the SQL Agent job to be managed
    • AgentJobRunContinuous - does this job run continuously but have a schedule to ensure it starts?  1 = Yes, 0 = No
    • AgentJobExists - a flag that tracks whether the specified SQL Agent job exists - populated at run time
    • AgentJobRunning - a flag that tracks whether the specified SQL Agent job is running - populated at run time

  • #AuditJobList table - To manage SQL Audit SQL Agent Jobs
    • AuditJobName  - name of the SQL Agent job to be managed
    • AuditJobExists - a flag that tracks whether the specified SQL Agent job exists - populated at run time
    • AuditJobRunning - a flag that tracks whether the specified SQL Agent job is running - populated at run time
  • If there are no SQL Audit jobs, leave the table empty

The SQL Agent job that is created is set to run continuously
The actions performed are basically to enable/disable/start/stop the required jobs depending on AG status (PRIMARY/SECONDARY) so that the jobs will only be active/running on the PRIMARY.

This has been tested with several AGs, but I only have the one type of AG configuration, so it may not be absolutely bullet-proof.

The code prints out confirmation of every action, so the results can be examined using SQLAgent job history dialog.
However, before letting it loose in a PROD environment, the code should be extracted from the only job step and can be run manually on each node to check that it having the desired effect.

Have fun.

Code


USE [msdb] GO /****** Object: Job [AAAA - Set all AAAA SQL Agent Jobs to corect state] Script Date: 12/19/2019 10:09:42 AM ******/ BEGIN TRANSACTION DECLARE @ReturnCode INT SELECT @ReturnCode = 0 /****** Object: JobCategory [[Uncategorized (Local)]] Script Date: 12/19/2019 10:09:42 AM ******/ IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1) BEGIN EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback END DECLARE @jobId BINARY(16) EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'AVAILABILITY GROUP - Set all scheduled Jobs to corect state', @enabled=1, @notify_level_eventlog=0, @notify_level_email=0, @notify_level_netsend=0, @notify_level_page=0, @delete_level=0, @description=N'This job will examine the AG node state (Primary or Secondary) and ENABLE/DISABLE all CDC, SQL Audit and SQLAgent jobs that are involved in the maitneance of these 2 data sets to the correct state ie. All enabled/running on primary and disabled/stopped on secodnary', @category_name=N'[Uncategorized (Local)]', @owner_login_name=N'sa', @job_id = @jobId OUTPUT IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback /****** Object: Step [Perform chexcks and change job startes] Script Date: 12/19/2019 10:09:42 AM ******/ EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Perform chexcks and change job startes', @step_id=1, @cmdexec_success_code=0, @on_success_action=1, @on_success_step_id=0, @on_fail_action=2, @on_fail_step_id=0, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'TSQL', @command=N'SET NOCOUNT ON IF (SELECT OBJECT_ID(''tempdb..#AgentJobList'') ) IS NOT NULL DROP TABLE #AgentJobList CREATE TABLE #AgentJobList ( AgentJobName SYSNAME ,AgentJobRunContinuous INT -- does this job run continuously but have a schedule to ensure it starts? ,AgentJobExists INT -- a flag that tracks whether the specified SQL Agent job exists - populated at run time ,AgentJobRunning INT -- a flag that tracks whether the specified SQL Agent job is running - populated at run time ) IF (SELECT OBJECT_ID(''tempdb..#AuditJobList'') ) IS NOT NULL DROP TABLE #AuditJobList CREATE TABLE #AuditJobList ( AuditJobName SYSNAME ,AuditJobExists INT ,AuditJobRunning INT ) DECLARE @AgentJobRunning INT ,@AgentJobName SYSNAME ,@IsPrimary INT ,@AgentJobRunContinuous INT ,@AuditJobRunning INT ,@AuditJobName SYSNAME ,@Command NVARCHAR(MAX) -- populate the table that determines the scope of the checks for SQL Agent Jobs -- To exclude this functionality, just comment out the code below that loads the temp table. No other changes are required INSERT #AgentJobList (AgentJobName, AgentJobRunContinuous, AgentJobExists, AgentJobRunning) VALUES (''cdc.AAAA_capture'', 1, 0, 0) ,(''cdc.AAAA_cleanup'', 0, 0, 0) ,(''CHECK THAT AAAA AUDIT IS RUNNING'', 0, 0, 0) ,(''LOAD AAAA AUDIT RECORD'', 0, 0, 0) ,(''LOAD AAAA CDC ARCHIVE RECORDS LSN'', 0, 0, 0) -- populate the table that determines the SQL Audit jobs to check -- SQL Audit jobs are a core DBMS function controlled by an API that is separate from SQLAgent -- To exclude this functionality, just comment out the code below that loads the temp table. No other changes are required INSERT #AuditJobList (AuditJobName, AuditJobExists, AuditJobRunning) VALUES (''AAAA ACCESS AUDIT'', 0, 0) --determine whether the current node is primary --this assumes that there is only one AG on the server IF EXISTS ( SELECT sd.name AS DatabaseName --return the DB name in case this needs to be adapted for multiple AGs ,hars.role_desc FROM sys.databases sd INNER JOIN sys.dm_hadr_availability_replica_states hars ON sd.replica_id = hars.replica_id WHERE sd.replica_id IS NOT NULL AND hars.role_desc = ''PRIMARY'' --AND sd.name = @DatabaseName --filter the DB name in case this needs to be adapted for multiple AGs ) SELECT @IsPrimary = 1 ELSE SELECT @IsPrimary = 0 --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- BEGIN - Check and alter status of SQL Agent Jobs --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ --check which agent jobs exist UPDATE jl SET jl.AgentJobExists = CASE WHEN sj.name IS NULL THEN 0 ELSE 1 END FROM #AgentJobList jl LEFT OUTER JOIN msdb..sysjobs sj ON jl.AgentJobName = sj.name --check which agent jobs are running UPDATE jl SET jl.AgentJobRunning = CASE WHEN activity.run_Requested_date IS NOT NULL AND activity.stop_execution_date IS NULL THEN 1 ELSE 0 END --select * FROM #AgentJobList jl LEFT OUTER JOIN msdb.dbo.sysjobs_view job ON job.name = jl.AgentJobName INNER JOIN msdb.dbo.sysjobactivity activity ON job.job_id = activity.job_id INNER JOIN msdb.dbo.syssessions sess ON sess.session_id = activity.session_id INNER JOIN ( SELECT MAX( agent_start_date ) AS max_agent_start_date FROM msdb.dbo.syssessions ) sess_max ON sess.agent_start_date = sess_max.max_agent_start_date WHERE run_requested_date IS NOT NULL AND stop_execution_date IS NULL --now iterate through existing agent jobs and change the state appropriately SELECT @AgentJobName = MIN(AgentJobName) FROM #AgentJobList WHERE AgentJobExists = 1 WHILE @AgentJobName IS NOT NULL BEGIN SELECT @AgentJobRunning = AgentJobRunning ,@AgentJobRunContinuous = AgentJobRunContinuous FROM #AgentJobList WHERE AgentJobName = @AgentJobName IF @IsPrimary = 1 BEGIN IF @AgentJobRunContinuous = 1 BEGIN IF @AgentJobRunning = 1 BEGIN EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1 --no run action PRINT ''Job: ['' + @AgentJobName + ''] ENABLED'' PRINT ''Job: ['' + @AgentJobName + ''] No Run State Change'' END ELSE IF @AgentJobRunning = 0 BEGIN EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1 EXEC msdb..sp_start_job @job_name = @AgentJobName PRINT ''Job: ['' + @AgentJobName + ''] ENABLED'' PRINT ''Job: ['' + @AgentJobName + ''] STARTED'' END END ELSE IF @AgentJobRunContinuous = 0 BEGIN IF @AgentJobRunning = 1 BEGIN EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1 --no run action PRINT ''Job: ['' + @AgentJobName + ''] ENABLED'' PRINT ''Job: ['' + @AgentJobName + ''] STARTED'' END ELSE IF @AgentJobRunning = 0 BEGIN EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 1 --no run action PRINT ''Job: ['' + @AgentJobName + ''] ENABLED'' PRINT ''Job: ['' + @AgentJobName + ''] STARTED'' END END END ELSE IF @IsPrimary = 0 BEGIN IF @AgentJobRunContinuous = 1 BEGIN IF @AgentJobRunning = 1 BEGIN EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0 EXEC msdb..sp_stop_job @job_name = @AgentJobName PRINT ''Job: ['' + @AgentJobName + ''] DISABLED'' PRINT ''Job: ['' + @AgentJobName + ''] STOPPED'' END ELSE IF @AgentJobRunning = 0 BEGIN EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0 --no run action PRINT ''Job: ['' + @AgentJobName + ''] DISABLED'' PRINT ''Job: ['' + @AgentJobName + ''] No Run State Change'' END END ELSE IF @AgentJobRunContinuous = 0 BEGIN IF @AgentJobRunning = 1 BEGIN EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0 EXEC msdb..sp_stop_job @job_name = @AgentJobName PRINT ''Job: ['' + @AgentJobName + ''] DISABLED'' PRINT ''Job: ['' + @AgentJobName + ''] STOPPED'' END ELSE IF @AgentJobRunning = 0 BEGIN EXEC msdb..sp_update_job @job_name = @AgentJobName, @enabled = 0 --no run action PRINT ''Job: ['' + @AgentJobName + ''] DISABLED'' PRINT ''Job: ['' + @AgentJobName + ''] No Run State Change'' END END END SELECT @AgentJobName = MIN(AgentJobName) FROM #AgentJobList WHERE AgentJobExists = 1 AND AgentJobName > @AgentJobName END --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- END - Check and alter status of SQL Agent Jobs --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- BEGIN - Check and alter status of SQL Audit Jobs --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ --check which audit jobs exist UPDATE jl SET jl.AuditJobExists = CASE WHEN astat.name IS NULL THEN 0 ELSE 1 END FROM #AuditJobList jl LEFT OUTER JOIN sys.dm_server_audit_status astat ON jl.AuditJobName = astat.name --check which audit jobs are running UPDATE jl SET jl.AuditJobRunning = CASE WHEN astat.status_desc = ''STARTED'' THEN 1 ELSE 0 END FROM #AuditJobList jl LEFT OUTER JOIN sys.dm_server_audit_status astat ON jl.AuditJobName = astat.name --now iterate through existing agent jobs and change the state appropriately SELECT @AuditJobName = MIN(AuditJobName) FROM #AuditJobList WHERE AuditJobExists = 1 WHILE @AuditJobName IS NOT NULL BEGIN SELECT @AuditJobRunning = AuditJobRunning FROM #AuditJobList WHERE AuditJobName = @AuditJobName IF @IsPrimary = 1 IF @AuditJobRunning = 1 BEGIN --no run action PRINT ''Job: ['' + @AuditJobName + ''] STARTED'' END ELSE IF @AuditJobRunning = 0 BEGIN SELECT @Command = ''ALTER SERVER AUDIT '' + @AuditJobName + '' AUDIT WITH (STATE = ON)'' EXEC(@Command) PRINT ''Job: ['' + @AuditJobName + ''] STARTED'' END ELSE IF @IsPrimary = 0 IF @AuditJobRunning = 1 BEGIN SELECT @Command = ''ALTER SERVER AUDIT '' + @AuditJobName + '' AUDIT WITH (STATE = OFF)'' EXEC(@Command) PRINT ''Job: ['' + @AuditJobName + ''] STOPPED'' END ELSE IF @AuditJobRunning = 0 BEGIN --no run action PRINT ''Job: ['' + @AuditJobName + ''] STOPPED'' END SELECT @AuditJobName = MIN(AuditJobName) FROM #AuditJobList WHERE AuditJobExists = 1 AND AuditJobName > @AuditJobName END --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- END - Check and alter status of SQL Audit Jobs --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ --select * from #AgentJobList --select * from #AuditJobList SET NOCOUNT OFF ', @database_name=N'master', @flags=8 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every minute', @enabled=1, @freq_type=4, @freq_interval=1, @freq_subday_type=4, @freq_subday_interval=1, @freq_relative_interval=0, @freq_recurrence_factor=0, @active_start_date=20191219, @active_end_date=99991231, @active_start_time=0, @active_end_time=235959, @schedule_uid=N'fdd9f0a2-807d-423a-82b0-c5cb4f30ba92' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback COMMIT TRANSACTION GOTO EndSave QuitWithRollback: IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION EndSave: GO





Comments