Managing SQL Agent Jobs in an Availability Group
SQL Server 2000: N/A
SQL Server 2005: N/A
SQL Server 2008: N/A
SQL Server 2008R2: N/A
SQL Server 2012: Tested
SQL Server 2014: Tested
SQL Server 2016: Tested
SQL Server 2017: Tested
Date: 7 Jun 2020
Occasionally, as a DBA, you will come across a problem that just causes you to say 'Ummmmmm....!?'
On a recent upgrade project, one of the application databases was due to be upgraded from SQL 2008 to SQL 2016 and placed in an Availability Group.
Simple right? Nothing I haven't done before. Right!
Well, this database had 2 additional features implemented
Change Data Capture - with SQL Agent jobs to control the service involved
SQL Audting - with SQL Agent jobs to harvest changes with an ETL job
Somebody had a real 'lets play with new toys' day when they built this one.
The problem here is that neither of these tools (and SQL Agent) are really cluster aware, which means that the associated SQL Agent jobs must be manually Stopped/Started/Disabled on failover, and I could find no easy way of doing this as part of the failover event. With legacy SQL Clustering this was simpler as the services/jobs could only run on the active node, but with SQL AGs, all nodes are effectively active, so the services or jobs could run on all nodes without immediate error. The problems would only surface after a failover and failback cycle, when CDC and SQLAudit data would be stored in 2 places
I therefore constructed a SQL Agent job that runs continuously on each AG node and which will take care of reacting to a failover event having happened. It's not perfect, but the delay is only seconds, during which the AG failover will mostly still be happening, so the potential loss of data is mitigated.
It can also be extended to manage any other SQL Agent jobs such as ETL or batch processing jobs.
The code below will create the SQL Agent job, but it must be altered to reflect the specific jobs that are in scope before you create it:
#AgentJobList table - to manage CDC and ETL/Batch SQL Agent Jobs
AgentJobName - name of the SQL Agent job to be managed
AgentJobRunContinuous - does this job run continuously but have a schedule to ensure it starts? 1 = Yes, 0 = No
AgentJobExists - a flag that tracks whether the specified SQL Agent job exists - populated at run time
AgentJobRunning - a flag that tracks whether the specified SQL Agent job is running - populated at run time
#AuditJobList table - To manage SQL Audit SQL Agent Jobs
AuditJobName - name of the SQL Agent job to be managed
AuditJobExists - a flag that tracks whether the specified SQL Agent job exists - populated at run time
AuditJobRunning - a flag that tracks whether the specified SQL Agent job is running - populated at run time
If there are no SQL Audit jobs, leave the table empty
The SQL Agent job that is created is set to run continuously
The actions performed are basically to enable/disable/start/stop the required jobs depending on AG status (PRIMARY/SECONDARY) so that the jobs will only be active/running on the PRIMARY.
This has been tested with several AGs, but I only have the one type of AG configuration, so it may not be absolutely bullet-proof.
The code prints out confirmation of every action, so the results can be examined using SQLAgent job history dialog.
However, before letting it loose in a PROD environment, the code should be extracted from the only job step and can be run manually on each node to check that it having the desired effect.