Why would the number of worker threads that availability groups take from the HADR pool grow well beyond the minimum usage described as "typically, there are 3–10 shared threads" per replica?
In one case we've observed usage of 300+ threads with 3 availability groups and 10 databases total. SQL Server 2014 SP1.
Our leads so far are backups on a secondary replica, high activity on the primary replica, and reporting on a secondary replica.
The AGs run in a datacenter on VMware. There are 16 schedulers in total, the usual worker thread count is under 200, and max degree of parallelism (MAXDOP) on the server is 2.
- 3 AGs, 10 databases, 4 replicas each: 1 primary, 2 read-only secondaries, 1 non-readable secondary.
- 1 secondary is synchronous, 2 are asynchronous.
- 16 vCPUs on 32 physical cores, in a large multi-host cluster.
- No overprovisioning.
- Other, smaller VMs (4-8 cores) are colocated, but they don't put pressure on the CPU.
We observed a spike in worker threads that resulted in a denial of service. Attributing the spike to the AGs is our assumption, since only AG worker threads can cross the limit.
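For reference, here is a rough sketch of the thread limits as I understand them for our 16 schedulers. It assumes 'max worker threads' is left at its default of 0 on a 64-bit instance (not verified for this post), and uses the default formula and the "max worker threads minus 40" ceiling for availability groups as I read them in the Microsoft documentation. The function names are made up for illustration.

```python
def default_max_worker_threads(logical_cpus: int) -> int:
    """Default 'max worker threads' on 64-bit SQL Server when the option is 0.

    Assumption: the documented formula for instances with up to 64 logical CPUs.
    """
    if logical_cpus < 1 or logical_cpus > 64:
        raise ValueError("this sketch only covers 1 to 64 logical CPUs")
    if logical_cpus <= 4:
        return 512
    return 512 + (logical_cpus - 4) * 16


def hadr_thread_ceiling(max_worker_threads: int) -> int:
    """Threads available to availability groups: max worker threads minus 40 (as documented)."""
    return max_worker_threads - 40


if __name__ == "__main__":
    limit = default_max_worker_threads(16)   # our VM has 16 schedulers
    print(limit)                             # 704
    print(hadr_thread_ceiling(limit))        # 664
```

If these assumptions hold, our usual load of under 200 workers plus 300+ HADR threads would already use most of the 704 available.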
The links below, from the SQL Server Premier Field Engineer Blog, don't give me a complete answer even when read in context: