Handling Azure SQL Connection Throttling

This guide is part of Azure SQL Connection Management. It addresses a single, specific failure: your application starts logging throttling errors under load even though the database CPU looks healthy. The exact strings vary by code but read like this:

Error 10928: Resource ID: 1. The request limit for the database is 200 and has been reached.
See 'https://...' for assistance.

Error 10929: Resource ID: 1. The minimum guarantee is 0, maximum limit is 200
and the current usage for the database is 200. However, the server is currently too busy.

Error 40501: The service is currently busy. Retry the request after 10 seconds.

These are not bugs in your code and they are not “the database is down.” They are Azure SQL’s resource governor telling you that the number of concurrent sessions or actively executing requests has hit the ceiling for your service tier. The fix is a combination of right-sizing the pool to the tier’s worker math, adding exponential-backoff retry for the transient cases, and — only when those are exhausted — scaling the tier.

Rapid incident diagnosis

When throttling errors appear, triage in this order. First, read the error code, because each points at a different exhaustion type:

10928 — the session or worker count limit was reached. This is the connection-count problem: too many open or executing connections against the per-database cap.
10929 — the resource governance limit (reserved minimum vs maximum) was reached, often inside an elastic pool where one database starves the others.
40501 — engine-level throttling; the service is busy and is shedding load. Always transient.
40197 / 40143 — the gateway is reconfiguring (a node failover or load-balancing event). Transient and unrelated to your pool size.

Second, distinguish exhaustion (you genuinely have too many connections) from starvation (a few long-running queries are holding workers). Run the live request query immediately:

SELECT COUNT(*) AS open_sessions
FROM sys.dm_exec_sessions WHERE is_user_process = 1;

SELECT session_id, status, command, wait_type, wait_time, cpu_time, total_elapsed_time
FROM sys.dm_exec_requests
WHERE session_id > 50
ORDER BY total_elapsed_time DESC;

If open_sessions is near your tier’s cap, it is pool-size exhaustion. If open_sessions is moderate but a handful of requests have huge total_elapsed_time and are consuming workers, it is starvation by slow queries — fix the queries, not the pool. The third check is the historical view to confirm the trend, covered under Validation below.

A third failure shape is worth ruling out early: proxy-side drop rather than throttle. If the errors are 40197/40143 rather than the 109xx/40501 family, the gateway is reconfiguring (a node failover or scaling event), not the resource governor rejecting you for volume. Proxy drops cluster in time — a burst over a few seconds then silence — whereas genuine throttling correlates with sustained high max_worker_percent. Reading the code first saves you from adding pool capacity to fix a problem that was actually a transient infrastructure event resolved by a single retry. Pull the breakdown of which codes you are seeing straight from the application’s connection logs and bucket them; a flood of 40501 with low 40197 means real load throttling, while the reverse means infrastructure churn.

One more distinction: a connection that errors at acquisition time versus execution time points at different ceilings. An error raised while the pool is trying to open a new physical connection (login phase) usually means the session cap is full. An error raised mid-query on an already-open connection means the worker cap was hit when the request tried to schedule. Note where in the stack trace the SQLException originates before deciding which limit to size against.

Mathematical sizing / parameter formula

The governing constraint is the worker ceiling, because a worker is occupied for the entire duration a request executes. The session ceiling (open connections) is usually far higher and rarely the first wall you hit.

Treat the pool with a Little’s Law adaptation. The number of workers in simultaneous use equals the request arrival rate multiplied by the average request service time:

concurrent_workers = arrival_rate (req/s) × avg_query_time (s)

Worked example. An S3 tier (DTU model) allows 200 concurrent workers. Your service receives 800 requests/second and the average query holds a connection for 0.15 s:

concurrent_workers = 800 × 0.15 = 120 workers in flight

120 is under 200, so the tier can serve it — but only with headroom. If you size the aggregate pool (summed across every application instance) to exactly the worker cap, a single latency spike that pushes avg_query_time from 0.15 s to 0.30 s doubles the in-flight workers to 240 and you immediately throttle with 10928. The safe aggregate ceiling is roughly 80% of the worker cap:

max aggregate pool = 0.8 × worker_cap = 0.8 × 200 = 160
per-instance maximumPoolSize = 160 ÷ instance_count

With 4 instances, set maximumPoolSize to 40 each. The per-database session and worker caps by tier are tabulated in the parent Azure SQL Connection Management guide; always size against the worker number for OLTP and leave the 20% buffer.

The formula also tells you the cheapest lever. Because concurrent_workers is the product of arrival rate and service time, halving avg_query_time halves worker consumption at no extra cost. A query that drops from 0.15 s to 0.075 s through an index or a removed N+1 pattern cuts the in-flight worker count from 120 to 60 — equivalent to buying a tier with double the worker cap, for free. Before scaling the tier or widening the pool, profile the slowest statements with sys.dm_exec_requests joined to sys.dm_exec_sql_text; shaving service time is almost always cheaper than provisioning more capacity, and it lowers tail latency for every request, not just the throttled ones.

A second worked case shows the multi-instance trap. Suppose the same 800 req/s workload runs on a serverless General Purpose database scaled to a 4 vCore maximum (≈420 workers) but with auto-pause active. At the auto-pause floor of 1 vCore the effective worker cap is ≈105, far below the 120 the workload needs. The first request burst after an idle period throttles with 40501 until the platform scales up — which it only does in response to load it has already started rejecting. The remedy is not a bigger pool: it is disabling auto-pause for latency-sensitive workloads, or pre-warming the database with a keep-alive query so it never sits at the floor when traffic arrives.

Exact remediation & configuration

Remediation has three layers, applied in order of cost.

Layer 1 — right-size and fail fast. Cap the aggregate pool below the worker ceiling and use a short acquisition timeout so threads fail fast instead of piling up. In HikariCP:

spring:
  datasource:
    hikari:
      maximum-pool-size: 40        # 160 aggregate ÷ 4 instances, under S3's 200 cap
      connection-timeout: 15000
      max-lifetime: 180000
      keepalive-time: 30000
      validation-timeout: 5000

Layer 2 — exponential backoff retry for transient codes. Errors 40501, 40197, 40143, and 10928/10929 are retryable. Retry with jitter; never retry in a tight loop, which only deepens the throttle.

int[] transient = {10928, 10929, 40197, 40501, 40143};
int maxAttempts = 5;
for (int attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
        return executeQuery(conn, sql);
    } catch (SQLException e) {
        boolean retryable = Arrays.stream(transient).anyMatch(c -> c == e.getErrorCode());
        if (!retryable || attempt == maxAttempts) throw e;
        long backoff = (long) (Math.pow(2, attempt) * 100);   // 200,400,800,1600 ms
        long jitter  = ThreadLocalRandom.current().nextLong(100);
        Thread.sleep(backoff + jitter);
    }
}

The .NET Microsoft.Data.SqlClient driver and EF Core’s EnableRetryOnFailure implement this same backoff against the documented transient code list — prefer the built-in execution strategy over hand-rolled loops where available.

Layer 3 — batch and scale. Reduce worker pressure by batching: combine many small writes into fewer round trips with table-valued parameters or multi-row INSERT, which holds a worker for less aggregate time. Only when the pool is correctly sized and queries are tuned should you scale the tier (S3 → S4, or to a vCore tier), which is a zero-downtime online operation:

az sql db update --resource-group rg-prod --server myserver \
  --name appdb --service-objective S6

Validation & verification

Confirm the fix from the database side, not just from application logs. The resource governor reports utilization in 15-second buckets through sys.dm_db_resource_stats. After applying the pool cap and retry logic, max_worker_percent and max_session_percent should stay below 100 even at peak:

SELECT end_time, avg_cpu_percent, max_worker_percent, max_session_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;

For a longer window (the last 14 days at one-hour granularity) use the master-database view:

SELECT database_name, start_time, end_time, max_worker_percent, max_session_percent
FROM sys.resource_stats
WHERE database_name = 'appdb'
ORDER BY start_time DESC;

A successful remediation shows max_worker_percent capped well under 100 at peak load, throttling errors disappearing from application logs, and the open-session count (sys.dm_exec_sessions) plateauing at your configured aggregate pool size rather than climbing to the tier ceiling. Pair this with client-side alerting from Detecting Connection Pool Saturation so the next event is caught before it becomes an incident.

Verify under load, not at rest. A pool capped correctly will look fine at idle regardless of whether the fix is right; the test is whether max_worker_percent stays under the buffer during your actual peak. Run a load test that reproduces the original arrival rate and watch all three series together — the pool’s pending-acquisition count, max_worker_percent, and the throttling-error rate. The pass condition is that the pending-acquisition queue may rise (the pool is correctly refusing to exceed its cap) while the database-side error rate stays at zero. That combination proves the back-pressure now lives in the client pool, where you control it with fail-fast timeouts and retry, rather than in the database, where it surfaces as 10928.

Finally, assert the fix in code so a future change cannot silently regress it. A load-test assertion that fails the build if the throttling-error count exceeds zero at the target arrival rate turns this from a one-time incident fix into a permanent guardrail. Capture the aggregate maximumPoolSize across instances as a tracked configuration value and alert if a deploy pushes it above 80% of the current tier’s worker cap.

Frequently Asked Questions

Is error 10928 the same as the database running out of CPU?

No. 10928 means the session or worker count limit for your service tier was reached, which can happen at low CPU if many connections are open but idle-in-transaction or executing cheap queries. Check max_session_percent and max_worker_percent in sys.dm_db_resource_stats — if they are near 100 while avg_cpu_percent is low, it is a count limit, not a compute limit. The fix is pool sizing, not a bigger compute tier.

Should I retry error 10928, or is it fatal?

Retry it. 10928, 10929, 40501, 40197, and 40143 are all on Azure SQL’s documented list of transient, retryable errors. Use exponential backoff with jitter and a cap of around 5 attempts. The only caveat: if you are throttling because your aggregate pool is genuinely oversized, retries alone will not help — they buy time while you cap maximumPoolSize to the tier’s worker ceiling.

How do I find the actual session and worker limit for my tier?

The per-database max concurrent sessions and max concurrent workers values are fixed by service tier and vCore/DTU count and are tabulated in the parent guide. To confirm what you are currently using, query SELECT COUNT(*) FROM sys.dm_exec_sessions WHERE is_user_process = 1 for sessions and watch max_worker_percent in sys.dm_db_resource_stats for the worker headroom.

My pool is sized correctly but I still throttle during deploys. Why?

During a rolling deploy you can briefly run double the instance count, doubling the aggregate pool against one fixed database ceiling. Either reduce per-instance maximumPoolSize to account for the overlap window, use connection draining so old instances release sessions before new ones ramp, or scale the tier temporarily for the deploy window.

Does switching from Proxy to Redirect connection policy help with throttling?

Indirectly. Redirect lowers per-query latency, which shortens how long each request holds a worker, which lowers concurrent_workers for the same arrival rate. It does not raise the tier’s ceiling, but by reducing avg_query_time it gives you more effective headroom under the same cap.

Azure SQL Connection Management — the parent guide on tier limits, connection policy, and gateway mechanics.
Detecting Connection Pool Saturation — client-side gauges and alerts that catch throttling before it errors.
Spring Boot DataSource Configuration — applying the pool cap and retry strategy in a HikariCP DataSource.