Tuning Connection Acquisition Timeout Under Burst Load
This guide is part of Connection Acquisition Timeout Strategies. Burst load exposes a specific failure mode: a traffic spike arrives faster than the pool can recycle connections, the acquisition queue grows, and a wall of requests times out simultaneously even though the database itself is healthy. The objective here is to set the acquisition timeout — connectionTimeout in HikariCP, borrow timeout in proxy layers — so that genuine overload fails fast and sheds load, while transient bursts that would clear within a few hundred milliseconds do not generate false timeouts.
The signature of a mistuned timeout under burst is a clustered exception log like this:
2026-06-20T14:03:11.482Z ERROR HikariPool-1 - Connection is not available, request timed out after 30000ms.
2026-06-20T14:03:11.483Z ERROR HikariPool-1 - Connection is not available, request timed out after 30000ms.
2026-06-20T14:03:11.484Z ERROR HikariPool-1 - Connection is not available, request timed out after 30000ms.
... 412 identical lines within 60ms ...
A 30-second timeout means every queued request held a worker thread hostage for the full 30 seconds before failing. The threads were occupied, upstream load balancers retried, and the burst amplified into a timeout storm. The fix is rarely “raise the timeout” — it is to align the timeout with realistic queue-drain time and let excess load fail quickly.
Rapid incident diagnosis
Pull these signals in order. The goal is to separate genuine pool exhaustion from a momentary borrow queue that a shorter timeout would have absorbed cleanly.
- Read the timeout value in the exception. timed out after 30000ms tells you the configured ceiling. If every failure reports the identical full duration, threads are parking for the maximum: the timeout is too long, not too short.
- Correlate active vs. idle at the spike. If active equals maximumPoolSize and idle is zero for the entire burst window, the pool is genuinely saturated. If idle was non-zero during the burst, the problem is acquisition contention or a stalled connectionTimeout, not capacity.
- Check the pending/waiting gauge. HikariCP exposes hikaricp.connections.pending. A pending count that exceeds pool size means the queue is deeper than the pool can serve within the timeout. Compare with Detecting Connection Pool Saturation to confirm the pool, not the database, is the bottleneck.
- Inspect database-side wait. Run a quick pg_stat_activity snapshot. If backends are mostly idle in transaction or blocked on locks, the pool drains slowly because connections are not returning, so raising the timeout only lengthens the storm.
- Look for retry amplification. A burst of N requests with client retries becomes 2N or 3N borrow attempts. Confirm whether the upstream client, gateway, or service mesh is retrying on timeout.
The decision rule: if connections are returning to the pool quickly (low query latency, low transaction hold time) but the queue still times out, the timeout is mismatched to drain rate. If connections are not returning, fix the hold time or pool size first — see Optimizing HikariCP maximumPoolSize for High Concurrency.
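The decision rule can be sketched as a small classifier over gauge samples collected during the burst. This is a minimal illustration, not HikariCP code: PoolSnapshot, diagnose, the returned labels, and the fast_hold_ms cut-off are hypothetical names and values, and the function assumes it is only called for a window in which borrows actually timed out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSnapshot:
    """One sample of the pool gauges taken during the burst window."""
    active: int
    idle: int
    pending: int


def diagnose(samples, max_pool_size, mean_hold_ms, fast_hold_ms=50.0):
    """Return a coarse label for a burst incident.

    fast_hold_ms is an assumed cut-off for "connections return quickly";
    tune it to your own query and transaction latency.
    """
    if not samples:
        raise ValueError("need at least one gauge sample")

    # Saturated only if every sample shows a full pool with nothing idle.
    saturated = all(s.active >= max_pool_size and s.idle == 0 for s in samples)
    if not saturated:
        return "acquisition-contention"

    # Connections are held too long: fix hold time or pool size first.
    if mean_hold_ms > fast_hold_ms:
        return "slow-drain"

    # Connections return quickly, yet borrows still time out.
    return "timeout-mismatch"
```

Feeding it samples where active stays at 40 with zero idle and a 25 ms mean hold time yields "timeout-mismatch", which is the case this guide tunes for.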
Mathematical sizing / parameter formula
The acquisition timeout is a function of how long a queued request must wait for a connection to free up. Model the pool as an M/M/c queue and apply Little’s Law to derive the expected wait.
Define:
- P = pool size (maximumPoolSize)
- S = mean service time per borrow (query execution + transaction hold, in seconds)
- λ = arrival rate during the burst (borrows per second)
- Q = expected queue depth at the burst peak
Throughput ceiling of the pool is P / S borrows per second. When λ exceeds that ceiling, the queue grows without bound and no finite timeout helps — you must shed load. When λ is below the ceiling, the expected time a borrow waits before acquiring a connection is approximately:
expected_wait ≈ Q / (P / S) = (Q × S) / P
Set the acquisition timeout to cover the realistic peak queue plus a safety margin, but never longer than the worst latency you are willing to expose to a user:
connectionTimeout ≈ (Q_peak × S) / P × 1.5 (capped at the upstream request budget)
Worked example. A service runs P = 40 connections. Mean borrow service time S = 0.025 s (25 ms: a fast indexed query plus transaction overhead). Pool throughput ceiling is 40 / 0.025 = 1600 borrows/s. A burst pushes λ = 1400 borrows/s — below the ceiling, so the queue is transient. Measured peak queue depth Q_peak = 60 pending borrows.
expected_wait = (60 × 0.025) / 40 = 0.0375 s ≈ 38 ms
connectionTimeout = 38 ms × 1.5 ≈ 57 ms → raise to 250 ms (HikariCP's minimum connectionTimeout) for headroom
A 250 ms timeout absorbs this burst with margin, yet fails within a quarter second when the queue is genuinely unbounded. The original 30000 ms value was 120× too large: it converted a sheddable overload into 30-second thread occupancy. If λ had instead been 1800 borrows/s (above the 1600 ceiling), no timeout would rescue the system — that case requires raising P or adding a circuit breaker to reject excess load at the edge.
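The worked example can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names, the request_budget_ms parameter, and the 2000 ms budget used in the usage note are assumptions. The 250 ms floor reflects HikariCP's lowest accepted connectionTimeout.

```python
def expected_wait_ms(queue_depth, service_time_s, pool_size):
    """Expected borrow wait: (Q x S) / P, converted to milliseconds."""
    return queue_depth * service_time_s / pool_size * 1000.0


def acquisition_timeout_ms(peak_queue, service_time_s, pool_size,
                           request_budget_ms, margin=1.5, floor_ms=250):
    """Peak wait times a safety margin, floored and then capped.

    floor_ms: HikariCP rejects connectionTimeout values below 250 ms.
    request_budget_ms: assumed upstream latency budget; the cap wins if
    it is lower than the computed value.
    """
    raw = expected_wait_ms(peak_queue, service_time_s, pool_size) * margin
    return round(min(max(raw, floor_ms), request_budget_ms))
```

With the numbers above, expected_wait_ms(60, 0.025, 40) is about 37.5 ms and acquisition_timeout_ms(60, 0.025, 40, request_budget_ms=2000) lands on the 250 ms floor. A much deeper peak queue pushes the result up until the request budget caps it.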
| Burst regime | Condition | Correct response |
|---|---|---|
| Transient queue | λ < P/S | Timeout = peak-wait × 1.5; pool absorbs the burst |
| Sustained overload | λ > P/S | Short timeout + circuit breaker / load shedding |
| Slow drain | high S, returning slowly | Fix query/transaction hold time, not the timeout |
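The three regimes can be told apart mechanically once λ, P, and S are measured. The following is a rough sketch: classify_regime and the slow_service_s threshold are hypothetical, and treating λ exactly equal to P/S as overload is a choice made here because the queue does not drain at that point.

```python
def classify_regime(arrival_rate, pool_size, service_time_s, slow_service_s=0.5):
    """Map measured burst parameters onto the regimes in the table.

    slow_service_s is an assumed cut-off for "high S"; choose one that
    reflects what counts as a slow borrow in your service.
    """
    if service_time_s >= slow_service_s:
        return "slow-drain"

    # Throughput ceiling of the pool in borrows per second.
    ceiling = pool_size / service_time_s
    if arrival_rate >= ceiling:
        return "sustained-overload"
    return "transient-queue"
```

For the worked example, classify_regime(1400, 40, 0.025) gives "transient-queue", while the 1800 borrows/s case gives "sustained-overload".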
Exact remediation & configuration
Set a short, drain-aligned acquisition timeout and pair it with a bounded request-handling thread pool and a circuit breaker so excess load is rejected at the edge rather than queued in the connection pool.
HikariCP (Spring Boot application.yml)
spring:
  datasource:
    hikari:
      maximum-pool-size: 40
      minimum-idle: 40          # keep the pool warm so a burst hits full capacity instantly
      connection-timeout: 250   # fail fast: aligned to peak queue-drain time, not 30s
      validation-timeout: 250   # HikariCP minimum is 250 ms; keep it no higher than connection-timeout
      max-lifetime: 1740000
      keepalive-time: 120000
A short connection-timeout only helps if the pool is already warm. Setting minimum-idle equal to maximum-pool-size prevents the pool from lazily growing during the burst, which would otherwise add connection-establishment latency on top of queue wait.
Bound the upstream so the queue cannot grow unbounded (Tomcat/Spring)
server:
  tomcat:
    threads:
      max: 80          # cap concurrent requests near pool capacity, not 10x it
    accept-count: 50   # short accept backlog; reject rather than deep-queue
A request thread pool far larger than the connection pool just relocates the queue from the connector to the borrow queue. Keep threads.max within roughly 2× pool size so backpressure reaches the load balancer.
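A quick calculation shows why the thread cap bounds the borrow queue. This sketch assumes each request thread borrows at most one connection at a time, so at most threads minus pool size borrows can be pending. The function names are hypothetical, and the wait estimate reuses the (Q × S) / P formula from the sizing section.

```python
def max_pending_borrows(request_threads, pool_size):
    """Upper bound on queued borrows when each thread holds one connection."""
    return max(0, request_threads - pool_size)


def worst_case_wait_ms(request_threads, pool_size, service_time_s):
    """Expected wait for the last borrow in a full queue, in milliseconds."""
    pending = max_pending_borrows(request_threads, pool_size)
    return pending * service_time_s / pool_size * 1000.0
```

With 80 threads, a pool of 40, and S = 0.025 s, the deepest possible queue is 40 borrows and the wait is roughly 25 ms. With 400 threads the same pool can see 360 pending borrows and a wait of roughly 225 ms, close to the 250 ms timeout.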
Circuit breaker to shed sustained overload (Resilience4j)
resilience4j:
  circuitbreaker:
    instances:
      dbCalls:
        sliding-window-size: 50
        failure-rate-threshold: 50
        slow-call-duration-threshold: 250ms   # treat near-timeout borrows as failures
        slow-call-rate-threshold: 60
        wait-duration-in-open-state: 5s
When the slow-call rate crosses the threshold, the breaker opens and rejects calls immediately for 5 seconds, draining the borrow queue and giving the pool time to recover instead of feeding the storm.
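To make the slow-call mechanism concrete, here is a deliberately simplified model of a count-based breaker. It is not Resilience4j's implementation: SlowCallBreaker is a hypothetical class, it omits the half-open state and the failure-rate check, and it simply closes again once the open interval has elapsed. The injectable clock exists only to make the behaviour testable.

```python
import time
from collections import deque


class SlowCallBreaker:
    """Opens when the share of slow calls in a full window hits a threshold."""

    def __init__(self, window_size=50, slow_ms=250, slow_rate_pct=60,
                 open_seconds=5.0, clock=time.monotonic):
        self._window = deque(maxlen=window_size)  # True marks a slow call
        self._slow_ms = slow_ms
        self._slow_rate_pct = slow_rate_pct
        self._open_seconds = open_seconds
        self._clock = clock
        self._opened_at = None

    def allow(self):
        """Return True if a call may proceed, False if it is rejected."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._open_seconds:
            # Open interval elapsed: close and start a fresh window.
            self._opened_at = None
            self._window.clear()
            return True
        return False

    def record(self, duration_ms):
        """Record a completed call and open the breaker if needed."""
        self._window.append(duration_ms > self._slow_ms)
        if len(self._window) < self._window.maxlen:
            return  # not enough calls yet to judge the rate
        slow_pct = 100.0 * sum(self._window) / len(self._window)
        if slow_pct >= self._slow_rate_pct:
            self._opened_at = self._clock()
```

A caller would check allow() before borrowing a connection and pass the measured duration to record() afterwards. While the breaker is open, rejected calls never reach the borrow queue, which is what lets the pool drain.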
Apply changes with a rolling restart so each instance drains its existing borrow queue before adopting the new timeout. Because minimum-idle now equals maximum-pool-size, expect a brief connection-establishment burst against the database at startup; stagger instance restarts to avoid a thundering-herd of new connections.
Validation & verification
Confirm the new timeout absorbs transient bursts and fails fast under genuine overload.
Database-side: confirm the pool is the constraint, not the server.
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = current_database()
AND backend_type = 'client backend'
GROUP BY state;
Active backends should track the HikariCP active gauge during the burst. If active backends stay well below maximumPoolSize while the application reports timeouts, the bottleneck is acquisition orchestration, not database capacity.
Pool-side: watch pending and acquisition timing. Scrape hikaricp.connections.pending, hikaricp.connections.acquire (timer), and hikaricp.connections.timeout (counter). After tuning, the acquire timer p99 should sit comfortably below connection-timeout, and the timeout counter should be near zero during transient bursts.
Load-test assertion. Reproduce the burst with a step-load tool and assert two things:
# Transient burst just below the pool throughput ceiling — expect near-zero timeouts
hey -c 300 -z 30s -m POST https://api.example.com/endpoint
# Overload burst above the ceiling — expect fast rejections, not 30s hangs
hey -c 1500 -z 30s -m POST https://api.example.com/endpoint
In the first run, the timeout counter must stay at zero and p99 latency under the request budget. In the second run, failures must return within ~connection-timeout plus breaker latency (sub-second), never multi-second hangs. A multi-second tail in the overload run means the timeout is still too long or the breaker is not engaging.
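The two pass criteria can be encoded as checks over exported load-test results. This is a sketch under assumptions: the helper names are hypothetical, latencies are taken to be per-request milliseconds pulled from the load tool's output, and breaker_slack_ms is an arbitrary allowance for breaker latency.

```python
def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    if not values:
        raise ValueError("need at least one latency sample")
    ordered = sorted(values)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def transient_run_passes(latencies_ms, timeout_count, request_budget_ms):
    """First run: no acquisition timeouts and p99 under the request budget."""
    return timeout_count == 0 and p99(latencies_ms) < request_budget_ms


def overload_run_passes(failure_latencies_ms, connection_timeout_ms,
                        breaker_slack_ms=500):
    """Second run: every failure returns within timeout plus breaker slack."""
    limit = connection_timeout_ms + breaker_slack_ms
    return all(ms <= limit for ms in failure_latencies_ms)
```

A single 30000 ms failure in the overload run is enough to fail overload_run_passes, which matches the rule that any multi-second tail means the timeout or breaker is not doing its job.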
Frequently Asked Questions
Should I raise the acquisition timeout to stop burst-load timeouts?
Usually not. If bursts are transient (arrival rate below pool_size / service_time), a short timeout aligned to peak queue-drain time absorbs them and a longer one only increases thread occupancy. If bursts are sustained overload, no timeout helps: you must add capacity or shed load with a circuit breaker. Raising the timeout almost always makes a timeout storm worse by holding threads longer.
What is a sane default for connectionTimeout under bursty traffic?
Start from (peak_queue_depth × service_time) / pool_size × 1.5, then cap it at your upstream request budget. For most warm pools with fast queries this lands between 200 ms and 2 s. The HikariCP default of 30000 ms is a safety net for cold starts, not a production burst value; it is far too long for fail-fast behavior.
How do I stop client retries from amplifying the burst?
Confirm whether the upstream client, gateway, or service mesh retries on timeout, because a burst of N requests with retries becomes 2N or 3N borrow attempts. Pair the short timeout with the circuit breaker so excess load is rejected at the edge instead of re-entering the borrow queue.
Why do all my timeouts report the exact same duration?
Because every request waited the full connectionTimeout before failing: the queue never drained within the window. Identical durations are the fingerprint of a timeout that is longer than the achievable drain time. Shorten it so requests fail at the realistic wait boundary, and pair it with backpressure so the queue stops growing.
Does a bigger request thread pool help absorb bursts?
No. A request thread pool far larger than the connection pool just relocates the queue from the connector to the borrow queue. Keep threads.max within roughly 2× pool size so backpressure reaches the load balancer.
Related
- Connection Acquisition Timeout Strategies — the parent topic covering timeout semantics across pools and proxies.
- Optimizing HikariCP maximumPoolSize for High Concurrency — size the pool before tuning its acquisition timeout.
- Detecting Connection Pool Saturation — confirm whether a burst is true saturation or transient queueing.
- Configuring Connection Validation Queries for AWS RDS Proxy — sibling guide on validation behavior that affects borrow latency.