Diagnosing RDS Proxy Borrow Timeouts
This guide is part of AWS RDS Proxy Connection Pooling. A borrow timeout is the proxy telling you it could not obtain a backend connection within ConnectionBorrowTimeout. The application sees a connection acquisition failure that looks like database slowness but is not — the database may be nearly idle while the proxy’s backend pool sits at its ceiling. The client-side symptom is a stalled acquire followed by an error such as:
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 120000ms
or, surfaced directly by the proxy when the backend pool is exhausted:
ERROR: request to borrow a connection from the pool timed out
The wait of roughly 120 seconds is the giveaway: that is the RDS Proxy default ConnectionBorrowTimeout. The proxy held the request, waited for a backend connection to free up, none did, and it failed the borrow. This guide isolates whether the cause is a true ceiling, excessive pinning, or slow-returning transactions, then applies the ceiling math and remediation.
Rapid incident diagnosis
The failure has three candidate causes. Distinguish them with three CloudWatch metrics before changing anything.
- DatabaseConnectionsBorrowLatency: the time requests wait to borrow a backend connection. A baseline of microseconds spiking into seconds is the direct signature of a borrow timeout. This metric rising is your confirmation; everything else explains why.
- DatabaseConnections: backend connections currently open. Compare it against the ceiling max_connections × MaxConnectionsPercent ÷ 100. If DatabaseConnections is pinned at that ceiling, the pool is genuinely exhausted.
- DatabaseConnectionsCurrentlySessionPinned: backend connections locked 1:1 by session pinning. If this is high, pinning is consuming the pool and the real fix is to eliminate the state, per Resolving RDS Proxy Session Pinning.
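One way to pull these three metrics together is a small script against CloudWatch. The sketch below is an assumption-laden starting point, not a definitive tool: it assumes the proxy metrics are published in the AWS/RDS namespace under a ProxyName dimension, and the proxy name passed in is hypothetical. The query builder is kept separate from the boto3 call so it can be inspected without AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# The three proxy metrics used for triage.
PROXY_METRICS = (
    "DatabaseConnectionsBorrowLatency",
    "DatabaseConnections",
    "DatabaseConnectionsCurrentlySessionPinned",
)


def build_metric_query(proxy_name, metric_name, minutes=30, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",  # assumption: namespace for proxy metrics
        "MetricName": metric_name,
        # assumption: proxy metrics are keyed by a ProxyName dimension
        "Dimensions": [{"Name": "ProxyName", "Value": proxy_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Maximum"],
    }


def fetch_all(proxy_name, minutes=30):
    """Fetch datapoints for each metric; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    return {
        name: cloudwatch.get_metric_statistics(
            **build_metric_query(proxy_name, name, minutes)
        )["Datapoints"]
        for name in PROXY_METRICS
    }
```

Calling fetch_all("app-proxy") during an incident would return the per-minute maxima for all three metrics, which is enough to place the incident in one of the diagnoses that follow.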
The decision tree:
| DatabaseConnections vs ceiling | ...SessionPinned | Diagnosis |
|---|---|---|
| At ceiling | Low | True undersizing — raise MaxConnectionsPercent or instance max_connections |
| At ceiling | High | Pinning exhaustion — fix the session state, not the ceiling |
| Below ceiling | Low | Slow-returning transactions holding connections too long |
A fourth check tests whether the database itself is holding connections: if ConnectionRequestsBorrowed (the rate of successful borrows) has flatlined while DatabaseConnectionsBorrowLatency climbs, connections are not being returned because long-running transactions or a lock wait on the backend are holding them. Confirm in pg_stat_activity by looking for sessions that have stayed in state = 'active' or 'idle in transaction' for a long time.
Do not be misled by client-side metrics in isolation. A HikariCP pool reporting high PendingThreads and connectionTimeout exceptions points at the proxy as the bottleneck, but the same client symptom appears whether the cause is the proxy ceiling, pinning, or a slow backend query. The proxy-side metrics above are authoritative; the client metrics only tell you the wait is happening upstream of the application. Correlate the two: a client timeout that lines up with a DatabaseConnectionsBorrowLatency spike confirms the proxy is the wait point, while a client timeout with flat proxy borrow latency points back into the client pool itself — the distinction drawn in the HikariCP Configuration Deep Dive.
One more distinction matters for triage: a borrow timeout is not an acquisition timeout on the client pool. The client connection-timeout governs how long the application waits for a slot in its own pool; ConnectionBorrowTimeout governs how long the proxy waits for a backend connection. When both are running, the shorter one fires first and produces the error you see. If the client timeout is the default 30 s but observed waits stretch toward the proxy’s 120 s, check the error message and stack frame to see which layer gave up first.
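The "shorter one wins" rule is simple enough to write down. The sketch below follows that simplification only; the function name is hypothetical, and it does not model other client-side timeouts (socket or statement timeouts) that may also apply.

```python
def first_to_give_up(client_timeout_ms, borrow_timeout_s=120):
    """Return (layer, seconds) for whichever timeout expires first.

    Assumes both timers start together, which is a simplification.
    """
    client_s = client_timeout_ms / 1000
    if client_s < borrow_timeout_s:
        return ("client pool", client_s)
    return ("proxy", float(borrow_timeout_s))
```

With the HikariCP default of 30000 ms the client pool is the layer that reports the failure; only a client timeout at or above ConnectionBorrowTimeout lets the proxy's error surface.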
Mathematical sizing / ceiling formula
The backend pool ceiling is fixed and computable:
backend_ceiling = floor(instance_max_connections × MaxConnectionsPercent / 100)
For a db.r6g.large with max_connections = 1365 (an illustrative value; RDS derives the default max_connections from instance memory, so confirm yours with SHOW max_connections) and MaxConnectionsPercent = 90:
backend_ceiling = floor(1365 × 90 / 100) = 1228
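The same calculation as a helper, in case you want to evaluate several percentages at once. This is a sketch with a hypothetical function name; it assumes MaxConnectionsPercent is a whole number and uses integer division so the floor is exact.

```python
def backend_ceiling(instance_max_connections, max_connections_percent):
    """floor(max_connections * MaxConnectionsPercent / 100), in integer math."""
    return instance_max_connections * max_connections_percent // 100


# Compare a few candidate settings for the example instance.
for percent in (75, 90, 100):
    print(percent, backend_ceiling(1365, percent))
```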
Now apply Little’s Law to find whether that ceiling can sustain the offered load. Let:
- λ = transaction arrival rate (transactions/second)
- T = mean transaction hold time on a backend connection (seconds), including query execution and the borrow round trip
The number of backend connections the workload needs concurrently is:
required = λ × T
Worked example: 4,000 transactions/second, each holding a backend connection for 8 ms:
required = 4000 × 0.008 = 32 backend connections
That fits comfortably under the 1228 ceiling — so if you are seeing borrow timeouts at this load, the cause is not the ceiling; it is pinning or long-held transactions inflating T. Conversely, if T balloons to 400 ms (a slow query or lock wait):
required = 4000 × 0.4 = 1600 > 1228 → borrow timeouts
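Both worked examples, plus the break-even hold time, can be reproduced with two one-line functions. The names are hypothetical and the sketch assumes a steady arrival rate and a mean hold time, which is all Little's Law requires.

```python
def required_connections(arrival_rate_per_s, hold_time_s):
    """Little's Law: concurrent backend connections = lambda * T."""
    return arrival_rate_per_s * hold_time_s


def max_hold_time_s(ceiling, arrival_rate_per_s):
    """Largest mean hold time the ceiling can sustain at this arrival rate."""
    return ceiling / arrival_rate_per_s


fast = required_connections(4000, 0.008)   # the 8 ms case
slow = required_connections(4000, 0.4)     # the 400 ms case
break_even = max_hold_time_s(1228, 4000)   # T at which demand equals the ceiling
print(fast, slow, break_even)
```

At 4,000 transactions/second the 1228 ceiling is reached once the mean hold time passes roughly 307 ms, which gives a concrete latency budget to alert on.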
The lever is almost always T, not the arrival rate. Halving query latency halves required connections. This is the same Little’s Law sizing applied to client pools in Optimizing HikariCP maximumPoolSize for High Concurrency; the only difference is that the proxy ceiling is expressed as a percentage of the instance limit rather than an absolute.
Also account for non-proxy consumers. If admin tools, replication, and direct app connections hold 150 backend slots, only 1365 − 150 = 1215 slots remain for the proxy, which is already below the configured 1228 ceiling, so the instance limit becomes the effective ceiling. Setting MaxConnectionsPercent to 100 on a shared instance invites FATAL: remaining connection slots are reserved.
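A quick way to check whether a given percentage oversubscribes a shared instance is to compute what is left once the proxy fills its ceiling and the other consumers take their slots. This is a sketch with a hypothetical function name; it ignores the few slots PostgreSQL reserves for superusers, so treat a small positive result as no headroom.

```python
def instance_headroom(instance_max_connections, max_connections_percent,
                      non_proxy_connections):
    """Slots left on the instance if the proxy opens its full ceiling.

    A negative result means the proxy ceiling plus non-proxy consumers
    exceed max_connections.
    """
    proxy_ceiling = instance_max_connections * max_connections_percent // 100
    return instance_max_connections - proxy_ceiling - non_proxy_connections
```

For the example instance with 150 non-proxy connections, the result is -13 at 90 percent and -150 at 100 percent, while 80 percent leaves 123 slots spare.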
Exact remediation & configuration
Match the remediation to the diagnosis from the decision tree.
If genuinely undersized (at ceiling, low pinning): raise MaxConnectionsPercent, or if already near 100 on a dedicated instance, scale the instance class to raise max_connections.
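connection_pool_config {
  max_connections_percent      = 95   # proxy may use up to 95% of max_connections
  max_idle_connections_percent = 30   # cap on idle backend connections kept open
  connection_borrow_timeout    = 120  # seconds a borrow waits before failing
}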
max_idle_connections_percent = 30 lets the proxy keep idle backend connections open, up to 30% of max_connections, so bursts do not pay cold-connection latency on top of the borrow.
If pinning exhaustion (at ceiling, high pinning): do not raise the ceiling — fix the state. Move SET to SET LOCAL or role defaults and disable named prepared statements, then rotate connections. Full procedure in Resolving RDS Proxy Session Pinning.
If slow-returning transactions (below ceiling): the proxy is fine; the backend is holding connections. Add a backend statement_timeout and find the long transactions.
ALTER ROLE app_user SET statement_timeout = '10s';
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '15s';
idle_in_transaction_session_timeout reaps sessions that opened a transaction and stalled, which would otherwise hold a backend connection indefinitely.
Tune the client pool to fail faster than the proxy so you get a clean error instead of a 120-second hang. Set the client connection-timeout (HikariCP) or connectionTimeoutMillis (node-postgres) to a few seconds — well under ConnectionBorrowTimeout:
spring:
  datasource:
    hikari:
      connection-timeout: 5000
      maximum-pool-size: 15
A 5-second client timeout fails fast and lets a circuit breaker or retry kick in, rather than blocking an application thread for two minutes. The trade-offs of acquisition timeout under bursty load are covered in Tuning Connection Acquisition Timeout Under Burst Load.
Apply all of these without downtime: proxy pool config and role defaults take effect on the next borrow/connection, so a rolling restart (or natural max-lifetime rotation) propagates them.
Validation & verification
Confirm DatabaseConnectionsBorrowLatency falls back to its microsecond baseline and ConnectionRequestsBorrowed resumes its normal rate.
On the database, verify backend connections are well under the computed ceiling and not stuck:
SELECT state, count(*)
FROM pg_stat_activity
WHERE usename = 'app_secrets_user'
GROUP BY state;
Healthy output shows most connections idle (returned to the proxy pool) and few active. A pile of idle in transaction rows means the slow-transaction remediation has not taken hold.
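If you want this check in a script rather than by eye, the query result can be reduced to a list of problems. The sketch below assumes the rows have already been loaded into a dict of state to count; the function name and the 5 percent threshold for idle in transaction sessions are illustrative assumptions.

```python
def pool_health(state_counts, ceiling, max_idle_in_txn_ratio=0.05):
    """Return a list of problems found in pg_stat_activity state counts.

    state_counts maps a state string (e.g. 'idle', 'active') to its count.
    An empty list means nothing looked wrong under these thresholds.
    """
    total = sum(state_counts.values())
    idle_in_txn = state_counts.get("idle in transaction", 0)
    problems = []
    if total >= ceiling:
        problems.append("at or above ceiling")
    if total and idle_in_txn / total > max_idle_in_txn_ratio:
        problems.append("idle in transaction pile-up")
    return problems
```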
Run a synthetic load test at peak arrival rate and assert borrow latency stays flat:
pgbench -c 200 -j 8 -T 120 -h app-proxy.proxy-abc123.us-east-1.rds.amazonaws.com app
During the run, DatabaseConnections should plateau below the ceiling and DatabaseConnectionsBorrowLatency should remain in the sub-millisecond range. Any climb toward ConnectionBorrowTimeout means the workload still exceeds λ × T capacity.
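The pass criterion for the load test can be made explicit. This sketch assumes you have collected DatabaseConnectionsBorrowLatency samples for the run and that the metric is reported in microseconds; the function name and the 1 ms limit are assumptions standing in for "sub-millisecond".

```python
def borrow_latency_flat(samples_us, limit_us=1_000):
    """Return (ok, worst) for borrow-latency samples in microseconds.

    ok is True only when there is at least one sample and every sample
    stays under limit_us (1 ms by default).
    """
    if not samples_us:
        return (False, None)  # no data is treated as a failed check
    worst = max(samples_us)
    return (worst < limit_us, worst)
```

Treating an empty sample list as a failure is deliberate: a run that produced no datapoints has not demonstrated anything about borrow latency.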
Frequently Asked Questions
Why is the timeout almost exactly 120 seconds?
That is the default ConnectionBorrowTimeout on RDS Proxy. The proxy holds the borrow request for that long waiting for a backend connection to free up, then fails it. Lower the value if you would rather fail fast, but the better fix is usually to relieve whatever is exhausting the pool.
Is a borrow timeout the same as the database being out of connections?
No. The proxy's ceiling is max_connections × MaxConnectionsPercent ÷ 100. You can hit a borrow timeout while the database still has free slots, because the proxy refuses to exceed its configured percentage. Check DatabaseConnections against the computed ceiling to tell which limit you hit.
Should I raise MaxConnectionsPercent to fix borrow timeouts?
Only when the diagnosis is true undersizing (at the ceiling with low pinning). Otherwise raising it risks FATAL: remaining connection slots are reserved without fixing the root cause.
Why do borrow timeouts appear under load but not in steady state?
Required backend connections equal λ × T. A latency spike that increases T (a slow query, a lock wait, or a flood of pinned sessions) multiplies demand at the same arrival rate, pushing it past the ceiling only during the spike. Sustained borrow latency that tracks query latency confirms T is the lever.
Can the client pool mask or worsen borrow timeouts?
Yes. A long client connection-timeout (the HikariCP default is 30 s, but waits can stack to the proxy’s 120 s) makes the application hang instead of failing cleanly. Set the client timeout well below ConnectionBorrowTimeout so the application fails fast and a circuit breaker can shed load.
Related
- AWS RDS Proxy Connection Pooling — the parent guide on multiplexing, sizing, and the backend ceiling.
- Resolving RDS Proxy Session Pinning — when pinning is what exhausts the borrow pool.
- Optimizing HikariCP maximumPoolSize for High Concurrency — the same Little’s Law sizing applied to the client pool.
- Tuning Connection Acquisition Timeout Under Burst Load — client-side acquisition timeout trade-offs.