Measuring Connection Acquisition Latency Percentiles

This guide is part of Java Connection Pool Benchmarks. It covers how to instrument and interpret the time a thread spends waiting to borrow a connection from the pool — the single most actionable latency signal for a saturated or undersized pool. The objective is precise: capture p50, p95, and p99 of getConnection() wait time, distinguish a healthy pool (sub-millisecond acquisition) from a saturated one (p99 climbing toward the connection timeout), and avoid the classic mistake of averaging away the tail that actually breaks request SLAs.

Connection acquisition latency is invisible to most application dashboards because it hides inside the data-access layer. A request that takes 800ms looks like a slow query, but the query may have run in 4ms after the thread waited 780ms for a free connection. When that happens you will eventually see HikariCP log Connection is not available, request timed out after 30000ms — the terminal symptom of acquisition latency exceeding connectionTimeout. Long before that hard failure, rising p99 acquisition latency is the leading indicator of pool exhaustion. This guide instruments it with Micrometer, reads the hikaricp_connections_acquire_seconds histogram, and separates JMH microbenchmark numbers from production telemetry.

Rapid incident diagnosis

When request latency rises but query execution time is flat, suspect acquisition latency. Triage in this order.

First, check whether the pool is exhausted. HikariCP exposes hikaricp_connections_pending (threads currently blocked in getConnection()) and hikaricp_connections_active (connections in use). When pending > 0 sustained and active == maximumPoolSize, threads are queuing for connections and acquisition latency is the bottleneck — not the database.

Second, read the acquire-time percentiles directly rather than the mean. A mean of 5ms can hide a p99 of 400ms when just over 1% of borrows block behind a full pool, because the fast majority of borrows dominates the average. The tail is what violates SLAs, so always pull p95/p99 from the histogram, never the average.

Third, distinguish acquisition latency from connection creation latency. hikaricp_connections_creation_seconds measures how long opening a brand-new physical connection takes (TCP + TLS + auth). A slow creation time points at the network or database, not pool sizing; a slow acquire time with fast creation points at an undersized pool or a connection leak.

| Metric | Healthy | Saturated / suspect |
|---|---|---|
| hikaricp_connections_acquire_seconds p99 | < 1ms | rising toward connectionTimeout |
| hikaricp_connections_pending | 0 | > 0 sustained |
| hikaricp_connections_active | < maximumPoolSize | pinned at maximumPoolSize |
| hikaricp_connections_creation_seconds | stable, low | spikes → network/DB, not pool |
| log: request timed out after Nms | absent | present → acquisition exceeded timeout |
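
The triage order above can be sketched as a small decision function. This is only an illustration: `PoolTriage`, `Verdict`, and the `slowCreationMillis` threshold are hypothetical names introduced here, not part of HikariCP or Micrometer, and the caller is assumed to pass values that held over a window rather than a single scrape.

```java
/** Hypothetical triage helper; names and thresholds are illustrative, not HikariCP API. */
final class PoolTriage {

    enum Verdict { HEALTHY, POOL_SATURATED, SLOW_CONNECTION_CREATION }

    /**
     * @param pendingThreads     sustained value of hikaricp_connections_pending
     * @param activeConnections  sustained value of hikaricp_connections_active
     * @param maximumPoolSize    configured maximumPoolSize
     * @param creationP99Millis  p99 of hikaricp_connections_creation_seconds, in milliseconds
     * @param slowCreationMillis assumed threshold above which creation counts as slow
     */
    static Verdict classify(int pendingThreads, int activeConnections, int maximumPoolSize,
                            double creationP99Millis, double slowCreationMillis) {
        // Check exhaustion first: threads are queuing and every connection is in use.
        if (pendingThreads > 0 && activeConnections >= maximumPoolSize) {
            return Verdict.POOL_SATURATED;
        }
        // Slow physical connection setup points at the network or database, not pool sizing.
        if (creationP99Millis > slowCreationMillis) {
            return Verdict.SLOW_CONNECTION_CREATION;
        }
        return Verdict.HEALTHY;
    }

    public static void main(String[] args) {
        System.out.println(classify(12, 20, 20, 5.0, 100.0));   // queuing behind a full pool
        System.out.println(classify(0, 7, 20, 450.0, 100.0));   // slow TCP/TLS/auth handshake
        System.out.println(classify(0, 7, 20, 5.0, 100.0));     // nothing suspicious
    }
}
```

A snapshot cannot tell an undersized pool from a connection leak; both show up as saturation here, so a saturated verdict still needs the hold-time check described later.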

The wider context of how pools are benchmarked across implementations is covered in the parent topic, Java Connection Pool Benchmarks, and exporting these metrics to a dashboard is detailed in Exposing HikariCP Metrics with Micrometer and Prometheus.

Figure: Acquisition latency distribution and percentiles (x-axis: acquisition latency in ms; markers at mean / p50, p95, and p99). The mean sits in the dense body of the distribution; p95 and p99 capture the saturation tail that averaging hides.
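
To make the gap between mean and tail concrete, here is a minimal sketch on synthetic data: 1,000 borrows, of which 988 take 200 microseconds and 12 block for 400 ms. The sample values and the `LatencyTail` class are invented for illustration, and the percentile uses the simple nearest-rank definition rather than whatever estimator a metrics backend applies.

```java
import java.util.Arrays;

final class LatencyTail {

    /** Nearest-rank percentile: the smallest sample with at least percent% of samples at or below it. */
    static long percentile(long[] samplesMicros, int percent) {
        long[] sorted = samplesMicros.clone();
        Arrays.sort(sorted);
        int rank = (percent * sorted.length + 99) / 100;   // ceil(percent/100 * n) in integer math
        return sorted[Math.max(rank, 1) - 1];
    }

    static double mean(long[] samplesMicros) {
        long sum = 0;
        for (long s : samplesMicros) {
            sum += s;
        }
        return (double) sum / samplesMicros.length;
    }

    /** Synthetic data: 988 fast borrows (200 microseconds) and 12 that block for 400 ms. */
    static long[] syntheticSamples() {
        long[] samples = new long[1000];
        Arrays.fill(samples, 200L);
        Arrays.fill(samples, 988, 1000, 400_000L);
        return samples;
    }

    public static void main(String[] args) {
        long[] samples = syntheticSamples();
        System.out.printf("mean = %.1f us%n", mean(samples));
        System.out.printf("p50  = %d us%n", percentile(samples, 50));
        System.out.printf("p95  = %d us%n", percentile(samples, 95));
        System.out.printf("p99  = %d us%n", percentile(samples, 99));
    }
}
```

With this data the mean stays just under 5 ms and p50/p95 stay at 200 microseconds, while p99 lands on the 400 ms blocked borrows.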

Capturing percentiles with Micrometer

HikariCP publishes a Micrometer Timer named hikaricp.connections.acquire (exported to Prometheus as hikaricp_connections_acquire_seconds). To get true percentiles you must enable distribution statistics on that timer. By default a Micrometer Timer exports only count, total time, and max, from which p99 cannot be derived.

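// Enable histogram buckets + client-side percentiles for the acquire timer.
// Imports assumed for a Spring @Configuration class:
import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.distribution.DistributionStatisticConfig;
import java.time.Duration;
import org.springframework.context.annotation.Bean;
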
@Bean
MeterFilter hikariAcquireHistogram() {
    return new MeterFilter() {
        @Override
        public DistributionStatisticConfig configure(Meter.Id id,
                DistributionStatisticConfig config) {
            if (id.getName().startsWith("hikaricp.connections.acquire")) {
                return DistributionStatisticConfig.builder()
                    .percentilesHistogram(true)                 // export buckets for server-side quantiles
                    .percentiles(0.5, 0.95, 0.99)               // client-side pre-computed quantiles
                    .serviceLevelObjectives(                    // explicit SLO boundaries
                        Duration.ofMillis(1).toNanos(),
                        Duration.ofMillis(10).toNanos(),
                        Duration.ofMillis(100).toNanos())
                    .build()
                    .merge(config);
            }
            return config;
        }
    };
}

percentilesHistogram(true) emits cumulative histogram buckets so Prometheus can compute quantiles across instances with histogram_quantile(). percentiles(0.5, 0.95, 0.99) adds per-instance pre-computed values, useful for a single JVM but not aggregatable across pods — never average pre-computed percentiles from multiple instances. For a fleet, rely on the histogram buckets and aggregate server-side.

Bind the HikariCP datasource to the registry so the metrics are published:

HikariDataSource ds = new HikariDataSource(config);
ds.setMetricRegistry(meterRegistry);   // Micrometer-backed registry

In Prometheus, the p99 across the whole fleet over a 5-minute window is:

histogram_quantile(0.99,
  sum(rate(hikaricp_connections_acquire_seconds_bucket[5m])) by (le, pool))
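
Why summed buckets aggregate correctly while per-pod percentiles do not can be shown with a small sketch. It interpolates linearly inside the target bucket, which is the same idea histogram_quantile() uses, but it works on raw cumulative counts rather than rates and is not Prometheus code. The bucket bounds and the two pods' counts are invented for illustration.

```java
final class BucketQuantile {

    // Illustrative bucket upper bounds in seconds; the last bucket is +Inf.
    static final double[] BOUNDS = {0.001, 0.01, 0.1, 1.0, Double.POSITIVE_INFINITY};
    // Cumulative counts per bucket: pod A is healthy, pod B has a saturated tail.
    static final long[] POD_A = {1000, 1000, 1000, 1000, 1000};
    static final long[] POD_B = {900, 900, 950, 1000, 1000};

    /** Estimates quantile q from ascending bounds and cumulative counts. */
    static double quantile(double q, double[] upperBounds, long[] cumulativeCounts) {
        int last = upperBounds.length - 1;
        if (last < 1) {
            return Double.NaN;                        // need a finite bucket plus +Inf
        }
        double rank = q * cumulativeCounts[last];
        int b = 0;
        while (b < last && cumulativeCounts[b] < rank) {
            b++;
        }
        if (b == last) {
            return upperBounds[last - 1];             // rank falls in +Inf: report highest finite bound
        }
        double lower = (b == 0) ? 0.0 : upperBounds[b - 1];
        long below = (b == 0) ? 0L : cumulativeCounts[b - 1];
        long inBucket = cumulativeCounts[b] - below;
        return lower + (upperBounds[b] - lower) * ((rank - below) / inBucket);
    }

    /** Element-wise sum, the equivalent of sum(...) by (le) across instances. */
    static long[] sum(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double podA = quantile(0.99, BOUNDS, POD_A);
        double podB = quantile(0.99, BOUNDS, POD_B);
        double fleet = quantile(0.99, BOUNDS, sum(POD_A, POD_B));
        System.out.printf("pod A p99 = %.5f s, pod B p99 = %.5f s%n", podA, podB);
        System.out.printf("average of pod p99s = %.5f s%n", (podA + podB) / 2);
        System.out.printf("fleet p99 from summed buckets = %.5f s%n", fleet);
    }
}
```

In this example the fleet p99 from summed buckets comes out higher than the average of the two per-pod values, because the healthy pod's fast borrows dilute the average but not the shared tail.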

Mathematical sizing and the timeout relationship

Acquisition latency is governed by queuing theory. When borrow demand exceeds the number of connections, threads queue, and wait time follows the pool’s utilization. A useful operational bound comes from Little’s Law applied to the pool: the average number of in-use connections equals the borrow arrival rate multiplied by the average hold time.

active_connections = borrow_rate (req/s) × hold_time (s)

Worked example: a service handles 2,000 req/s, each request holds a connection for 5ms (query + result processing). Required concurrent connections:

2000 req/s × 0.005 s = 10 connections in use on average

Average use is 10, but tail behavior demands headroom — set maximumPoolSize above the average to absorb bursts and variance in hold time. If you size at exactly 10, any burst pushes borrows into the wait queue and p99 acquisition latency spikes immediately. A pool sized at 15–20 keeps p99 acquisition near zero for this load. The HikariCP guidance of a small, fixed pool holds: a right-sized pool of 20 beats an oversized pool of 100, because oversizing pushes contention to the database instead of the pool. See Optimizing HikariCP maximumPoolSize for High Concurrency for the full sizing derivation.
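
The sizing argument can be checked with a rough sketch: Little's Law for the average, then a deterministic FIFO simulation of a fixed pool to see what a burst does. The simulation is a simplification made for this example (constant 5 ms hold time, evenly spaced arrivals, one burst of 8 extra requests), and `PoolSizingSketch` and its methods are hypothetical names, not HikariCP behavior.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class PoolSizingSketch {

    /** Little's Law: average in-use connections = arrival rate x hold time. */
    static double averageInUse(long requestsPerSecond, long holdMicros) {
        return requestsPerSecond * holdMicros / 1_000_000.0;
    }

    /** Steady arrivals every intervalMicros, plus burstSize extra arrivals at burstAtMicros. */
    static long[] arrivals(int steadyCount, long intervalMicros, int burstSize, long burstAtMicros) {
        long[] times = new long[steadyCount + burstSize];
        for (int i = 0; i < steadyCount; i++) {
            times[i] = i * intervalMicros;
        }
        Arrays.fill(times, steadyCount, times.length, burstAtMicros);
        Arrays.sort(times);
        return times;
    }

    /** FIFO simulation of a fixed-size pool with constant hold time; returns the longest wait. */
    static long maxWaitMicros(long[] sortedArrivals, int poolSize, long holdMicros) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>();   // time each connection becomes free
        for (int i = 0; i < poolSize; i++) {
            freeAt.add(0L);
        }
        long maxWait = 0;
        for (long arrival : sortedArrivals) {
            long start = Math.max(arrival, freeAt.poll());    // wait for the earliest free connection
            maxWait = Math.max(maxWait, start - arrival);
            freeAt.add(start + holdMicros);
        }
        return maxWait;
    }

    public static void main(String[] args) {
        long holdMicros = 5_000;                               // 5 ms hold time
        System.out.println("average in use = " + averageInUse(2_000, holdMicros));
        long[] steady = arrivals(400, 500, 0, 0);              // 2000 req/s for 200 ms
        long[] burst = arrivals(400, 500, 8, 100_000);         // same load plus 8 extra at t = 100 ms
        System.out.println("steady, pool 10: max wait us = " + maxWaitMicros(steady, 10, holdMicros));
        System.out.println("burst,  pool 10: max wait us = " + maxWaitMicros(burst, 10, holdMicros));
        System.out.println("burst,  pool 20: max wait us = " + maxWaitMicros(burst, 20, holdMicros));
    }
}
```

Under these assumptions a pool of 10 shows no waiting on perfectly steady load but starts queuing as soon as the burst arrives, while a pool of 20 absorbs the same burst without any wait. Real hold times vary, so treat the result as a direction, not a prediction.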

The hard ceiling on acquisition latency is connectionTimeout. Any borrow that cannot be satisfied within that window fails with the timeout exception. So the practical target is: keep p99 acquisition latency at least an order of magnitude below connectionTimeout, and ideally far lower (e.g. p99 < 50ms when connectionTimeout = 30000ms, where the order-of-magnitude ceiling would be 3000ms). When p99 approaches the timeout, exhaustion failures are imminent.

Exact remediation and configuration

When p99 acquisition latency is high, the fix is one of: increase maximumPoolSize (if the database can absorb more connections), reduce hold time (shorten transactions, move work outside the connection scope), or shed load. Apply pool sizing changes carefully — oversizing moves the bottleneck to the database.

# application.yml — HikariCP sized for the worked example (2000 req/s, 5ms hold)
spring:
  datasource:
    hikari:
      maximum-pool-size: 20          # > average in-use (10) for burst headroom
      minimum-idle: 20               # keep pool full; avoid cold-acquire latency
      connection-timeout: 10000      # fail fast; tighten from default 30000
      max-lifetime: 1800000          # recycle below DB/proxy idle cutoff
      # no Hikari property is needed for metrics: Spring Boot Actuator binds the pool to Micrometer automatically

minimum-idle set equal to maximum-pool-size keeps connections warm, eliminating the creation-latency spike that otherwise shows up in the acquire tail when the pool grows on demand. Lowering connection-timeout to 10s makes saturation fail fast and visible rather than silently inflating request latency. Apply file-based changes via rolling restart so every instance picks up the same configuration. For an urgent adjustment, HikariCP also allows maximumPoolSize and minimumIdle to be changed on a running pool through HikariConfigMXBean, without recreating the datasource.

Validation and verification

After tuning, confirm the tail has flattened and the pool is no longer queuing.

Query the live percentiles and pending count from the Prometheus endpoint or JMX:

# p50, p95, p99 acquisition latency, fleet-wide
histogram_quantile(0.50, sum(rate(hikaricp_connections_acquire_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(hikaricp_connections_acquire_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(hikaricp_connections_acquire_seconds_bucket[5m])) by (le))

# threads currently blocked waiting for a connection
max(hikaricp_connections_pending)

For a controlled measurement before deploying, run a JMH microbenchmark of getConnection()/close() against the configured pool. JMH gives clean, isolated borrow-latency numbers without application noise, but remember it measures the uncontended or fixed-contention borrow cost — it cannot reproduce production hold-time variance. Treat JMH as the floor (best-case borrow cost) and production telemetry as the truth (real tail under real load).

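// Method of a @State(Scope.Benchmark) class; `dataSource` is assumed to be a field set in its @Setup method.
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@Benchmark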
@BenchmarkMode(Mode.SampleTime)            // SampleTime captures the latency distribution, not just mean
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void borrowAndReturn(Blackhole bh) throws SQLException {
    try (Connection c = dataSource.getConnection()) {
        bh.consume(c);
    }
}

Mode.SampleTime is essential — it records a histogram of individual call times so JMH reports p99/p99.9, which Mode.AverageTime discards. Compare the JMH p99 (microseconds, uncontended) against production p99 (often milliseconds, contended); the gap is your queuing penalty.

| Validation step | Command / query | Expected result |
|---|---|---|
| Fleet p99 acquire | histogram_quantile(0.99, ...acquire_seconds_bucket...) | < 10% of connectionTimeout |
| Pending threads | max(hikaricp_connections_pending) | 0 sustained |
| Active vs max | hikaricp_connections_active vs maximumPoolSize | active < max under peak |
| JMH borrow floor | @BenchmarkMode(SampleTime) | p99 sub-millisecond, uncontended |
| Timeout errors | grep log for request timed out after | zero occurrences |

Frequently Asked Questions

Why is my mean acquisition latency low but requests are still slow?
The mean is dominated by the many fast borrows and hides the slow tail. When the pool occasionally saturates, a small fraction of borrows block for tens or hundreds of milliseconds, which is exactly the p95/p99 region. Those slow borrows land on real user requests and break SLAs. Always alert on p99, not the mean.

What is the difference between hikaricp_connections_acquire_seconds and hikaricp_connections_usage_seconds?
acquire measures time spent waiting to borrow a connection from the pool, which is the queuing cost. usage measures how long the connection is held before being returned, which is roughly your transaction duration. High acquire with normal usage means the pool is undersized; high usage means transactions are holding connections too long, which in turn inflates acquire for everyone else.

Can I compute fleet-wide p99 by averaging each pod’s reported p99?
No. Averaging pre-computed percentiles is mathematically invalid and routinely understates the real tail. Export histogram buckets (percentilesHistogram(true)) and let Prometheus compute the quantile across all instances with histogram_quantile() over the summed bucket rates.

Should I trust JMH numbers or production telemetry for sizing?
Both, for different purposes. JMH isolates the raw borrow cost and is ideal for comparing pool implementations or validating that the borrow path itself is cheap. Production telemetry captures real hold-time variance and contention, which is what actually determines tail latency. Use JMH as the best-case floor and production p99 as the operational target. Comparisons across pool implementations are covered in Benchmarking Connection Pool Algorithms for Read-Heavy Workloads.

What p99 acquisition latency is acceptable?
For a well-sized pool, p99 should be sub-millisecond, because borrowing is nearly free when an idle connection is available. Anything climbing into double-digit milliseconds indicates the pool is occasionally empty and threads are queuing. Set an alert when p99 exceeds 10% of connectionTimeout; that gives early warning well before hard timeout failures begin.