Understanding latency percentiles: p50, p95, p99
Average latency lies. Learn why percentiles matter, how to read them, and what they reveal about your system's real performance.
Michael Torres
Why Your Average Latency Lies to You
Your application's average response time is 50ms. Sounds good, right?
But what if that average is hiding the real story? What if most requests complete in 50ms while 1% of requests take 5 seconds?
If even 1% of your users experience 5-second delays, your application feels broken to them. An average of 50ms is meaningless to a user whose request took 5 seconds.
This is the critical insight behind latency percentiles.
What Are Latency Percentiles?
A percentile tells you: "X% of requests completed in Y milliseconds."
Common Percentiles
p50 (median): 50% of requests complete within this time
- Half your users experience this or better
- The typical request
p95: 95% of requests complete within this time
- 95% of your users have this experience
- The level most users actually feel
p99: 99% of requests complete within this time
- The worst 1% of users experience this or worse
- The "tail" of the distribution
p99.9: 99.9% of requests complete within this time
- The very worst requests
- Often called "tail latency"
A Real Example
Let's imagine you're monitoring an API endpoint. You get 10,000 requests in a minute. Sort them by latency, fastest to slowest:
Request 1: 25ms
Request 2: 27ms
Request 3: 28ms
...
Request 5,000: 52ms ← This is approximately p50 (median)
Request 5,001: 65ms
...
Request 9,500: 85ms ← This is approximately p95
Request 9,501: 120ms
...
Request 9,900: 410ms
Request 9,901: 450ms ← This is approximately p99
Request 9,902: 520ms
...
Request 10,000: 5,200ms
So you might report:
- p50: 52ms - Half of requests finish in 52ms or less
- p95: 85ms - 95% of requests finish in 85ms or less
- p99: 450ms - 99% of requests finish in 450ms or less
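To make the arithmetic explicit, here is a minimal JavaScript sketch of how those request positions fall out of the percentile definition (using the nearest-rank convention, which is one common way to pick the index):
// For 10,000 requests sorted by latency, which position is each percentile?
const totalRequests = 10000;
for (const p of [0.50, 0.95, 0.99]) {
  // e.g. p99 -> request #9,900: 99% of requests are at or below it
  const rank = Math.ceil(totalRequests * p);
  console.log(`p${p * 100}: sorted request #${rank}`);
}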
Why Percentiles Matter More Than Averages
The Average Can Mislead
Suppose you have response times: 10ms, 20ms, 30ms, 40ms, 5000ms
Average = (10 + 20 + 30 + 40 + 5000) / 5 = 1020ms
Median (p50) = 30ms
The average is pulled way up by that one slow request. But 80% of your users had a great experience (40ms or less).
The average tells a lie. The percentile tells the truth.
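The same comparison as a quick JavaScript sketch, using the five response times above:
const times = [10, 20, 30, 40, 5000];

// Average is dragged up by the single 5000ms outlier
const average = times.reduce((sum, t) => sum + t, 0) / times.length; // 1020ms

// Median: middle element of the sorted (odd-length) list
const median = [...times].sort((a, b) => a - b)[Math.floor(times.length / 2)]; // 30ms

console.log(`average: ${average}ms, median: ${median}ms`);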
Users Experience Percentiles, Not Averages
Your users don't care about average performance. Each user cares only about their experience:
- A user at the p50 got a fast response
- A user at the p95 got a reasonable response
- A user at the p99 only knows that their request was slow
If you only track averages, you're ignoring the worst experiences.
Percentiles Show Distribution
Latency percentiles tell you about the distribution:
Healthy system:
p50: 50ms
p95: 80ms
p99: 120ms
- Tight distribution, predictable performance
Degraded system:
p50: 50ms
p95: 200ms
p99: 2500ms
- Wide spread, unpredictable performance, tail latency issues
The same p50 doesn't mean the same performance.
What Causes Tail Latency (p95, p99)?
Tail latency—those outlier requests that take much longer—comes from:
1. Lock Contention
When multiple requests compete for the same database lock, one acquires it immediately and the others wait.
Request 1: Acquires lock immediately -> 50ms
Request 2: Waits for lock -> 450ms (p99)
2. Garbage Collection
A garbage collection pause stops all request processing:
Requests 1-100: Normal 50ms latency
GC pause for 200ms
Requests 101-150: All delayed by 200ms
Requests 151-200: Back to 50ms
3. Cache Misses
A cache miss requires a database hit:
Cache hit: 5ms
Cache miss: 300ms (p99)
4. Network Timeouts
Occasionally, a downstream service is slow:
Normal call: 100ms
Timeout call: 10,000ms (p99.9)
5. Thundering Herd
When many clients retry simultaneously:
Normal load: 50ms
Retry spike: 2,000ms (p99)
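All of these causes share the same shape: most requests are fine, a small fraction hit a slow path, and only the tail percentiles notice. A minimal simulation (the 2% slow-path rate and the latency ranges are assumptions, not measurements) makes that visible:
// Simulate 10,000 requests: 98% fast (~40-60ms), 2% hit a slow path (500-2500ms)
const samples = [];
for (let i = 0; i < 10000; i++) {
  const slow = Math.random() < 0.02;
  samples.push(slow ? 500 + Math.random() * 2000 : 40 + Math.random() * 20);
}

samples.sort((a, b) => a - b);
const pct = (p) => samples[Math.ceil(samples.length * p) - 1];

// p50 and p95 look healthy; only p99 exposes the slow path
console.log(`p50: ${pct(0.50).toFixed(0)}ms`);
console.log(`p95: ${pct(0.95).toFixed(0)}ms`);
console.log(`p99: ${pct(0.99).toFixed(0)}ms`);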
Setting Performance Thresholds
You should monitor and alert on percentiles, not just averages.
Bad Approach
Alert if average latency > 100ms
This misses the tail latency problem that affects your worst users.
Good Approach
Alert if p50 latency > 80ms
Alert if p95 latency > 150ms
Alert if p99 latency > 400ms
This tracks the distribution and alerts when things actually degrade.
Better Approach
Define SLOs by percentile:
p50 latency: < 100ms (fast response)
p95 latency: < 200ms (good response)
p99 latency: < 500ms (acceptable worst case)
Alert if you're trending toward SLO breach
Alert if you breach SLO
Track error budget consumption
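As a rough sketch of what that evaluation could look like (the measured values, SLO targets, and the 80% "trending" threshold are all illustrative, not a real alerting API):
// Hypothetical percentiles measured over the last evaluation window
const measured = { p50: 62, p95: 180, p99: 520 };

// SLO targets from above
const slo = { p50: 100, p95: 200, p99: 500 };

for (const [name, target] of Object.entries(slo)) {
  const value = measured[name];
  if (value > target) {
    console.log(`ALERT: ${name} is ${value}ms, SLO is ${target}ms`);
  } else if (value > target * 0.8) {
    // 80% of the target as a simple "trending toward breach" signal
    console.log(`WARN: ${name} at ${value}ms is approaching its ${target}ms SLO`);
  }
}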
Real-World SLO Examples
High-performance API Service
p50: < 50ms
p95: < 100ms
p99: < 200ms
Why these numbers? Customers expect sub-100ms response times. The p99 being 2x p95 is acceptable for a healthy system.
Data Processing Service
p50: < 500ms
p95: < 2000ms
p99: < 5000ms
Processing takes longer, so thresholds are higher. But the wide spread (p99 at 10x the p50) is a long tail worth watching for potential issues.
Interactive Web Application
p50: < 200ms
p95: < 500ms
p99: < 1000ms
Users tolerate longer latencies for complex operations. The wider spread (p99 at 5x the p50) is expected.
Actionable Guidance: What's Normal?
p95 to p99 Ratio
Healthy system: p99 is 1.5x - 4x p95
- Tight distribution
- Predictable performance
- Few anomalies
Example: p95=100ms, p99=200ms (ratio: 2x) - Healthy
Warning signs: p99 is 5x+ p95
- Wide tail
- Many slow requests
- Something is causing degradation
Example: p95=100ms, p99=1000ms (ratio: 10x) - Investigate
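A small helper that encodes this rule of thumb (the 4x and 5x cutoffs mirror the guidance above and should be treated as heuristics, not hard limits):
// Classify tail health from the p99-to-p95 ratio
function tailHealth(p95, p99) {
  const ratio = p99 / p95;
  if (ratio <= 4) return `healthy (p99 is ${ratio.toFixed(1)}x p95)`;
  if (ratio < 5) return `borderline (p99 is ${ratio.toFixed(1)}x p95)`;
  return `investigate (p99 is ${ratio.toFixed(1)}x p95)`;
}

console.log(tailHealth(100, 200));  // healthy (p99 is 2.0x p95)
console.log(tailHealth(100, 1000)); // investigate (p99 is 10.0x p95)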
Expected Ranges by Use Case
| Service Type | p50 | p95 | p99 |
|---|---|---|---|
| API Gateway | 10-30ms | 50-100ms | 100-200ms |
| Web Server | 50-100ms | 150-300ms | 500-1000ms |
| Database Query | 10-50ms | 100-200ms | 500-2000ms |
| External API | 100-500ms | 1000-3000ms | 3000-10000ms |
| Batch Processing | 1-10s | 30-60s | 60-300s |
These are rough guides. Your specific values depend on your architecture.
How to Measure Percentiles
In Your Application Code
const latencies = [];

// Time each request and record its latency
const startTime = performance.now();
// ... do work ...
const endTime = performance.now();
latencies.push(endTime - startTime);

// Nearest-rank percentile: sort ascending, then take the value at the
// index that covers the requested fraction of requests
function percentile(sortedValues, p) {
  const index = Math.max(0, Math.ceil(sortedValues.length * p) - 1);
  return sortedValues[index];
}

latencies.sort((a, b) => a - b);
console.log(`p50: ${percentile(latencies, 0.50)}ms`);
console.log(`p95: ${percentile(latencies, 0.95)}ms`);
console.log(`p99: ${percentile(latencies, 0.99)}ms`);
With Monitoring Tools
Most modern monitoring platforms (Datadog, New Relic, Prometheus) calculate percentiles automatically:
up0_api_latency_p50{region="us-east-1"} 52
up0_api_latency_p95{region="us-east-1"} 85
up0_api_latency_p99{region="us-east-1"} 450
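With Prometheus, for example, percentiles are usually computed at query time from histogram buckets using histogram_quantile; a query along these lines (the metric name is illustrative) gives the p99 over the last five minutes:
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))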
Log Analysis
You can also calculate from logs:
# Extract response times (assumed to be the last field) and calculate p95
awk '{print $NF}' access.log | sort -n | awk '{v[NR]=$1} END {print v[int(NR*0.95)]}'
Debugging High Tail Latency
When you see high p99 latencies:
Step 1: Narrow Down the Service
Which service has the high latency? Instrument all services to find the culprit.
Step 2: Identify the Pattern
Is it:
- Constant? Something is slow for all requests
- Spiky? Occasional slow requests (GC, lock contention)
- Time-based? Happens at specific times (batch jobs, backups)
- Load-dependent? Worse under high load
Step 3: Investigate the Cause
Based on the pattern:
- Lock contention? Add metrics for lock wait times
- GC? Check GC logs and JVM heap
- Cache misses? Track cache hit rates
- External service? Measure downstream latency
- Capacity? Check CPU, memory, disk
Step 4: Measure the Impact
What percentage of users experience this latency? Is it worth fixing?
p99 = 2000ms affecting 1% of users = potentially worth fixing
p99.999 = 5000ms affecting 0.001% = probably not worth fixing
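Putting rough numbers on the impact helps make the call; here the daily traffic volume is an assumed example:
// Roughly how many requests land beyond p99 each day?
const requestsPerDay = 1_000_000;                         // assumed traffic volume
const affectedPerDay = Math.round(requestsPerDay * 0.01); // the worst 1%
console.log(`${affectedPerDay} requests/day are slower than your p99`); // 10000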
Step 5: Fix or Accept
Either fix the root cause, or accept it and set your SLO accordingly.
Percentiles in Different Contexts
API Uptime Monitoring
Track response time percentiles across regions:
us-east-1 p95: 85ms
eu-west-1 p95: 140ms
ap-south-1 p95: 280ms
Regional differences are normal. Alert if a region's percentile spikes.
Database Monitoring
Track query latency percentiles:
SELECT queries, p95: 250ms
UPDATE queries, p95: 450ms
Write operations are typically slower than reads, so track them separately.
End-to-End User Experience
Real User Monitoring (RUM) tracks what users actually see:
Page load p95: 3200ms (includes network, rendering, etc.)
Always slower than backend latency. This is what users care about.
Key Takeaways
- Percentiles tell the truth, averages lie
  - Track p50, p95, and p99, not just the average
- Different percentiles matter for different things
  - p50: Is the system generally fast?
  - p95: Is the experience good for most users?
  - p99: Are edge cases handled well?
- Set SLOs by percentile
  - Define what "good performance" means
  - Alert on approaching or breaching SLOs
  - Track error budget
- Investigate high tail latency
  - The p95-to-p99 ratio shows distribution health
  - Root cause could be locks, GC, cache, capacity, or network
- Monitor percentiles across all services and regions
  - Backend latencies
  - Database latencies
  - API calls
  - Regional differences
Track your latency percentiles with up0 - Monitor p50, p95, p99 across 20+ regions. Get started free.