Reliability Patterns¶
FastAgentic provides lightweight reliability patterns for agent execution. These are intentionally simple; for advanced resilience, integrate battle-tested libraries such as Tenacity or pybreaker, or use an LLM gateway like Portkey.
Philosophy¶
| Pattern | FastAgentic Builds | Consider Instead |
|---|---|---|
| Retry | Simple exponential backoff | Tenacity (complex policies) |
| Circuit Breaker | Basic state machine | pybreaker, CircuitBreaker lib |
| Timeout | Per-endpoint limits | Handled by async frameworks |
| Fallback | Model chain configuration | Portkey, LiteLLM (advanced routing) |
| Rate Limiting | Simple RPM/TPM counter | Redis-based, API gateways |
FastAgentic's built-in patterns cover 80% of use cases. For the other 20%, plug in specialized tools.
Retry Policy¶
Automatically retry failed LLM calls and tool executions.
Configuration¶
```python
from fastagentic import agent_endpoint
from fastagentic.reliability import RetryPolicy

@agent_endpoint(
    path="/analyze",
    runnable=...,
    retry=RetryPolicy(
        max_attempts=3,
        backoff="exponential",   # or "fixed", "linear"
        initial_delay_ms=1000,   # 1 second
        max_delay_ms=30000,      # 30 seconds cap
        jitter=True,             # Add randomness to prevent thundering herd
        retry_on=[
            "rate_limit",        # Retry on rate limit errors
            "timeout",           # Retry on timeouts
            "server_error",      # Retry on 5xx errors
        ],
        # Don't retry these
        no_retry_on=[
            "invalid_input",     # Client errors
            "auth_error",        # Auth failures
        ],
    ),
)
async def analyze(data: DataInput) -> AnalysisResult:
    ...
```
Backoff Strategies¶
| Strategy | Formula | Use Case |
|---|---|---|
| `fixed` | `delay = initial_delay` | Predictable retry timing |
| `linear` | `delay = initial_delay * attempt` | Gradual increase |
| `exponential` | `delay = initial_delay * 2^attempt` | API rate limits (recommended) |
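As a concrete illustration, the strategies (plus the jitter described below) can be sketched as a pure function. This is a hedged sketch, not FastAgentic's internal code; it treats `attempt` as 1-based, so the first retry waits `initial_delay`:

```python
import random

def compute_delay(strategy: str, attempt: int, initial_ms: int = 1000,
                  max_ms: int = 30000, jitter_factor: float = 0.0) -> float:
    """Delay before retry `attempt` (1-based: attempt 1 waits initial_ms)."""
    if strategy == "fixed":
        delay = initial_ms
    elif strategy == "linear":
        delay = initial_ms * attempt
    elif strategy == "exponential":
        delay = initial_ms * 2 ** (attempt - 1)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    delay = min(delay, max_ms)          # respect the max_delay_ms cap
    if jitter_factor:
        # spread uniformly within ±jitter_factor of the base delay
        delay *= random.uniform(1 - jitter_factor, 1 + jitter_factor)
    return delay

# exponential with defaults: 1s, 2s, 4s, ... capped at 30s
print([compute_delay("exponential", a) for a in (1, 2, 3, 10)])
# → [1000, 2000, 4000, 30000]
```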
Jitter¶
Jitter adds randomness to prevent multiple clients retrying simultaneously:
```python
# Without jitter: all clients retry at exactly 1s, 2s, 4s...
# With jitter:    client A at 1.2s, client B at 0.9s, client C at 1.1s...
RetryPolicy(
    backoff="exponential",
    jitter=True,          # ±25% randomness
    jitter_factor=0.25,   # Customize jitter range
)
```
Retry Context in Hooks¶
```python
from fastagentic.hooks import hook, HookContext

@hook("on_retry")
async def log_retry(ctx: HookContext):
    print(f"Retry attempt {ctx.retry_count}/{ctx.max_retries}")
    print(f"Error: {ctx.error}")
    print(f"Next delay: {ctx.next_delay_ms}ms")

    # Optionally modify retry behavior
    if ctx.retry_count > 2:
        # Switch to a cheaper model on repeated failures
        ctx.metadata["model_override"] = "gpt-3.5-turbo"
```
Retry Metrics¶
FastAgentic exposes retry metrics:
```
fastagentic_retries_total{endpoint="/analyze", attempt="1"} 150
fastagentic_retries_total{endpoint="/analyze", attempt="2"} 23
fastagentic_retries_total{endpoint="/analyze", attempt="3"} 5
fastagentic_retry_exhausted_total{endpoint="/analyze"} 2
```
Circuit Breaker¶
Prevent cascading failures by stopping calls to failing services.
Configuration¶
```python
from fastagentic import agent_endpoint
from fastagentic.reliability import CircuitBreaker

@agent_endpoint(
    path="/external-api",
    runnable=...,
    circuit_breaker=CircuitBreaker(
        # Failure threshold to open circuit
        failure_threshold=5,       # Open after 5 failures
        failure_window_ms=60000,   # Within 60 seconds

        # Recovery
        reset_timeout_ms=30000,    # Try again after 30 seconds
        half_open_requests=2,      # Allow 2 test requests in half-open

        # Success threshold to close
        success_threshold=3,       # Close after 3 successes in half-open

        # What counts as failure
        failure_on=[
            "timeout",
            "server_error",
            "connection_error",
        ],
    ),
)
async def call_external(request: Request) -> Response:
    ...
```
Circuit States¶
```
            ┌────────┐
   ┌──────► │ CLOSED │
   │        └───┬────┘
   │            │ failure_threshold reached
   │            ▼
   │        ┌────────┐
   │   ┌──► │  OPEN  │
   │   │    └───┬────┘
   │   │        │ reset_timeout elapsed
   │   │        ▼
   │   │  ┌───────────┐
   │   └──│ HALF-OPEN │  (failure → back to OPEN)
   │      └─────┬─────┘
   │            │ success_threshold reached
   └────────────┘
```
| State | Behavior |
|---|---|
| CLOSED | Normal operation, requests pass through |
| OPEN | Requests fail immediately, no calls made |
| HALF-OPEN | Limited test requests allowed |
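The transitions can be sketched as a small in-process state machine. This is an illustration of the pattern only, not FastAgentic's implementation: it counts consecutive failures rather than failures within `failure_window_ms`, and it ignores `half_open_requests`:

```python
import time

class SimpleCircuit:
    """Minimal CLOSED → OPEN → HALF-OPEN circuit breaker sketch."""

    def __init__(self, failure_threshold=5, reset_timeout_s=30.0, success_threshold=3):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.success_threshold = success_threshold
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Should this request be attempted at all?"""
        if self.state == "open":
            # after the reset timeout, let test requests through
            if time.monotonic() - self.opened_at >= self.reset_timeout_s:
                self.state = "half-open"
                self.successes = 0
                return True
            return False
        return True

    def record(self, ok: bool) -> None:
        """Feed the outcome of an attempted request back into the circuit."""
        if ok:
            if self.state == "half-open":
                self.successes += 1
                if self.successes >= self.success_threshold:
                    self.state = "closed"
                    self.failures = 0
            else:
                self.failures = 0
        else:
            if self.state == "half-open":
                self._open()  # any failure in half-open reopens the circuit
            else:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = time.monotonic()
```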
Circuit Breaker Response¶
When circuit is open:
```python
from fastagentic.reliability import CircuitOpenError

try:
    result = await call_external(request)
except CircuitOpenError as e:
    # Circuit is open, fail fast
    print(f"Circuit open since: {e.opened_at}")
    print(f"Will retry at: {e.retry_after}")
    # Return a cached/fallback response
    return cached_response
```
Per-Dependency Circuits¶
```python
from fastagentic.reliability import CircuitBreaker, CircuitBreakerRegistry

# Shared circuit breaker registry
circuits = CircuitBreakerRegistry()

@agent_endpoint(
    path="/multi-api",
    runnable=...,
    circuit_breakers={
        "openai": CircuitBreaker(failure_threshold=5),
        "anthropic": CircuitBreaker(failure_threshold=3),
        "database": CircuitBreaker(failure_threshold=10),
    },
)
async def multi_call(request: Request) -> Response:
    ...
```
Circuit Metrics¶
```
fastagentic_circuit_state{endpoint="/external-api", state="closed"} 1
fastagentic_circuit_state{endpoint="/external-api", state="open"} 0
fastagentic_circuit_failures_total{endpoint="/external-api"} 127
fastagentic_circuit_opens_total{endpoint="/external-api"} 3
```
Timeout¶
Enforce time limits on agent execution.
Configuration¶
```python
from fastagentic import agent_endpoint
from fastagentic.reliability import Timeout

@agent_endpoint(
    path="/analyze",
    runnable=...,
    timeout=Timeout(
        total_ms=300000,      # 5 minutes total
        llm_call_ms=60000,    # 1 minute per LLM call
        tool_call_ms=30000,   # 30 seconds per tool
        checkpoint_ms=5000,   # 5 seconds for checkpointing
    ),
)
async def analyze(data: DataInput) -> AnalysisResult:
    ...
```
Timeout Levels¶
| Level | Scope | Default |
|---|---|---|
| `total_ms` | Entire request | 120000 (2 min) |
| `llm_call_ms` | Single LLM API call | 60000 (1 min) |
| `tool_call_ms` | Single tool execution | 30000 (30 sec) |
| `checkpoint_ms` | Checkpoint save/load | 5000 (5 sec) |
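Nested budgets like these are typically enforced with `asyncio.wait_for`. A minimal sketch with illustrative names (`call_llm` and `run_with_limits` are not FastAgentic APIs):

```python
import asyncio

async def call_llm() -> str:
    await asyncio.sleep(0.01)  # stand-in for a real LLM API call
    return "ok"

async def run_with_limits(total_s: float, llm_call_s: float) -> str:
    async def _inner() -> str:
        # each LLM call gets its own per-call budget...
        return await asyncio.wait_for(call_llm(), timeout=llm_call_s)

    # ...inside an overall budget for the whole request
    return await asyncio.wait_for(_inner(), timeout=total_s)

print(asyncio.run(run_with_limits(total_s=1.0, llm_call_s=0.5)))  # → ok
```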
Timeout Behavior¶
```python
from fastagentic.reliability import TimeoutError

try:
    result = await analyze(data)
except TimeoutError as e:
    print(f"Timeout after {e.elapsed_ms}ms")
    print(f"Stage: {e.stage}")  # "llm_call", "tool_call", "total"

    # If durable, the run can be resumed later
    if e.run_id:
        print(f"Resume with: POST /runs/{e.run_id}/resume")
```
Graceful Timeout with Checkpoints¶
```python
@agent_endpoint(
    path="/long-task",
    runnable=...,
    durable=True,
    timeout=Timeout(
        total_ms=60000,
        # On timeout, save a checkpoint instead of failing
        on_timeout="checkpoint",  # or "fail", "warn"
    ),
)
async def long_task(input: Input) -> Output:
    ...
```
Fallback Chains¶
Automatically fall back to alternative models or strategies.
Model Fallback¶
```python
from fastagentic import App
from fastagentic.reliability import FallbackChain

app = App(
    title="My Agent",
    # Global fallback chain
    model_fallback=FallbackChain(
        primary="gpt-4o",
        fallbacks=[
            {"model": "gpt-4o-mini", "on": ["rate_limit", "timeout"]},
            {"model": "gpt-3.5-turbo", "on": ["rate_limit", "timeout", "server_error"]},
        ],
        # Cost tracking across fallbacks
        track_fallback_costs=True,
    ),
)
```
Per-Endpoint Fallback¶
```python
@agent_endpoint(
    path="/critical",
    runnable=...,
    fallback=FallbackChain(
        primary="claude-3-opus",
        fallbacks=[
            {"model": "claude-3-sonnet", "on": ["rate_limit"]},
            {"model": "gpt-4o", "on": ["rate_limit", "timeout"]},  # Cross-provider
        ],
    ),
)
async def critical_task(input: Input) -> Output:
    ...
```
Strategy Fallback¶
```python
from fastagentic.reliability import StrategyFallback

@agent_endpoint(
    path="/flexible",
    runnable=...,
    fallback=StrategyFallback(
        strategies=[
            # Try the full agent first
            {"runnable": full_agent, "timeout_ms": 30000},
            # Fall back to a simpler chain
            {"runnable": simple_chain, "timeout_ms": 10000},
            # Last resort: cached response
            {"runnable": cached_lookup, "timeout_ms": 1000},
        ],
    ),
)
async def flexible_task(input: Input) -> Output:
    ...
```
Fallback Metrics¶
```
fastagentic_fallback_triggered_total{endpoint="/critical", from="claude-3-opus", to="gpt-4o"} 15
fastagentic_fallback_success_total{endpoint="/critical", model="gpt-4o"} 14
fastagentic_fallback_exhausted_total{endpoint="/critical"} 1
```
Rate Limiting¶
Simple rate limiting for agent endpoints.
Configuration¶
```python
from fastagentic import App
from fastagentic.reliability import RateLimiter

app = App(
    title="My Agent",
    rate_limiter=RateLimiter(
        # Global limits
        requests_per_minute=100,
        tokens_per_minute=100000,

        # Per-user limits (if authenticated)
        per_user_rpm=10,
        per_user_tpm=10000,

        # Per-tenant limits
        per_tenant_rpm=50,
        per_tenant_tpm=50000,

        # Behavior when limited
        on_limit="reject",  # or "queue", "delay"
    ),
)
```
Per-Endpoint Limits¶
```python
@agent_endpoint(
    path="/expensive",
    runnable=...,
    rate_limit={
        "requests_per_minute": 10,
        "tokens_per_minute": 50000,
    },
)
async def expensive_task(input: Input) -> Output:
    ...
```
Rate Limit Response¶
```python
from fastagentic.reliability import RateLimitError

try:
    result = await expensive_task(input)
except RateLimitError as e:
    print(f"Rate limited: {e.limit_type}")  # "rpm", "tpm"
    print(f"Retry after: {e.retry_after_ms}ms")
    print(f"Current usage: {e.current}/{e.limit}")
```
Rate Limit Headers¶
FastAgentic returns standard rate limit headers:
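For example, on a rejected request (the header names shown follow the common de-facto `X-RateLimit-*` convention; treat the exact names FastAgentic emits as an assumption):

```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1735689600
Retry-After: 30
```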
Combining Patterns¶
Patterns can be combined:
```python
@agent_endpoint(
    path="/resilient",
    runnable=...,
    # Retry on transient failures
    retry=RetryPolicy(
        max_attempts=3,
        backoff="exponential",
        retry_on=["rate_limit", "timeout"],
    ),
    # Circuit breaker for downstream protection
    circuit_breaker=CircuitBreaker(
        failure_threshold=5,
        reset_timeout_ms=30000,
    ),
    # Timeout to prevent runaway execution
    timeout=Timeout(
        total_ms=120000,
        llm_call_ms=30000,
    ),
    # Fallback when primary fails
    fallback=FallbackChain(
        primary="gpt-4o",
        fallbacks=[{"model": "gpt-4o-mini", "on": ["rate_limit"]}],
    ),
    # Rate limiting
    rate_limit={"requests_per_minute": 50},
)
async def resilient_task(input: Input) -> Output:
    ...
```
Execution Order¶
```
Request
   │
   ▼
┌─────────────┐
│ Rate Limit  │ ─── Reject if over limit
└──────┬──────┘
       ▼
┌─────────────┐
│  Timeout    │ ─── Start timer
└──────┬──────┘
       ▼
┌─────────────┐
│Circuit Break│ ─── Fail fast if open
└──────┬──────┘
       ▼
┌─────────────┐
│  Execute    │ ─── Run agent
└──────┬──────┘
       │
   ┌───┴──────────────┐
   ▼                  ▼
┌───────┐         ┌───────┐
│Success│         │Failure│
└───┬───┘         └───┬───┘
    ▼                 ▼
Response       ┌─────────────┐
               │   Retry?    │ ─── If retries remain
               └──────┬──────┘
                      ▼
               ┌─────────────┐
               │  Fallback?  │ ─── If fallback configured
               └──────┬──────┘
                      ▼
               Error Response
```
When to Use External Tools¶
FastAgentic's built-in patterns are intentionally simple. Use external tools when you need:
| Need | Use |
|---|---|
| Complex retry policies (fibonacci, custom) | Tenacity |
| Distributed circuit breakers | pybreaker + Redis |
| Advanced rate limiting | Redis + limits |
| LLM-specific routing | Portkey, LiteLLM |
| Full observability | Langfuse, Datadog |
Portkey Integration Example¶
```python
from fastagentic import App
from fastagentic.integrations.portkey import PortkeyGateway

app = App(
    title="My Agent",
    # Use Portkey for advanced LLM routing
    llm_gateway=PortkeyGateway(
        api_key="...",
        # Portkey handles: retries, fallbacks, caching, load balancing
        config={
            "retry": {"attempts": 3, "on_status_codes": [429, 500, 502, 503]},
            "cache": {"mode": "semantic", "max_age": 3600},
            "loadbalance": {"strategy": "round-robin"},
        },
    ),
)
```
Next Steps¶
- Hooks Architecture — Intercept retry/circuit events
- Integrations — Portkey, LiteLLM setup
- Observability — Monitor reliability metrics