Connection Pooling

Fast LiteLLM provides a high-performance connection pool built on Rust's DashMap, a sharded concurrent hash map that avoids a single global lock. In the benchmarks below, connection management is 3.2x faster than a traditional Python implementation when single-threaded and 1.2x faster with 8 threads.

Overview

The connection pool manages HTTP connections to API endpoints, reducing the overhead of establishing new connections for each request.

Key Features

Concurrent access without a global lock, using DashMap
Automatic health checking for connections
Idle connection cleanup
Per-endpoint connection limits
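
To make the features above concrete, here is a minimal, single-threaded sketch of a pool with per-endpoint limits and idle cleanup. It is an illustration only and not Fast LiteLLM's implementation: the class name ToyPool, the parameters max_per_endpoint, idle_ttl and clock, and the "conn-N" ID format are all assumptions, and health checking and thread safety are left out.

```python
import itertools
import time


class ToyPool:
    """Illustrative pool only; names and behavior are assumptions."""

    def __init__(self, max_per_endpoint=2, idle_ttl=30.0, clock=time.monotonic):
        self.max_per_endpoint = max_per_endpoint
        self.idle_ttl = idle_ttl
        self._clock = clock
        self._ids = itertools.count(1)
        # conn_id -> {"endpoint": str, "in_use": bool, "last_used": float}
        self._conns = {}

    def get_connection(self, endpoint):
        owned = [(cid, c) for cid, c in self._conns.items()
                 if c["endpoint"] == endpoint]
        for cid, conn in owned:
            if not conn["in_use"]:
                # Reuse an idle connection before creating a new one.
                conn["in_use"] = True
                return cid
        if len(owned) >= self.max_per_endpoint:
            return None  # per-endpoint limit reached
        cid = f"conn-{next(self._ids)}"
        self._conns[cid] = {"endpoint": endpoint, "in_use": True,
                            "last_used": self._clock()}
        return cid

    def return_connection(self, conn_id):
        conn = self._conns.get(conn_id)
        if conn is not None:
            conn["in_use"] = False
            conn["last_used"] = self._clock()

    def cleanup(self):
        """Drop idle connections older than idle_ttl; return how many."""
        now = self._clock()
        expired = [cid for cid, c in self._conns.items()
                   if not c["in_use"] and now - c["last_used"] > self.idle_ttl]
        for cid in expired:
            del self._conns[cid]
        return len(expired)
```

Passing the clock in as a parameter keeps the idle-expiry logic easy to test without sleeping.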

Performance

Metric	Python	Rust	Improvement
Single-threaded	0.136ms	0.042ms	3.2x faster
Multi-threaded (8 threads)	0.016ms	0.013ms	1.2x faster

Basic Usage

Automatic Acceleration

When you import fast_litellm, connection pooling is automatically accelerated:

import fast_litellm
import litellm

# Connection pooling is now accelerated
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

Direct API Access

You can also use the connection pool directly:

from fast_litellm import SimpleConnectionPool

# Create a pool
pool = SimpleConnectionPool(pool_name="my_pool")

# Get a connection to an endpoint
conn_id = pool.get_connection("https://api.openai.com")

if conn_id:
    try:
        # Use the connection for your request...
        pass
    finally:
        # Return connection to pool when done
        pool.return_connection(conn_id)

API Reference

SimpleConnectionPool

from typing import Any, Dict, Optional

class SimpleConnectionPool:
    def __init__(self, pool_name: str = "default") -> None:
        """Create a new connection pool."""

    def get_connection(self, endpoint: str) -> Optional[str]:
        """Get a connection ID for the specified endpoint."""

    def return_connection(self, connection_id: str) -> None:
        """Return a connection to the pool."""

    def health_check(self, connection_id: str) -> bool:
        """Check if a connection is healthy."""

    def cleanup(self) -> None:
        """Clean up expired/idle connections."""

    def get_stats(self) -> Dict[str, Any]:
        """Get pool statistics."""

Standalone Functions

import fast_litellm

# Get a connection
conn_id = fast_litellm.get_connection("https://api.openai.com")

# Return a connection
fast_litellm.return_connection(conn_id)

# Remove a connection
fast_litellm.remove_connection(conn_id)

# Health check
is_healthy = fast_litellm.health_check_connection(conn_id)

# Cleanup expired connections
fast_litellm.cleanup_expired_connections()

# Get statistics
stats = fast_litellm.get_connection_pool_stats()

Statistics

Monitor your connection pool:

from fast_litellm import SimpleConnectionPool

pool = SimpleConnectionPool()
stats = pool.get_stats()

print(f"Total connections: {stats.get('total_connections', 0)}")
print(f"Active connections: {stats.get('active_connections', 0)}")
print(f"Idle connections: {stats.get('idle_connections', 0)}")

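Best Practices

1. Always Return Connections

Return every connection to the pool when you are done with it, even if the request raises. Wrapping the work in try/finally guarantees this:

conn_id = pool.get_connection(endpoint)
if conn_id:  # get_connection may return None
    try:
        # Use connection
        pass
    finally:
        pool.return_connection(conn_id)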
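
The acquire-and-return pattern above can also be wrapped in a context manager so callers cannot forget the return step. This is a minimal sketch: the helper name pooled_connection is hypothetical and not part of fast_litellm, and raising RuntimeError when no connection is available is an assumption about the behavior you want.

```python
from contextlib import contextmanager


@contextmanager
def pooled_connection(pool, endpoint):
    """Yield a connection ID from `pool` and always return it afterwards.

    `pool` is any object with get_connection(endpoint) and
    return_connection(conn_id), as documented for SimpleConnectionPool.
    The helper itself is hypothetical, not a fast_litellm API.
    """
    conn_id = pool.get_connection(endpoint)
    if not conn_id:
        # Assumption: treat "no connection available" as an error.
        raise RuntimeError(f"no connection available for {endpoint}")
    try:
        yield conn_id
    finally:
        # Runs on normal exit and when the with-body raises.
        pool.return_connection(conn_id)
```

With this helper, usage would look like `with pooled_connection(pool, "https://api.openai.com") as conn_id: ...`.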

2. Periodic Cleanup

For long-running applications, periodically clean up idle connections:

import threading
import time

import fast_litellm

def cleanup_loop():
    while True:
        fast_litellm.cleanup_expired_connections()
        time.sleep(60)  # Clean up every minute

cleanup_thread = threading.Thread(target=cleanup_loop, daemon=True)
cleanup_thread.start()
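
If the application needs to shut the cleanup thread down cleanly, a stoppable variant of the loop above might look like the following sketch. The function name start_cleanup_thread and its parameters are assumptions; the cleanup callable is passed in, so you could supply fast_litellm.cleanup_expired_connections.

```python
import threading


def start_cleanup_thread(cleanup, interval_seconds=60.0):
    """Call `cleanup()` every `interval_seconds` until the stop event is set.

    Hypothetical helper: returns (stop_event, thread).
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop exits promptly on shutdown instead of sleeping.
        while not stop.wait(interval_seconds):
            try:
                cleanup()
            except Exception as exc:
                # Keep the thread alive if one cleanup pass fails.
                print(f"cleanup failed: {exc}")

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop, thread
```

Call `stop.set()` followed by `thread.join()` during shutdown to end the loop.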

3. Monitor Pool Health

Check pool statistics regularly so that connection leaks or saturation are caught early:

import fast_litellm

stats = fast_litellm.get_connection_pool_stats()
if stats.get('active_connections', 0) > 100:
    print("Warning: High number of active connections")
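
The check above can be extended into a small function that returns warnings instead of printing them. This is a sketch under assumptions: the stats keys are the ones shown in the Statistics section, while the function name check_pool_health and the thresholds max_active and min_idle_ratio are illustrative choices, not recommended values.

```python
def check_pool_health(stats, max_active=100, min_idle_ratio=0.1):
    """Return a list of warning strings for a pool stats dict.

    Thresholds are illustrative assumptions; tune them for your workload.
    """
    total = stats.get("total_connections", 0)
    active = stats.get("active_connections", 0)
    idle = stats.get("idle_connections", 0)

    warnings = []
    if active > max_active:
        warnings.append(f"high active connections: {active} > {max_active}")
    # Few idle connections relative to the total suggests the pool is saturated.
    if total > 0 and idle / total < min_idle_ratio:
        warnings.append(f"low idle headroom: {idle}/{total} idle")
    return warnings
```

Returning a list keeps the check easy to route into whatever logging or alerting the application already uses.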

How It Works

The Rust implementation uses DashMap, a concurrent hash map that splits its entries across independently locked shards. This provides:

Concurrent reads - Multiple threads can read simultaneously
Fine-grained locking - A write locks only the shard that holds the key
No global lock - Operations on different shards do not contend with each other

// Simplified implementation
use dashmap::DashMap;

struct Connection {
    id: String,
}

struct ConnectionPool {
    connections: DashMap<String, Connection>,
}

impl ConnectionPool {
    fn get_connection(&self, endpoint: &str) -> Option<String> {
        // Takes a read lock on only the shard that holds `endpoint`
        self.connections.get(endpoint).map(|c| c.id.clone())
    }
}

Compared to Python's threading.Lock-based approach, this design reduces contention between threads that touch different keys. In the benchmarks above, the measured gain is 3.2x single-threaded and 1.2x with 8 threads.
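
The sharding idea can be sketched in Python as a conceptual model. This is not how DashMap or Fast LiteLLM is implemented: the class ShardedDict and its shard_count parameter are hypothetical, it uses a plain mutex per shard where DashMap uses a reader-writer lock, and Python's GIL means it will not show the same speedups.

```python
import threading


class ShardedDict:
    """Toy sharded map: each key hashes to one shard with its own lock."""

    def __init__(self, shard_count=16):
        # One (dict, lock) pair per shard instead of one global lock.
        self._shards = [({}, threading.Lock()) for _ in range(shard_count)]

    def _shard_for(self, key):
        return self._shards[hash(key) % len(self._shards)]

    def get(self, key, default=None):
        data, lock = self._shard_for(key)
        with lock:  # only this key's shard is locked
            return data.get(key, default)

    def set(self, key, value):
        data, lock = self._shard_for(key)
        with lock:
            data[key] = value

    def __len__(self):
        total = 0
        for data, lock in self._shards:
            with lock:
                total += len(data)
        return total
```

Two threads working on keys in different shards never wait on the same lock, which is the property the list above describes.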

Next Steps

Rate Limiting - Learn about atomic rate limiting
Performance Tuning - Optimize for your workload