Quick Start
This guide will help you get started with Fast LiteLLM in just a few minutes.
Basic Usage
The simplest way to use Fast LiteLLM is to import it before importing LiteLLM:
import fast_litellm # Must be imported first!
import litellm
# Supported LiteLLM operations now use Rust acceleration
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)
Important
Always import fast_litellm before litellm. This allows Fast LiteLLM to apply its acceleration patches.
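Because the patches depend on import order, it can help to detect the wrong order at startup. The following is a minimal sketch, not part of Fast LiteLLM: the helper name `litellm_already_imported` is hypothetical, and it relies only on the standard `sys.modules` registry.

```python
import sys


def litellm_already_imported(modules=None):
    """Return True if `litellm` was loaded before this check ran.

    `modules` defaults to sys.modules; passing a dict makes the check testable.
    """
    registry = sys.modules if modules is None else modules
    return "litellm" in registry


# Run this before `import fast_litellm` to catch an import-order mistake early.
if litellm_already_imported():
    print("Warning: litellm is already imported; acceleration patches may not apply.")
```

Placing this check at the top of an application's entry point is one way to surface the problem early instead of silently running without acceleration.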
Check Acceleration Status
Verify that Rust acceleration is active:
import fast_litellm
# Check if Rust acceleration is available
print(f"Rust acceleration: {fast_litellm.RUST_ACCELERATION_AVAILABLE}")
# Get health status
health = fast_litellm.health_check()
print(f"Status: {health['status']}")
print(f"Components: {', '.join(health['components'])}")
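For logging, the health dictionary can be condensed into a single line. This sketch assumes only the `status` and `components` keys used above; the helper name `summarize_health` and the sample values are illustrative, and real output may differ.

```python
def summarize_health(health):
    """Build a one-line summary from a health_check()-style dict."""
    status = health.get("status", "unknown")
    components = health.get("components") or []
    names = ", ".join(components) or "none"
    return f"{status} ({len(components)} components: {names})"


# Sample values are made up for illustration, not real Fast LiteLLM output.
sample_health = {"status": "ok", "components": ["tokens", "pool"]}
print(summarize_health(sample_health))
```

Using `.get()` keeps the summary from raising `KeyError` if a key is missing, which is a defensive choice rather than a documented requirement.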
Feature Status
Check which features are enabled:
import fast_litellm
features = fast_litellm.get_feature_status()
for name, status in features.items():
    enabled = "enabled" if status.get("enabled") else "disabled"
    print(f" {name}: {enabled}")
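When only the active features matter, the same mapping can be filtered down to a list of names. The sketch below assumes the shape shown above (a name mapped to a dict with an `enabled` key); the feature names in the sample are hypothetical.

```python
def enabled_features(features):
    """Return the sorted names of features whose status has a truthy 'enabled'."""
    return sorted(name for name, status in features.items() if status.get("enabled"))


# Hypothetical feature names, used only to make the sketch runnable.
sample_features = {
    "token_counting": {"enabled": True},
    "rate_limiting": {"enabled": False},
    "connection_pooling": {"enabled": True},
}
print(enabled_features(sample_features))
```

In real use, pass the result of `fast_litellm.get_feature_status()` in place of the sample dictionary.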
Token Counting
Fast LiteLLM accelerates token counting operations:
import fast_litellm
import litellm
# Encode text to tokens
text = "Hello, world! This is a test of Fast LiteLLM."
tokens = litellm.encode(model="gpt-3.5-turbo", text=text)
print(f"Token count: {len(tokens)}")
# Count tokens in messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
count = litellm.token_counter(model="gpt-3.5-turbo", messages=messages)
print(f"Message tokens: {count}")
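A common use of message token counts is keeping a conversation within a budget. The sketch below shows one possible approach: drop the oldest non-system messages until the count fits. The helper `trim_to_budget` is hypothetical, and `word_count` is a deliberately crude stand-in (whitespace-separated words, not a real tokenizer) so the example runs without any dependencies.

```python
def trim_to_budget(messages, count_tokens, budget):
    """Drop the oldest non-system messages until count_tokens(messages) <= budget."""
    kept = list(messages)  # Work on a copy so the caller's list is untouched.
    while count_tokens(kept) > budget:
        # Find the oldest message that is not a system prompt.
        index = next((i for i, m in enumerate(kept) if m["role"] != "system"), None)
        if index is None:
            break  # Only system messages remain; nothing more to drop.
        del kept[index]
    return kept


def word_count(messages):
    """Stand-in counter: whitespace-separated words, NOT real tokens."""
    return sum(len(m["content"].split()) for m in messages)


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "And 3+3?"},
]
trimmed = trim_to_budget(history, word_count, budget=8)
print([m["content"] for m in trimmed])
```

To count real tokens, the stand-in could be replaced with a wrapper such as `lambda msgs: litellm.token_counter(model="gpt-3.5-turbo", messages=msgs)`.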
Using the Connection Pool
Access the accelerated connection pool directly:
from fast_litellm import SimpleConnectionPool
pool = SimpleConnectionPool()
# Get a connection
conn_id = pool.get_connection("https://api.openai.com")
if conn_id:
    print(f"Got connection: {conn_id}")
    # Use the connection...
    # Return it to the pool
    pool.return_connection(conn_id)
# Get pool statistics
stats = pool.get_stats()
print(f"Pool stats: {stats}")
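If the code using a connection raises an exception, the connection in the snippet above would never be returned. One way to guard against that is a context manager, sketched here under the assumption that the pool exposes only the `get_connection` and `return_connection` methods shown above. The `pooled_connection` helper and the `FakePool` class are hypothetical; `FakePool` exists only so the sketch runs without Fast LiteLLM.

```python
from contextlib import contextmanager


@contextmanager
def pooled_connection(pool, endpoint):
    """Yield a connection ID and always hand it back to the pool afterwards."""
    conn_id = pool.get_connection(endpoint)
    try:
        yield conn_id
    finally:
        if conn_id:  # Nothing to return if the pool gave us no connection.
            pool.return_connection(conn_id)


class FakePool:
    """Hypothetical stand-in that mimics the two pool methods used above."""

    def __init__(self):
        self.active = set()

    def get_connection(self, endpoint):
        conn_id = f"conn-{len(self.active) + 1}"
        self.active.add(conn_id)
        return conn_id

    def return_connection(self, conn_id):
        self.active.discard(conn_id)


fake_pool = FakePool()
with pooled_connection(fake_pool, "https://api.openai.com") as conn_id:
    in_use = len(fake_pool.active)  # The connection is checked out here.
print(f"In use during block: {in_use}, after block: {len(fake_pool.active)}")
```

With a real `SimpleConnectionPool`, the same `with pooled_connection(pool, url) as conn_id:` pattern should apply, though the yielded value may still be empty and needs checking before use.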
Using the Rate Limiter
Control request rates with the accelerated rate limiter:
from fast_litellm import SimpleRateLimiter
# Create a rate limiter (60 requests per minute)
limiter = SimpleRateLimiter(requests_per_minute=60)
# Check if request is allowed
result = limiter.check("api_key_123")
if result["allowed"]:
    # Proceed with the request
    print("Request allowed!")
else:
    print(f"Rate limited. Retry after {result.get('retry_after_ms', 0)}ms")
# Simple boolean check
if limiter.is_allowed("api_key_123"):
    # Make request
    pass
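The `retry_after_ms` hint can drive a simple wait-and-retry loop. The sketch below assumes `check()` returns a dict with `allowed` and an optional `retry_after_ms`, as in the example above. The `wait_until_allowed` helper and the `FakeLimiter` class are hypothetical; `FakeLimiter` is a stand-in so the code runs without Fast LiteLLM, and the `sleep` parameter is injectable to avoid real delays in tests.

```python
import time


def wait_until_allowed(limiter, key, max_attempts=3, sleep=time.sleep):
    """Retry limiter.check(key), sleeping for the suggested delay between tries."""
    for _ in range(max_attempts):
        result = limiter.check(key)
        if result["allowed"]:
            return True
        sleep(result.get("retry_after_ms", 0) / 1000)  # Milliseconds to seconds.
    return False


class FakeLimiter:
    """Hypothetical stand-in: rejects the first `rejections` checks."""

    def __init__(self, rejections):
        self.rejections = rejections

    def check(self, key):
        if self.rejections > 0:
            self.rejections -= 1
            return {"allowed": False, "retry_after_ms": 250}
        return {"allowed": True}


delays = []  # Records requested sleeps instead of actually sleeping.
allowed = wait_until_allowed(FakeLimiter(rejections=2), "api_key_123", sleep=delays.append)
print(f"Allowed: {allowed}, waited: {delays}")
```

Capping the number of attempts keeps a persistently throttled key from blocking the caller indefinitely.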
Performance Monitoring
Monitor the performance of accelerated operations:
import fast_litellm
# Get performance statistics
stats = fast_litellm.get_performance_stats()
for key, value in stats.items():
    print(f"{key}: {value}")
# Get optimization recommendations
recommendations = fast_litellm.get_recommendations()
for rec in recommendations:
    print(f"Recommendation: {rec}")
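The built-in statistics can be complemented with a direct timing measurement of your own workload. The helper below is a generic sketch based on the standard `time.perf_counter`; `mean_latency_ms` is a hypothetical name and is not part of Fast LiteLLM.

```python
import time


def mean_latency_ms(func, repeat=100):
    """Call func() `repeat` times and return the mean wall-clock time in ms."""
    start = time.perf_counter()
    for _ in range(repeat):
        func()
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / repeat


calls = []  # Trivial workload so the sketch runs without any dependencies.
latency = mean_latency_ms(lambda: calls.append(1), repeat=50)
print(f"Mean latency: {latency:.4f} ms over {len(calls)} calls")
```

To compare accelerated and plain behavior, one option is to time a call such as `litellm.token_counter(...)` in two separate processes, one with `fast_litellm` imported first and one without it.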
Complete Example
Here's a complete example that combines the status checks, token counting, and performance stats shown above:
#!/usr/bin/env python3
"""Complete Fast LiteLLM example."""
import fast_litellm
import litellm

def main():
    # 1. Check acceleration status
    print("=== Acceleration Status ===")
    print(f"Rust available: {fast_litellm.RUST_ACCELERATION_AVAILABLE}")
    health = fast_litellm.health_check()
    print(f"Status: {health['status']}")
    print(f"Components: {', '.join(health['components'])}")
    print()

    # 2. Feature status
    print("=== Feature Status ===")
    features = fast_litellm.get_feature_status()
    for name, status in features.items():
        enabled = "ON" if status.get("enabled") else "OFF"
        print(f" [{enabled}] {name}")
    print()

    # 3. Token counting
    print("=== Token Counting ===")
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ]
    token_count = litellm.token_counter(model="gpt-3.5-turbo", messages=messages)
    print(f"Message token count: {token_count}")
    print()

    # 4. Performance stats
    print("=== Performance Stats ===")
    stats = fast_litellm.get_performance_stats()
    if stats:
        for key, value in list(stats.items())[:5]:
            print(f" {key}: {value}")
    else:
        print(" No stats collected yet")
    print()
    print("Done!")


if __name__ == "__main__":
    main()
Next Steps
- Features Overview - Learn about all accelerated components
- Configuration Guide - Configure Fast LiteLLM behavior
- Performance Tuning - Optimize for your use case