Performance¶
rpytest is designed for speed at every layer. This document explains the performance optimizations and how to get the best results.
Performance Gains¶
Where Time is Saved¶
| Phase | pytest | rpytest | Savings |
|---|---|---|---|
| CLI startup | ~200ms | <10ms | 95% |
| Test collection | ~500ms | ~50ms | 90% |
| Per-test overhead | ~10ms | ~2ms | 80% |
| Result aggregation | ~50ms | ~5ms | 90% |
Real-World Impact¶
For a 500-test suite:
pytest: 500ms startup + 500ms collect + 500*10ms overhead = 6.0s overhead
rpytest: 10ms startup + 50ms collect + 500*2ms overhead = 1.06s overhead
Savings: ~5 seconds (83% less overhead)
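The same arithmetic as a tiny model you can rerun with your own suite size and overhead numbers:

```python
def total_overhead(startup_s, collect_s, per_test_s, n_tests):
    """Fixed startup and collection cost, plus per-test overhead."""
    return startup_s + collect_s + per_test_s * n_tests

n = 500
print(f"pytest:  {total_overhead(0.500, 0.500, 0.010, n):.2f}s")  # 6.00s
print(f"rpytest: {total_overhead(0.010, 0.050, 0.002, n):.2f}s")  # 1.06s
```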
Optimization Techniques¶
1. Rust CLI¶
The command-line interface is written in Rust for instant startup:
$ time rpytest --help
real 0m0.008s # 8 milliseconds
$ time pytest --help
real 0m0.234s # 234 milliseconds
Why it matters: Every invocation pays startup cost. In watch mode with frequent re-runs, this adds up.
2. Native Collection¶
AST-based test discovery without imports:
# pytest: imports every test file to collect
import tests.test_api    # executes module-level code
import tests.test_db     # loads all dependencies
import tests.test_utils  # slow!

# rpytest: parses the AST without executing anything
import ast
from pathlib import Path
ast.parse(Path("tests/test_api.py").read_text())  # fast!
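The same idea as a runnable sketch: discover test functions by walking the parsed tree, never importing the module (the `test_*` naming here mirrors pytest's default conventions):

```python
import ast
from pathlib import Path

def collect_tests(root="tests"):
    """Find test functions without importing a single test module."""
    found = []
    for path in sorted(Path(root).rglob("test_*.py")):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) \
                    and node.name.startswith("test"):
                found.append(f"{path}::{node.name}")
    return found

print("\n".join(collect_tests()))
```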
3. Daemon Model¶
Persistent daemon avoids repeated Python startup:
Cold start (no daemon):
rpytest tests/ → start daemon (100ms) → run tests
Warm start (daemon running):
rpytest tests/ → connect (<1ms) → run tests
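A sketch of the connect-or-start logic (the socket path and the daemon-start command are illustrative assumptions, not rpytest's actual protocol):

```python
import socket
import subprocess
import time

SOCKET_PATH = "/tmp/rpytest-daemon.sock"  # hypothetical location

def connect_or_start():
    """Reuse a running daemon when possible; pay the cold-start cost only once."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(SOCKET_PATH)  # warm start: <1ms
    except (FileNotFoundError, ConnectionRefusedError):
        subprocess.Popen(["rpytest", "--daemon-start"])  # hypothetical flag
        time.sleep(0.1)  # wait out the ~100ms cold start
        sock.connect(SOCKET_PATH)
    return sock
```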
4. Worker Pool¶
Worker processes are spawned ahead of time, so no test ever waits on interpreter startup.
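A minimal sketch of the pattern with the standard library (rpytest's real worker protocol is internal; the "run a test" step is stubbed out):

```python
import multiprocessing as mp

def worker(tasks, results):
    """A warm interpreter that pulls test IDs until it receives the shutdown signal."""
    for node_id in iter(tasks.get, None):  # None shuts the worker down
        results.put((node_id, "passed"))   # stand-in for actually running the test

if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    pool = [mp.Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for p in pool:
        p.start()                          # startup cost paid once, up front
    for node_id in ("tests/test_api.py::test_get", "tests/test_db.py::test_query"):
        tasks.put(node_id)
    for _ in range(2):
        print(results.get())
    for p in pool:
        tasks.put(None)
    for p in pool:
        p.join()
```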
5. Parallel Execution¶
LPT (Longest Processing Time first) scheduling for balanced worker loads: the longest tests are dispatched first, each to whichever worker currently has the least work queued. See the sketch below.
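A sketch of LPT with a min-heap of worker loads (test durations would come from a previous run's timing data):

```python
import heapq

def lpt_schedule(tests, n_workers):
    """Assign (test_id, duration) pairs to workers, longest test first."""
    heap = [(0.0, w) for w in range(n_workers)]  # (total load, worker index)
    heapq.heapify(heap)
    assignments = {w: [] for w in range(n_workers)}
    for test_id, duration in sorted(tests, key=lambda t: t[1], reverse=True):
        load, worker = heapq.heappop(heap)       # least-loaded worker
        assignments[worker].append(test_id)
        heapq.heappush(heap, (load + duration, worker))
    return assignments

tests = [("test_query", 2.5), ("test_upload", 1.2), ("test_oauth", 0.8), ("test_util", 0.1)]
print(lpt_schedule(tests, 2))
# {0: ['test_query'], 1: ['test_upload', 'test_oauth', 'test_util']} — loads 2.5s vs 2.1s
```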
6. Efficient IPC¶
Binary MessagePack instead of JSON:
JSON: {"node_id": "tests/test.py::test_func", "outcome": "passed"}
MessagePack: \x82\xa7node_id\xb8tests/test.py::test_func\xa7outcome\xa6passed
Size: 30% smaller
Parse: 5x faster
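The size difference is easy to reproduce with the msgpack package (this only illustrates the encoding; rpytest's wire format itself is internal):

```python
import json

import msgpack  # pip install msgpack

result = {"node_id": "tests/test.py::test_func", "outcome": "passed"}
as_json = json.dumps(result).encode()
as_msgpack = msgpack.packb(result)
print(f"json: {len(as_json)} bytes, msgpack: {len(as_msgpack)} bytes")
assert msgpack.unpackb(as_msgpack) == result  # lossless round-trip
```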
Configuration for Speed¶
Optimal Settings¶
# pyproject.toml
[tool.rpytest]
# Use all CPU cores
parallel = "auto"
# Keep daemon running
daemon_idle_timeout = 600
# Reuse session fixtures
enable_fixture_reuse = true
# Skip slow tests in development
default_markers = "not slow"
Watch Mode¶
For fastest feedback during development:
# Fast tests only
rpytest --watch -m "not slow"
# Single file focus
rpytest tests/test_current.py --watch
CI Optimization¶
# Parallel jobs with duration-balanced sharding
jobs:
  test:
    strategy:
      matrix:
        shard: [0, 1, 2, 3]
    steps:
      - run: |
          rpytest tests/ \
            --shard=${{ matrix.shard }} \
            --total-shards=4 \
            --shard-strategy=duration_balanced \
            -n auto
Profiling¶
Measure Collection Time¶
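Collection cost can be approximated by timing the equivalent AST work over your suite (a rough standard-library proxy, not an rpytest command):

```python
import ast
import time
from pathlib import Path

start = time.perf_counter()
files = list(Path("tests").rglob("test_*.py"))
for path in files:
    ast.parse(path.read_text(), filename=str(path))
print(f"parsed {len(files)} files in {time.perf_counter() - start:.3f}s")
```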
Measure Execution Breakdown¶
$ rpytest tests/ --timing
Setup: 0.02s
Collection: 0.05s
Execution: 2.30s
Teardown: 0.01s
Reporting: 0.01s
Total: 2.39s
Compare with pytest¶
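A simple way to compare end-to-end wall time for both runners (a benchmarking tool such as hyperfine will give steadier numbers):

```python
import subprocess
import time

def wall_time(cmd):
    start = time.perf_counter()
    subprocess.run(cmd, capture_output=True)
    return time.perf_counter() - start

for cmd in (["pytest", "tests/"], ["rpytest", "tests/"]):
    print(f"{cmd[0]}: {wall_time(cmd):.2f}s")
```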
Memory Efficiency¶
CLI Memory¶
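One way to measure peak CLI memory yourself (a Unix-only sketch; note that ru_maxrss units vary by platform):

```python
import resource
import subprocess

subprocess.run(["rpytest", "--help"], capture_output=True)
peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"peak child RSS: {peak / 1024:.1f} MiB")  # ru_maxrss is KiB on Linux, bytes on macOS
```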
Daemon Memory¶
Optimization Tips¶
# Limit workers to reduce memory
rpytest -n 2
# Reduce fixture max age
rpytest --fixture-max-age=300
# Restart daemon periodically
rpytest --daemon-stop && rpytest tests/
Bottleneck Analysis¶
Slow Collection?¶
Check for:
- Module-level imports in test files
- Complex conftest.py fixtures
- Dynamic test generation
Solution: defer heavy module-level imports into test functions, simplify conftest.py fixtures, and avoid dynamic test generation where possible.
Slow Tests?¶
Identify slowest tests:
Output:
Slowest 10 tests:
2.50s tests/test_db.py::test_complex_query
1.20s tests/test_api.py::test_file_upload
0.80s tests/test_auth.py::test_oauth_flow
Slow Fixtures?¶
Check fixture setup time:
IPC Latency?¶
Debug communication:
Scaling¶
Small Suite (<100 tests)¶
Defaults are fine: with a warm daemon, fixed overhead is a few tens of milliseconds, so extra tuning rarely pays off.
Medium Suite (100-1000 tests)¶
Enable parallel execution with -n auto; per-test overhead starts to dominate at this size.
Large Suite (1000+ tests)¶
Combine -n auto with fixture reuse, and consider splitting across CI jobs with --shard/--total-shards.
Huge Suite (10000+ tests)¶
# Maximum parallelism + duration balancing
rpytest tests/ \
  --shard=$SHARD --total-shards=16 \
  --shard-strategy=duration_balanced \
  -n auto \
  --reuse-fixtures
Comparison Table¶
| Metric | pytest | rpytest | Improvement |
|---|---|---|---|
| CLI startup | 200ms | 8ms | 25x |
| Collection (1k files) | 8.5s | 0.3s | 28x |
| Per-test overhead | 10ms | 2ms | 5x |
| Memory (CLI) | 40MB | 5MB | 8x |
| Watch mode latency | 500ms | 50ms | 10x |
| Parallel efficiency | 70% | 90% | 1.3x |
Future Optimizations¶
Planned improvements:
- Incremental collection: Only re-parse changed files
- Result caching: Skip unchanged tests
- Distributed execution: Run across multiple machines
- WASM workers: Faster isolation than subprocess