Benchmarks¶
Performance comparison between rpytest and pytest on various test suites.
Test Environment¶
- CPU: AMD Ryzen 7 5700U (16 threads)
- OS: Linux 6.14.0-37-generic
- Python: 3.12.3
- pytest: 9.0.2
- pytest-xdist: 3.8.0
Quick Summary¶
| Metric | pytest | rpytest | Improvement |
|---|---|---|---|
| Execution Time | 0.30s | 0.25s | 1.2x faster |
| CLI Memory | 39.4 MB | 5.9 MB | 6.7x less |
| Wall Clock | 0.63s | 0.32s | 2.0x faster |
Sequential Execution¶
500 Test Suite¶
| Runner | Execution | Wall Clock |
|---|---|---|
| pytest | 0.30s | 0.63s |
| rpytest | 0.25s | 0.32s |
Speedup: 1.2x execution, 2.0x wall clock
Breakdown¶
| Phase | pytest | rpytest |
|---|---|---|
| CLI Startup | ~200ms | ~8ms |
| Collection | ~100ms | ~20ms |
| Execution | 300ms | 250ms |
| Reporting | ~30ms | ~10ms |
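As a cross-check, the phases roughly sum to the wall-clock figures above: about 630 ms for pytest, matching its 0.63 s wall clock, and about 290 ms for rpytest, in line with its 0.32 s wall clock.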
Parallel Execution¶
rpytest provides built-in -n support compatible with pytest-xdist.
| Configuration | Time | vs Sequential |
|---|---|---|
| pytest | 0.30s | baseline |
| pytest -n 4 | 0.87s | 2.9x slower |
| pytest -n auto | 1.90s | 6.3x slower |
| rpytest | 0.25s | baseline |
| rpytest -n 4 | 0.25s | same |
| rpytest -n auto | 0.20s | 1.25x faster |
Why xdist Can Be Slower¶
For small-to-medium test suites, pytest-xdist often provides no benefit:
- Worker startup: Each worker spawns a new Python process
- Re-collection: Each worker collects the entire test suite again
- IPC overhead: Results are serialized and sent back to the master process
rpytest avoids these with:
- Warm daemon with pre-loaded pytest
- Single collection, shared across workers
- Efficient binary IPC
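To make the worker-startup cost concrete, the following minimal Python sketch (illustrative only, not part of rpytest or pytest-xdist) times how long a fresh interpreter takes just to start and import pytest, which is the fixed price each xdist worker pays before it can run a single test:

```python
# Illustrative only: measure the fixed cost a newly spawned worker pays
# before running any tests (interpreter startup + importing pytest).
import subprocess
import sys
import time

start = time.perf_counter()
subprocess.run([sys.executable, "-c", "import pytest"], check=True)
elapsed = time.perf_counter() - start
print(f"interpreter start + 'import pytest': {elapsed * 1000:.0f} ms")
```

With four workers that cost is paid four times, on top of per-worker re-collection, which is why a 0.30 s suite can get slower rather than faster under pytest -n 4.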
Memory Usage¶
| Component | pytest | rpytest |
|---|---|---|
| CLI Process | 39.4 MB | 5.9 MB |
| Daemon | N/A | ~80 MB (shared) |
The rpytest daemon is shared across all CLI invocations, so the per-invocation cost is just 5.9 MB.
Throughput¶
| Metric | pytest | rpytest |
|---|---|---|
| Tests/second | 1,667 | 2,000 |
| ms/test | 0.60 | 0.50 |
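These figures follow directly from the 500-test suite and the execution times above: 500 tests / 0.30 s ≈ 1,667 tests/s for pytest and 500 tests / 0.25 s = 2,000 tests/s for rpytest.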
Collection Performance¶
Native AST-based collection vs pytest's import-based collection:
| File Count | pytest | rpytest |
|---|---|---|
| 10 files | 0.5s | 0.05s |
| 100 files | 2.0s | 0.1s |
| 1000 files | 8.5s | 0.3s |
Speedup: 10-28x faster collection
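The gap comes from how tests are discovered. As a simplified illustration of the general idea (a sketch of AST-based discovery, not rpytest's actual collector), a parser-based approach reads source text without ever importing the module, so it skips the module's import graph and any import-time side effects:

```python
# Sketch of AST-based test discovery: parse the source, never import it,
# and report functions (including methods) whose names start with "test_".
import ast
import sys
from pathlib import Path

def collect_tests(path: Path) -> list[str]:
    tree = ast.parse(path.read_text(), filename=str(path))
    return [
        f"{path}::{node.name}"
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("test_")
    ]

if __name__ == "__main__":
    # Pass any test file path on the command line (hypothetical usage).
    for test_id in collect_tests(Path(sys.argv[1])):
        print(test_id)
```

An import-based collector instead has to execute every test module, and everything it imports, before it can list a single test, which is broadly why the parser approach scales so much better with file count.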
Detailed Results¶
Execution Time (3 runs)¶
| Configuration | Run 1 | Run 2 | Run 3 | Average |
|---|---|---|---|---|
| pytest | 0.31s | 0.30s | 0.30s | 0.30s |
| pytest -n 4 | 0.89s | 0.86s | 0.87s | 0.87s |
| rpytest | 0.26s | 0.23s | 0.26s | 0.25s |
| rpytest -n 4 | 0.25s | 0.37s | 0.33s | 0.32s |
Wall Clock (3 runs)¶
| Configuration | Run 1 | Run 2 | Run 3 | Average |
|---|---|---|---|---|
| pytest | 0.63s | 0.62s | 0.64s | 0.63s |
| rpytest | 0.32s | 0.31s | 0.33s | 0.32s |
Test Suite Details¶
Benchmark suite composition:
- 500 tests across 30 files
- Simple assertion tests
- Parameterized tests (10 params each)
- Tests with fixtures
- Class-based test methods
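For illustration, a representative file in a suite with this composition might look as follows (hypothetical contents, not one of the actual benchmark files):

```python
# Hypothetical benchmark file combining the styles listed above.
import pytest

def test_simple_assertion():
    assert 1 + 1 == 2

@pytest.mark.parametrize("value", range(10))
def test_parameterized(value):
    assert value < 10

@pytest.fixture
def numbers():
    return [1, 2, 3]

def test_with_fixture(numbers):
    assert sum(numbers) == 6

class TestGrouped:
    def test_method(self):
        assert "rpytest".startswith("r")
```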
Running Benchmarks¶
Reproduce these results:
```bash
# Setup
uv pip install pytest-xdist

# pytest benchmarks
time uv run python -m pytest benchmark_suite/ -q
time uv run python -m pytest benchmark_suite/ -n 4 -q
time uv run python -m pytest benchmark_suite/ -n auto -q

# rpytest benchmarks
time ./target/release/rpytest benchmark_suite/ -q
time ./target/release/rpytest benchmark_suite/ -n 1 -q
time ./target/release/rpytest benchmark_suite/ -n 4 -q
time ./target/release/rpytest benchmark_suite/ -n auto -q

# Memory comparison
/usr/bin/time -v uv run python -m pytest benchmark_suite/ -q 2>&1 | grep "Maximum resident"
/usr/bin/time -v ./target/release/rpytest benchmark_suite/ -q 2>&1 | grep "Maximum resident"
```
When to Use What¶
Use rpytest (default mode) when:¶
- Running unit tests during development
- Fast iteration in watch mode
- CI with modest test counts (<1000)
Use rpytest -n auto when:¶
- Large test suites (>1000 tests)
- Tests with significant individual runtime (>100ms)
- Need to saturate CPU with compute-heavy tests
Use sharding when:¶
- Very large suites (>5000 tests)
- Distributed CI runners available
- Wall clock time critical
CI Performance¶
Traditional pytest + xdist¶
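The baseline configuration is not reproduced here; a typical single-runner job would look roughly like the sketch below (an assumed setup, with the ~120s total taken from the comparison at the end of this section):

```yaml
# Assumed single-runner baseline (illustrative, not the original config)
jobs:
  test:
    steps:
      - run: python -m pytest -n auto
# Time: ~120s total
```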
rpytest with sharding¶
```yaml
# 4 runners with sharding
jobs:
  test:
    strategy:
      matrix:
        shard: [0, 1, 2, 3]
    steps:
      - run: rpytest --shard ${{ matrix.shard }} --total-shards 4
# Time: ~35s per runner
```
Total CI wall clock: 120s (pytest + xdist) vs 35s (rpytest with sharding) = 3.4x faster