Alerting

Configure alerts for rninja infrastructure.

Alert Conditions

Condition	Severity	Action
Cache hit rate < 50%	Warning	Investigate cache misses
Cache server down	Critical	Restart server
Disk > 90% full	Warning	Run GC
Build timeout	Error	Check build logs
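
The hit-rate condition in the table can be evaluated with plain integer arithmetic once hit and miss counts are available. The sketch below assumes those counts have already been obtained somehow; the source does not say how rninja exposes them, and the function names `hit_rate_percent` and `hit_rate_severity` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: integer hit rate (0-100) from hit and miss counts.
# How the counts are collected is an assumption left to the caller.
hit_rate_percent() {
    local hits="$1" misses="$2"
    local total=$((hits + misses))
    if [ "$total" -eq 0 ]; then
        echo 100   # no lookups yet: report healthy rather than alerting
        return
    fi
    echo $((hits * 100 / total))
}

# Hypothetical helper: apply the "< 50% is a Warning" rule from the table.
hit_rate_severity() {
    if [ "$1" -lt 50 ]; then
        echo "Warning"
    else
        echo "OK"
    fi
}
```

For example, `hit_rate_severity "$(hit_rate_percent 30 70)"` evaluates a 30% hit rate against the threshold.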

Simple Monitoring Script

#!/bin/bash
# /etc/cron.hourly/rninja-alerts

# Check cache health
if ! rninja -t cache-health > /dev/null 2>&1; then
    echo "rninja cache unhealthy" | mail -s "Alert: rninja" [email protected]
fi

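# Check disk usage (-P keeps each filesystem on one line; skip the check if df fails)
USAGE=$(df -P ~/.cache/rninja 2>/dev/null | tail -1 | awk '{print $5}' | tr -d '%')
if [ -n "$USAGE" ] && [ "$USAGE" -gt 90 ]; then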
    echo "rninja cache disk usage: ${USAGE}%" | mail -s "Warning: rninja" [email protected]
    rninja -t cache-gc
fi
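
The disk check is easier to test when parsing and threshold comparison are split into small functions. This is a minimal sketch rather than part of rninja: the helper names `parse_df_usage` and `usage_exceeds` are hypothetical, and it assumes `df -P` output in the POSIX format (one header line, then one line per filesystem).

```shell
#!/bin/bash
# Hypothetical helper: read `df -P` output on stdin and print the
# usage percentage of the first filesystem, without the "%" sign.
parse_df_usage() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical helper: succeed only if the usage is a number above the threshold.
usage_exceeds() {
    local usage="$1" threshold="$2"
    case "$usage" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as not exceeded
    esac
    [ "$usage" -gt "$threshold" ]
}
```

It could be used as `usage_exceeds "$(df -P ~/.cache/rninja | parse_df_usage)" 90 && rninja -t cache-gc`.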

Prometheus Alertmanager

# alertmanager.yml
route:
  receiver: 'ops-team'

receivers:
  - name: 'ops-team'
    email_configs:
      - to: '[email protected]'

PagerDuty Integration

To page on critical alerts, add a PagerDuty receiver and point a route at it. The receiver definition is:

receivers:
  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: '<key>'