Production Checklist¶
Essential steps for production deployment.
Security¶
Authentication¶
- [ ] Set strong JWT secret
- [ ] Configure JWT expiration
- [ ] Enable API key authentication for service accounts
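The "strong JWT secret" item can be made concrete with a small sketch. It is illustrative only: the 32-byte minimum and the distinct-character heuristic are assumptions, not requirements stated by this checklist, and the function names are hypothetical.

```python
import secrets

MIN_SECRET_BYTES = 32  # assumed minimum (256 bits); adjust to your signing algorithm


def generate_jwt_secret(num_bytes: int = 64) -> str:
    """Return a URL-safe random secret drawn from the OS CSPRNG."""
    return secrets.token_urlsafe(num_bytes)


def is_strong_secret(secret: str) -> bool:
    """Reject secrets that are too short or look like placeholders."""
    if len(secret.encode("utf-8")) < MIN_SECRET_BYTES:
        return False
    # Very few distinct characters suggests a repeated or default value.
    return len(set(secret)) >= 10


if __name__ == "__main__":
    print(generate_jwt_secret())
```

Running the module prints a fresh secret that can be stored in a secrets manager; `is_strong_secret` could run at startup to refuse obvious defaults.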
Encryption¶
- [ ] Set encryption key for credentials
- [ ] Enable TLS/HTTPS
Network¶
- [ ] Configure CORS to allow only trusted origins
- [ ] Set up firewall rules
- [ ] Use private network for database connections
- [ ] Enable rate limiting
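As an illustration of what rate limiting does, here is a minimal token-bucket sketch. It is not the product's own limiter; the class name and parameters are hypothetical, and a real deployment would more likely enforce limits at the load balancer or API gateway.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would keep one bucket per client (for example, per API key) and return HTTP 429 when `allow()` is false.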
Access Control¶
- [ ] Disable debug endpoints in production
- [ ] Use minimal permissions for service accounts
- [ ] Audit credential access
Database¶
PostgreSQL Recommended¶
- [ ] Use PostgreSQL for production
- [ ] Enable SSL connections
- [ ] Configure connection pooling
Backups¶
- [ ] Set up automated backups
- [ ] Test backup restoration
- [ ] Store backups in separate location
- [ ] Configure retention policy
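A retention policy boils down to deciding which backups have aged out. The sketch below assumes backups are tracked as a name-to-timestamp mapping and uses a 30-day default window; both the data shape and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def select_expired(backups: dict[str, datetime], now: datetime,
                   retention_days: int = 30) -> list[str]:
    """Return names of backups older than the retention window, oldest first."""
    cutoff = now - timedelta(days=retention_days)
    expired = [name for name, created in backups.items() if created < cutoff]
    return sorted(expired, key=lambda name: backups[name])


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = {"old": now - timedelta(days=45), "recent": now - timedelta(days=2)}
    print(select_expired(sample, now))
```

The function only selects candidates; deleting them is left to the caller so that a dry run is always possible.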
Performance¶
- [ ] Add database indexes
- [ ] Monitor query performance
- [ ] Set up connection pooling (PgBouncer)
High Availability¶
Multiple Instances¶
- [ ] Run at least 2 replicas
- [ ] Configure load balancer
- [ ] Enable health checks
Queue¶
- [ ] Use Redis for distributed queue
- [ ] Configure Redis persistence
- [ ] Set up Redis Sentinel or Cluster
Failover¶
- [ ] Configure pod anti-affinity
- [ ] Set up PodDisruptionBudget
- [ ] Test failover scenarios
Monitoring¶
Metrics¶
- [ ] Enable Prometheus metrics
- [ ] Set up Grafana dashboards
- [ ] Configure alerting rules
Key Metrics to Monitor¶
| Metric | Alert Threshold |
|---|---|
| `m9m_execution_errors` | > 5 per minute |
| `m9m_execution_duration` | p99 > 30s |
| `m9m_queue_size` | > 1000 |
| `m9m_active_workflows` | Unexpected changes |
| Memory usage | > 80% |
| CPU usage | > 80% |
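The numeric thresholds in the table can be expressed as a simple lookup, which is handy for sanity-checking alert rules before writing them in your alerting system. The dictionary keys below are hypothetical derived names (rate per minute, p99 in seconds, percentages), not metric names exported by the application, and `m9m_active_workflows` is omitted because "unexpected changes" has no fixed threshold.

```python
# Thresholds taken from the table above; key names are illustrative assumptions.
THRESHOLDS = {
    "m9m_execution_errors_per_minute": 5,
    "m9m_execution_duration_p99_seconds": 30,
    "m9m_queue_size": 1000,
    "memory_usage_percent": 80,
    "cpu_usage_percent": 80,
}


def breached(samples: dict[str, float]) -> list[str]:
    """Return the names of metrics whose sampled value exceeds its threshold."""
    return sorted(
        name for name, limit in THRESHOLDS.items()
        if samples.get(name, 0) > limit
    )
```

Metrics missing from `samples` are treated as healthy here; a stricter variant might flag them instead, since absent data often signals a scrape failure.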
Logging¶
- [ ] Use JSON log format
- [ ] Set up log aggregation (ELK, Loki)
- [ ] Configure log retention
- [ ] Don't log sensitive data
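To show how JSON logging and redaction fit together, here is a minimal sketch built on Python's standard `logging` module. The `context` attribute and the list of sensitive key names are assumptions made for illustration; the application's own logger may work differently.

```python
import json
import logging

SENSITIVE_KEYS = {"password", "token", "api_key", "secret"}  # assumed field names


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, masking sensitive context fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra={"context": {...}}` are copied with secrets masked.
        context = getattr(record, "context", {})
        payload["context"] = {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
            for key, value in context.items()
        }
        return json.dumps(payload)
```

Attach it with `handler.setFormatter(JsonFormatter())` and log with `logger.info("login", extra={"context": {"user": "ana", "token": "abc"}})`. Note that this masks only structured context fields, not secrets interpolated into the message text.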
Tracing¶
- [ ] Enable distributed tracing
Performance¶
Resource Allocation¶
- [ ] Set CPU and memory requests and limits for each container
Autoscaling¶
- [ ] Configure HPA
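When choosing HPA targets, it helps to see how the replica count responds to load. The sketch below follows the scaling rule described in the Kubernetes documentation (current replicas multiplied by the ratio of observed to target metric, rounded up) and then clamps the result. The default bounds are assumptions, and the real controller also applies a tolerance and stabilization windows that are left out here.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Scale the replica count by the ratio of observed to target metric."""
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * metric / target)
    # Clamp to the configured bounds; the default floor of 2 is an assumed minimum.
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 replicas at 90% average CPU against a 60% target yields 6 replicas, while very low load never drops below `min_replicas`.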
Timeouts¶
- [ ] Set appropriate timeouts
Operations¶
Deployment¶
- [ ] Use container orchestration (Kubernetes)
- [ ] Implement blue/green or rolling deployments
- [ ] Version container images
- [ ] Don't use `latest` tag in production
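A CI step can enforce versioned images with a check like the following sketch. The function name is hypothetical and the parsing is deliberately simple: it treats a digest or any explicit tag other than `latest` as pinned.

```python
def is_pinned(image: str) -> bool:
    """True when the image reference has a digest or an explicit non-`latest` tag."""
    if "@sha256:" in image:
        return True
    # Look for the tag only in the final path segment, so a registry port
    # such as `registry.local:5000/app` is not mistaken for a tag.
    name = image.rsplit("/", 1)[-1]
    if ":" not in name:
        return False  # no tag means the runtime defaults to `latest`
    return name.rsplit(":", 1)[1] != "latest"
```

Running this over every image reference in the deployment manifests and failing the build on a `False` result catches unpinned images before they reach production.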
Configuration¶
- [ ] Use environment variables or secrets manager
- [ ] Don't commit secrets to version control
- [ ] Use ConfigMaps for non-sensitive config
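One way to catch configuration mistakes early is to validate required settings at startup and fail fast. In this sketch the variable names (`JWT_SECRET`, `ENCRYPTION_KEY`, `DATABASE_URL`) are placeholders chosen for illustration, not the application's documented settings.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required setting is absent or empty."""


def load_settings(env=os.environ) -> dict[str, str]:
    """Read required settings from the environment, failing fast if any are missing."""
    required = ("JWT_SECRET", "ENCRYPTION_KEY", "DATABASE_URL")  # hypothetical names
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Report only the names, never the values, to keep secrets out of logs.
        raise MissingSecretError("missing required settings: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Because `env` is a parameter, the same function works with values injected by a secrets manager or a test fixture.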
Updates¶
- [ ] Plan maintenance windows
- [ ] Test updates in staging first
- [ ] Document rollback procedures
- [ ] Keep dependencies updated
Disaster Recovery¶
Backup Strategy¶
| Data | Frequency | Retention |
|---|---|---|
| Database | Daily | 30 days |
| Workflows | On change | Indefinite |
| Credentials | Daily | 30 days |
| Config | On change | Version controlled |
Recovery Plan¶
- [ ] Document recovery procedures
- [ ] Test recovery regularly
- [ ] Define RTO and RPO
- [ ] Have runbooks ready
Compliance¶
Audit Logging¶
- [ ] Enable audit logging
- [ ] Log authentication events
- [ ] Log workflow modifications
- [ ] Log credential access
Data Protection¶
- [ ] Encrypt data at rest
- [ ] Encrypt data in transit (TLS)
- [ ] Implement data retention policies
- [ ] Handle PII appropriately
Pre-Launch Checklist¶
Final Verification¶
- [ ] All secrets are properly configured
- [ ] TLS certificates are valid and not expiring soon
- [ ] Database backups are working
- [ ] Monitoring and alerting are configured
- [ ] Health checks are passing
- [ ] Load testing completed
- [ ] Security scan completed
- [ ] Documentation is up to date
- [ ] Runbooks are ready
- [ ] Support contacts are defined
Go-Live¶
- [ ] Announce maintenance window
- [ ] Deploy to production
- [ ] Verify health checks
- [ ] Test critical workflows
- [ ] Monitor metrics
- [ ] Verify alerts work
- [ ] Celebrate!