Process Supervision
The kernel's supervisor manages the lifecycle of process-isolated apps.
Overview
The supervisor is responsible for:
- Spawning app processes
- Health monitoring via heartbeats
- Graceful shutdown and restart
- Resource management and limits
App Lifecycle
┌─────────────────────────────────────────────────────────────────────┐
│                            App Lifecycle                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────┐      ┌──────────┐      ┌─────────┐      ┌──────────┐   │
│  │ Stopped │─────▶│ Starting │─────▶│  Ready  │─────▶│ Stopping │   │
│  └─────────┘      └──────────┘      └─────────┘      └──────────┘   │
│       ▲                │                 │                │         │
│       │                │                 │                │         │
│       │                ▼                 ▼                ▼         │
│       │          [Connect IPC]     [Heartbeats]       [Cleanup]     │
│       │          [Register]        [Traffic]                        │
│       │                                                             │
│       └──────────────────── [Error/Crash] ──────────────────────────│
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
States
| State | Description |
|---|---|
| `stopped` | App is not running |
| `starting` | App process spawned, connecting to kernel |
| `ready` | App registered and ready for traffic |
| `stopping` | Graceful shutdown in progress |
| `error` | App crashed or failed health checks |
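The lifecycle can be sketched as a small transition table. The state names come from the table above, but the set of allowed transitions is an assumption read off the diagram (for example, that `error` can lead back to `starting` on restart); the supervisor's actual rules may differ.

```typescript
// Hypothetical sketch: the allowed transitions are inferred from the diagram,
// not taken from the supervisor's source.
type AppState = "stopped" | "starting" | "ready" | "stopping" | "error";

const allowedTransitions: Record<AppState, readonly AppState[]> = {
  stopped: ["starting"],
  starting: ["ready", "error"],
  ready: ["stopping", "error"],
  stopping: ["stopped", "error"],
  error: ["starting", "stopped"],
};

/** Returns true when moving from `from` to `to` is allowed by the table. */
function canTransition(from: AppState, to: AppState): boolean {
  return allowedTransitions[from].includes(to);
}

/** Applies a transition, or throws if the table does not allow it. */
function transition(from: AppState, to: AppState): AppState {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the rules in one table makes it easy to reject nonsensical moves such as `stopped` to `ready` in a single place.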
Spawning Apps
When an app is started:
- Load manifest: Read `openclawos.manifest.json`
- Validate capabilities: Check permissions
- Spawn process: Execute entry point
- Wait for registration: App calls `app.register`
- Wait for ready: App calls `app.ready`
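// Supervisor spawns app
import { spawn } from "node:child_process";

// appPath, socketPath, and manifest are resolved by the earlier steps
const child = spawn("node", ["dist/index.js"], {
  cwd: appPath,
  env: {
    ...process.env,
    OPENCLAWOS_SOCKET: socketPath,
    OPENCLAWOS_APP_ID: manifest.id,
  },
});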
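The first two steps, loading and validating the manifest, can be sketched as follows. Only the `id` field is taken from this page (`manifest.id`); the `capabilities` field name and the `parseManifest` helper are hypothetical and stand in for whatever the real manifest schema defines.

```typescript
// Hypothetical manifest shape: only `id` appears in this document.
interface AppManifest {
  id: string;
  capabilities: string[]; // assumed field name for requested permissions
}

/** Parses manifest JSON text and checks the fields a supervisor would rely on. */
function parseManifest(jsonText: string): AppManifest {
  const raw: unknown = JSON.parse(jsonText);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Manifest must be a JSON object");
  }
  const record = raw as Record<string, unknown>;
  const id: unknown = record.id;
  // A missing capabilities list is treated as "no capabilities requested".
  const capabilities: unknown =
    record.capabilities === undefined ? [] : record.capabilities;

  if (typeof id !== "string" || id.length === 0) {
    throw new Error("Manifest is missing a non-empty string `id`");
  }
  if (
    !Array.isArray(capabilities) ||
    !capabilities.every((c) => typeof c === "string")
  ) {
    throw new Error("Manifest `capabilities` must be an array of strings");
  }
  return { id, capabilities };
}
```

Rejecting a malformed manifest before calling `spawn` means a bad app never reaches the `starting` state.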
Health Monitoring
Apps must send periodic heartbeats:
// App sends heartbeat every 30 seconds
setInterval(async () => {
  await kernel.heartbeat({ status: "healthy" });
}, 30000);
Heartbeat Protocol
- App sends `app.heartbeat` request
- Kernel responds with `{ ok: true, serverTime: ... }`
- If no heartbeat arrives within 90 seconds, the app is marked unhealthy
- After 3 missed heartbeats, the app is restarted
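On the kernel side, the protocol above amounts to remembering when each app was last heard from and flagging those that have been silent too long. The sketch below is illustrative only: `HeartbeatMonitor` and its methods are hypothetical names, and `serverTime` is assumed to be an epoch timestamp in milliseconds.

```typescript
// Sketch only: class and method names are hypothetical. The default timeout
// matches the 90-second figure above.
class HeartbeatMonitor {
  private readonly lastSeen = new Map<string, number>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number = 90000) {
    this.timeoutMs = timeoutMs;
  }

  /** Records a heartbeat and returns a response in the shape described above. */
  record(appId: string, now: number = Date.now()): { ok: true; serverTime: number } {
    this.lastSeen.set(appId, now);
    return { ok: true, serverTime: now };
  }

  /** Lists apps whose most recent heartbeat is older than the timeout. */
  findUnhealthy(now: number = Date.now()): string[] {
    const unhealthy: string[] = [];
    this.lastSeen.forEach((seenAt, appId) => {
      if (now - seenAt > this.timeoutMs) unhealthy.push(appId);
    });
    return unhealthy;
  }
}
```

Passing `now` explicitly keeps the check deterministic in tests; a supervisor would typically call `findUnhealthy()` on a timer and hand the result to its restart logic.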
Restart Policy
interface RestartPolicy {
  /** Maximum restart attempts */
  maxRestarts: number; // Default: 5
  /** Time window for restart counting, in milliseconds */
  restartWindow: number; // Default: 300000 (5 minutes)
  /** Initial delay before restart, in milliseconds */
  restartDelay: number; // Default: 1000 (1 second)
  /** Backoff multiplier applied to the delay after each restart */
  backoffMultiplier: number; // Default: 2
}
Example restart sequence with the defaults, where every crash falls inside the 5-minute restart window:
- 1st crash → Restart after 1s
- 2nd crash → Restart after 2s
- 3rd crash → Restart after 4s
- 4th crash → Restart after 8s
- 5th crash → App disabled, requires manual intervention
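The backoff arithmetic behind this sequence can be sketched as below. `decideRestart` is a hypothetical helper, and it assumes that the crash which brings the count inside the window up to `maxRestarts` is the one that disables the app; that reading matches the example sequence but is not confirmed elsewhere on this page.

```typescript
interface RestartPolicy {
  maxRestarts: number;
  restartWindow: number; // milliseconds
  restartDelay: number; // milliseconds
  backoffMultiplier: number;
}

const defaultPolicy: RestartPolicy = {
  maxRestarts: 5,
  restartWindow: 300000,
  restartDelay: 1000,
  backoffMultiplier: 2,
};

type RestartDecision =
  | { action: "restart"; delayMs: number }
  | { action: "disable" };

/**
 * Decides what to do after a crash. `crashTimes` holds the timestamps (ms) of
 * all crashes so far, including the one that just happened at `now`.
 */
function decideRestart(
  crashTimes: readonly number[],
  now: number,
  policy: RestartPolicy = defaultPolicy,
): RestartDecision {
  // Only crashes inside the sliding window count toward the limit.
  const recent = crashTimes.filter((t) => now - t < policy.restartWindow).length;
  // Assumption: reaching maxRestarts crashes in the window disables the app.
  if (recent >= policy.maxRestarts) return { action: "disable" };
  // Delay grows geometrically: restartDelay * multiplier^(recent - 1).
  const exponent = Math.max(recent - 1, 0);
  const delayMs = policy.restartDelay * Math.pow(policy.backoffMultiplier, exponent);
  return { action: "restart", delayMs };
}
```

With the default policy, five crashes in quick succession yield delays of 1s, 2s, 4s, and 8s, followed by a `disable` decision. A crash that arrives after earlier ones have aged out of the window starts again at the base delay.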
Graceful Shutdown
When stopping an app:
- Send shutdown signal: `app.shutdown` event
- Wait for cleanup: App has `shutdownTimeout` to finish
- Force kill: If timeout exceeded, SIGKILL
// Kernel initiates shutdown
await sendToApp({ event: "shutdown", timeout: 5000 });
// Wait for app to cleanup
await waitForDisconnect(5000);
// Force kill if needed
if (app.isRunning) {
  app.process.kill("SIGKILL");
}
App Shutdown Handler
class MyApp extends OpenClawApp {
  protected async teardown(): Promise<void> {
    // Clean up resources
    await this.database.disconnect();
    await this.queue.flush();
    // App will exit after teardown completes
  }
}
Resource Limits
Apps can have resource constraints:
interface ResourceLimits {
  /** Maximum memory in bytes */
  maxMemory?: number;
  /** CPU shares (relative weight) */
  cpuShares?: number;
  /** Maximum file descriptors */
  maxFds?: number;
}
Setting Limits
In manifest:
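As a sketch, limits matching the `ResourceLimits` interface could look like the object below, together with a small check a supervisor might run before applying them. The values are illustrative, `validateLimits` is a hypothetical helper, and the manifest key that holds these limits is left out because it is not specified here.

```typescript
interface ResourceLimits {
  maxMemory?: number;
  cpuShares?: number;
  maxFds?: number;
}

// Illustrative values only.
const exampleLimits: ResourceLimits = {
  maxMemory: 256 * 1024 * 1024, // 256 MiB, expressed in bytes
  cpuShares: 512,
  maxFds: 1024,
};

/** Returns a list of problems; an empty list means the limits look usable. */
function validateLimits(limits: ResourceLimits): string[] {
  const problems: string[] = [];
  const entries: [string, number | undefined][] = [
    ["maxMemory", limits.maxMemory],
    ["cpuShares", limits.cpuShares],
    ["maxFds", limits.maxFds],
  ];
  for (const [name, value] of entries) {
    if (value === undefined) continue; // unset limits are allowed
    if (!Number.isInteger(value) || value <= 0) {
      problems.push(`${name} must be a positive integer, got ${value}`);
    }
  }
  return problems;
}
```

All three fields are optional in the interface, so an empty object is treated as "no limits" rather than as an error.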
Process Isolation
Each app runs in its own process:
┌─────────────────────────────────────────────────────────────────┐
│                         Kernel Process                          │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                       Supervisor                        │   │
│   │                                                         │   │
│   │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │   │
│   │  │ App Process  │  │ App Process  │  │ App Process  │   │   │
│   │  │  (telegram)  │  │  (discord)   │  │   (slack)    │   │   │
│   │  │              │  │              │  │              │   │   │
│   │  │  PID: 12345  │  │  PID: 12346  │  │  PID: 12347  │   │   │
│   │  └──────────────┘  └──────────────┘  └──────────────┘   │   │
│   │         │                 │                 │           │   │
│   │         └─────────────────┼─────────────────┘           │   │
│   │                           │                             │   │
│   │                    Unix Socket IPC                      │   │
│   └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
Benefits
| Benefit | Description |
|---|---|
| Fault Isolation | App crash doesn't affect kernel or other apps |
| Memory Protection | Apps can't access each other's memory |
| Resource Accounting | Per-app CPU/memory tracking |
| Independent Updates | Update one app without stopping others |
Monitoring
The supervisor provides monitoring data:
interface AppStatus {
  packageId: string;
  state: AppState;
  pid?: number;
  startedAt?: number;
  restartCount: number;
  lastHeartbeat?: number;
  lastError?: string;
  resourceUsage?: {
    memoryRss: number;
    cpuPercent: number;
  };
}
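To show how this data might be consumed, the sketch below turns an `AppStatus` into a one-line summary. `summarize` is a hypothetical helper, and it assumes `startedAt` and `lastHeartbeat` are epoch timestamps in milliseconds, which this page does not state.

```typescript
type AppState = "stopped" | "starting" | "ready" | "stopping" | "error";

interface AppStatus {
  packageId: string;
  state: AppState;
  pid?: number;
  startedAt?: number;
  restartCount: number;
  lastHeartbeat?: number;
  lastError?: string;
  resourceUsage?: { memoryRss: number; cpuPercent: number };
}

/** Builds a one-line summary; timestamps are assumed to be epoch milliseconds. */
function summarize(status: AppStatus, now: number = Date.now()): string {
  const parts = [`${status.packageId}: ${status.state}`];
  if (status.pid !== undefined) parts.push(`pid=${status.pid}`);
  if (status.startedAt !== undefined) {
    parts.push(`uptime=${Math.floor((now - status.startedAt) / 1000)}s`);
  }
  if (status.lastHeartbeat !== undefined) {
    parts.push(`heartbeat=${Math.floor((now - status.lastHeartbeat) / 1000)}s ago`);
  }
  parts.push(`restarts=${status.restartCount}`);
  if (status.lastError) parts.push(`error=${status.lastError}`);
  return parts.join(" ");
}
```

Optional fields are simply skipped when absent, so a stopped app with no PID or heartbeat still produces a valid line.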
Gateway Methods
- `apps.list` - List all apps with status
- `apps.info` - Get detailed app info
- `apps.logs` - Get app stdout/stderr logs
- `apps.start` - Start a stopped app
- `apps.stop` - Stop a running app
- `apps.restart` - Restart an app
Configuration
Supervisor configuration in config.json:
{
  "apps": {
    "runtime": "ipc",
    "supervisor": {
      "heartbeatInterval": 30000,
      "heartbeatTimeout": 90000,
      "shutdownTimeout": 5000,
      "maxRestarts": 5,
      "restartWindow": 300000
    }
  }
}
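A loader for the `supervisor` block might merge user-supplied values over defaults and reject obviously inconsistent settings. In the sketch below, `resolveSupervisorConfig` is a hypothetical helper, the defaults are assumed to equal the values in the example above, and the rule that `heartbeatTimeout` must exceed `heartbeatInterval` is a sanity check added for illustration.

```typescript
interface SupervisorConfig {
  heartbeatInterval: number; // milliseconds
  heartbeatTimeout: number; // milliseconds
  shutdownTimeout: number; // milliseconds
  maxRestarts: number;
  restartWindow: number; // milliseconds
}

// Assumed defaults, copied from the example config above.
const supervisorDefaults: SupervisorConfig = {
  heartbeatInterval: 30000,
  heartbeatTimeout: 90000,
  shutdownTimeout: 5000,
  maxRestarts: 5,
  restartWindow: 300000,
};

/** Merges overrides over the defaults and applies a basic consistency check. */
function resolveSupervisorConfig(
  overrides: Partial<SupervisorConfig> = {},
): SupervisorConfig {
  const config: SupervisorConfig = { ...supervisorDefaults, ...overrides };
  // A timeout at or below the interval would flag every app as unhealthy.
  if (config.heartbeatTimeout <= config.heartbeatInterval) {
    throw new Error("heartbeatTimeout must be greater than heartbeatInterval");
  }
  return config;
}
```

For example, `resolveSupervisorConfig({ shutdownTimeout: 10000 })` keeps every other setting at its assumed default while giving apps ten seconds to clean up.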
Next Steps
- Capabilities - Permission system
- IPC Protocol - Communication details
- Developing Apps - Build your own app