# Hero Supervisor Documentation

## Overview

Hero Supervisor is a distributed job execution system that manages runners and coordinates job processing across multiple worker nodes. It provides a robust OpenRPC API for job management and runner administration.

## Architecture

The supervisor consists of several key components:

- **Supervisor Core**: Central coordinator that manages runners and job dispatch
- **OpenRPC Server**: JSON-RPC API server for remote management
- **Redis Backend**: Job queue and state management
- **Process Manager**: Runner lifecycle management (Simple or Tmux)
- **Client Libraries**: Native Rust and WASM clients for integration

## Quick Start

### Starting the Supervisor

```bash
# With default configuration
./supervisor

# With custom configuration file
./supervisor --config /path/to/config.toml
```

### Example Configuration

```toml
# config.toml
redis_url = "redis://localhost:6379"
namespace = "hero"
bind_address = "127.0.0.1"
port = 3030

# Admin secrets for full access
admin_secrets = ["admin-secret-123"]

# User secrets for job operations
user_secrets = ["user-secret-456"]

# Register secrets for runner registration
register_secrets = ["register-secret-789"]

[[actors]]
id = "sal_runner_1"
name = "sal_runner_1"
binary_path = "/path/to/sal_runner"
db_path = "/tmp/sal_db"
redis_url = "redis://localhost:6379"
process_manager = "simple"

[[actors]]
id = "osis_runner_1"
name = "osis_runner_1"
binary_path = "/path/to/osis_runner"
db_path = "/tmp/osis_db"
redis_url = "redis://localhost:6379"
process_manager = "tmux:osis_session"
```

## API Documentation

### Job API Convention

The Hero Supervisor follows a consistent naming convention for job operations:

- **`jobs.`** - General job operations (create, list)
- **`job.`** - Specific job operations (run, start, status, result)

See [Job API Convention](job-api-convention.md) for detailed documentation.
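To make the `jobs.` / `job.` split concrete, here is a minimal sketch in plain Rust (standard library only) that classifies a method name by its prefix and wraps it in a JSON-RPC 2.0 request envelope. `MethodScope`, `method_scope`, and `rpc_request` are hypothetical helper names, not part of the supervisor or its client crate, and the shape of `params` is an assumption; the authoritative parameter schema should come from `rpc.discover`.

```rust
/// Which family a method belongs to under the naming convention.
/// (Hypothetical type, used only for illustration.)
#[derive(Debug, PartialEq)]
enum MethodScope {
    /// `jobs.*`: operations over the job collection (create, list).
    Collection,
    /// `job.*`: operations on one specific job (run, start, status, result).
    SingleJob,
    /// Anything else, such as `list_runners` or `rpc.discover`.
    Other,
}

/// Classify a method name by its prefix.
fn method_scope(method: &str) -> MethodScope {
    if method.starts_with("jobs.") {
        MethodScope::Collection
    } else if method.starts_with("job.") {
        MethodScope::SingleJob
    } else {
        MethodScope::Other
    }
}

/// Wrap a method name in a JSON-RPC 2.0 request envelope.
/// `params_json` must already be valid JSON; it is inserted verbatim.
/// The parameter layout is an assumption, not the documented schema.
fn rpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        id, method, params_json
    )
}
```

For instance, `method_scope("jobs.list")` yields `MethodScope::Collection`, while `method_scope("job.status")` yields `MethodScope::SingleJob`. The envelope builder deliberately avoids a JSON library so the sketch stays dependency-free; real code would more likely serialize with `serde_json`.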
### Core Methods

#### Runner Management

- `register_runner` - Register a new runner
- `list_runners` - List all registered runners
- `start_runner` / `stop_runner` - Control runner lifecycle
- `get_runner_status` - Get runner status
- `get_runner_logs` - Retrieve runner logs

#### Job Management

- `jobs.create` - Create a job without queuing
- `jobs.list` - List all jobs with full details
- `job.run` - Run a job and return result
- `job.start` - Start a created job
- `job.stop` - Stop a running job
- `job.delete` - Delete a job from the system
- `job.status` - Get job status (non-blocking)
- `job.result` - Get job result (blocking)

#### Administration

- `add_secret` / `remove_secret` - Manage authentication secrets
- `get_supervisor_info` - Get system information
- `rpc.discover` - OpenRPC specification discovery

## Client Usage

### Rust Client

```rust
use std::time::Duration;

// JobResult is assumed to be exported by the client crate alongside the other types
use hero_supervisor_openrpc_client::{JobBuilder, JobResult, SupervisorClient};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client
    let client = SupervisorClient::new("http://localhost:3030")?;

    // Create a job
    let job = JobBuilder::new()
        .caller_id("my_client")
        .context_id("my_context")
        .payload("print('Hello World')")
        .executor("osis")
        .runner("osis_runner_1")
        .timeout(60)
        .build()?;

    // Option 1: Run the job and wait for its result
    let result = client.job_run("user-secret", job.clone()).await?;
    match result {
        JobResult::Success { success } => println!("Output: {}", success),
        JobResult::Error { error } => println!("Error: {}", error),
    }

    // Option 2: Asynchronous execution (create, start, then poll)
    let job_id = client.jobs_create("user-secret", job).await?;
    client.job_start("user-secret", &job_id).await?;

    // Poll for completion
    loop {
        let status = client.job_status(&job_id).await?;
        if status.status == "completed" || status.status == "failed" {
            break;
        }
        tokio::time::sleep(Duration::from_secs(1)).await;
    }

    // Fetch the result once the job has finished
    let result = client.job_result(&job_id).await?;

    // Option 3: Job management
    // Stop a running job
    client.job_stop("user-secret", &job_id).await?;

    // Delete a job
    client.job_delete("user-secret", &job_id).await?;

    // List all jobs (returns full Job objects)
    let jobs = client.jobs_list("user-secret").await?;
    for job in jobs {
        println!("Job {}: {} ({})", job.id, job.executor, job.payload);
    }

    Ok(())
}
```

### WASM Client

```javascript
import { WasmSupervisorClient, WasmJob } from 'hero-supervisor-openrpc-client';

// Create client
const client = new WasmSupervisorClient('http://localhost:3030');

// Create and run job
const job = new WasmJob('job-id', 'print("Hello")', 'osis', 'osis_runner_1');
const result = await client.create_job('user-secret', job);
```

## Security

### Authentication Levels

1. **Admin Secrets**: Full system access
   - All runner management operations
   - All job operations
   - Secret management
   - System information access

2. **User Secrets**: Job operations only
   - Create, run, start jobs
   - Get job status and results
   - No runner or secret management

3. **Register Secrets**: Runner registration only
   - Register new runners
   - No other operations

### Best Practices

- Use different secret types for different access levels
- Rotate secrets regularly
- Store secrets securely (environment variables, secret management systems)
- Use HTTPS in production environments
- Implement proper logging and monitoring

## Development

### Building

```bash
# Build supervisor binary
cargo build --release

# Build with OpenRPC feature
cargo build --release --features openrpc

# Build client library
cd clients/openrpc
cargo build --release
```

### Testing

```bash
# Run tests
cargo test

# Run with Redis (requires Redis server)
docker run -d -p 6379:6379 redis:alpine
cargo test -- --ignored
```

### Examples

See the `examples/` directory for:

- Basic supervisor setup
- Mock runner implementation
- Comprehensive OpenRPC client usage
- Configuration examples

## Troubleshooting

### Common Issues

1. **Redis Connection Failed**
   - Ensure Redis server is running
   - Check Redis URL in configuration
   - Verify network connectivity

2. **Runner Registration Failed**
   - Check register secret validity
   - Verify runner binary path exists
   - Ensure runner has proper permissions

3. **Job Execution Timeout**
   - Increase job timeout value
   - Check runner resource availability
   - Monitor runner logs for issues

4. **OpenRPC Method Not Found**
   - Verify method name spelling
   - Check OpenRPC specification
   - Ensure server supports the method

### Logging

Enable debug logging:

```bash
RUST_LOG=debug ./supervisor --config config.toml
```

### Monitoring

Monitor key metrics:

- Runner status and health
- Job queue lengths
- Job success/failure rates
- Response times
- Redis connection status

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make changes with tests
4. Update documentation
5. Submit a pull request

## License

[License information here]