Hero Supervisor Documentation
Overview
Hero Supervisor is a distributed job execution system that manages runners and coordinates job processing across multiple worker nodes. It exposes an OpenRPC (JSON-RPC) API for job management and runner administration.
Architecture
The supervisor consists of several key components:
- Supervisor Core: Central coordinator that manages runners and job dispatch
- OpenRPC Server: JSON-RPC API server for remote management
- Redis Backend: Job queue and state management
- Process Manager: Runner lifecycle management (Simple or Tmux)
- Client Libraries: Native Rust and WASM clients for integration
Quick Start
Starting the Supervisor
# With default configuration
./supervisor
# With custom configuration file
./supervisor --config /path/to/config.toml
Example Configuration
# config.toml
redis_url = "redis://localhost:6379"
namespace = "hero"
bind_address = "127.0.0.1"
port = 3030
# Admin secrets for full access
admin_secrets = ["admin-secret-123"]
# User secrets for job operations
user_secrets = ["user-secret-456"]
# Register secrets for runner registration
register_secrets = ["register-secret-789"]
[[actors]]
id = "sal_runner_1"
name = "sal_runner_1"
binary_path = "/path/to/sal_runner"
db_path = "/tmp/sal_db"
redis_url = "redis://localhost:6379"
process_manager = "simple"
[[actors]]
id = "osis_runner_1"
name = "osis_runner_1"
binary_path = "/path/to/osis_runner"
db_path = "/tmp/osis_db"
redis_url = "redis://localhost:6379"
process_manager = "tmux:osis_session"
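The `process_manager` field in the example takes two forms: `simple` and `tmux:<session>`. A minimal sketch of how such a value could be parsed, assuming these are the only two forms. `ProcessManagerKind` and `parse_process_manager` are hypothetical names for illustration, not the supervisor's actual types.

```rust
/// Hypothetical representation of the `process_manager` config value.
#[derive(Debug, PartialEq)]
enum ProcessManagerKind {
    Simple,
    Tmux { session: String },
}

/// Parse "simple" or "tmux:<session>"; any other value is rejected.
fn parse_process_manager(value: &str) -> Result<ProcessManagerKind, String> {
    match value.split_once(':') {
        None if value == "simple" => Ok(ProcessManagerKind::Simple),
        Some(("tmux", session)) if !session.is_empty() => Ok(ProcessManagerKind::Tmux {
            session: session.to_string(),
        }),
        _ => Err(format!("unrecognized process_manager value: {value}")),
    }
}

fn main() {
    // The two values used in the example configuration, plus an invalid one.
    println!("{:?}", parse_process_manager("simple"));
    println!("{:?}", parse_process_manager("tmux:osis_session"));
    println!("{:?}", parse_process_manager("screen"));
}
```

Rejecting unknown values at load time, rather than falling back to a default, surfaces configuration typos before any runner is started.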
API Documentation
Job API Convention
The Hero Supervisor follows a consistent naming convention for job operations:
- jobs. prefix: General job operations (create, list)
- job. prefix: Specific job operations (run, start, status, result)
See Job API Convention for detailed documentation.
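The prefix convention can be checked mechanically. The sketch below classifies a method name by its prefix; `JobMethodScope` and `job_method_scope` are hypothetical helpers, not part of the supervisor or its client libraries.

```rust
/// Which half of the naming convention a method name falls under.
#[derive(Debug, PartialEq)]
enum JobMethodScope {
    /// `jobs.` prefix: general operations such as create and list.
    General,
    /// `job.` prefix: operations on one specific job.
    Specific,
    /// Any method outside the job namespace.
    NotAJobMethod,
}

fn job_method_scope(method: &str) -> JobMethodScope {
    if method.starts_with("jobs.") {
        JobMethodScope::General
    } else if method.starts_with("job.") {
        JobMethodScope::Specific
    } else {
        JobMethodScope::NotAJobMethod
    }
}

fn main() {
    for method in ["jobs.create", "job.status", "list_runners"] {
        println!("{method}: {:?}", job_method_scope(method));
    }
}
```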
Core Methods
Runner Management
- register_runner: Register a new runner
- list_runners: List all registered runners
- start_runner / stop_runner: Control runner lifecycle
- get_runner_status: Get runner status
- get_runner_logs: Retrieve runner logs
Job Management
- jobs.create: Create a job without queuing
- jobs.list: List all jobs with full details
- job.run: Run a job and return result
- job.start: Start a created job
- job.stop: Stop a running job
- job.delete: Delete a job from the system
- job.status: Get job status (non-blocking)
- job.result: Get job result (blocking)
Administration
- add_secret / remove_secret: Manage authentication secrets
- get_supervisor_info: Get system information
- rpc.discover: OpenRPC specification discovery
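Because the server speaks JSON-RPC, each method above is invoked with a standard JSON-RPC 2.0 request envelope. The sketch below builds such a body for `rpc.discover`, which takes no parameters under the OpenRPC specification. The parameter layout of the other methods is not documented here, so it is left to the caller. `json_rpc_request` is a hypothetical helper that uses plain string formatting to stay dependency-free; a real client would more likely use a JSON library.

```rust
/// Build a JSON-RPC 2.0 request body.
/// `params_json` must already be valid JSON (for example "[]").
/// The method name is inserted without escaping, so it must not
/// contain quotes or backslashes.
fn json_rpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params_json}}}"#
    )
}

fn main() {
    // Body to POST to the supervisor's RPC endpoint.
    println!("{}", json_rpc_request(1, "rpc.discover", "[]"));
}
```

The response to `rpc.discover` is the OpenRPC document itself, which is the authoritative source for every other method's parameters.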
Client Usage
Rust Client
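// JobResult is assumed to be exported by the client crate alongside the other types
use hero_supervisor_openrpc_client::{SupervisorClient, JobBuilder, JobResult};
use std::time::Duration;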
// Create client
let client = SupervisorClient::new("http://localhost:3030")?;
// Create a job
let job = JobBuilder::new()
    .caller_id("my_client")
    .context_id("my_context")
    .payload("print('Hello World')")
    .executor("osis")
    .runner("osis_runner_1")
    .timeout(60)
    .build()?;
// Option 1: Synchronous execution (job.run runs the job and returns its result)
let result = client.job_run("user-secret", job.clone()).await?;
match result {
    JobResult::Success { success } => println!("Output: {}", success),
    JobResult::Error { error } => println!("Error: {}", error),
}
// Option 2: Asynchronous execution
let job_id = client.jobs_create("user-secret", job).await?;
client.job_start("user-secret", &job_id).await?;
// Poll for completion
loop {
    let status = client.job_status(&job_id).await?;
    if status.status == "completed" || status.status == "failed" {
        break;
    }
    tokio::time::sleep(Duration::from_secs(1)).await;
}
let result = client.job_result(&job_id).await?;
// Option 3: Job management
// Stop a running job
client.job_stop("user-secret", &job_id).await?;
// Delete a job
client.job_delete("user-secret", &job_id).await?;
// List all jobs (returns full Job objects)
let jobs = client.jobs_list("user-secret").await?;
for job in jobs {
    println!("Job {}: {} ({})", job.id, job.executor, job.payload);
}
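The polling loop in Option 2 has no upper bound, so a job that never reaches a terminal state would block the caller forever. A minimal, synchronous sketch of a bounded poll follows. `poll_until` is a hypothetical helper, and the closure stands in for a real `job_status` call; in async code the same shape applies with `tokio::time::sleep` in place of `std::thread::sleep`.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `check` until it returns Some(value) or `max_attempts` is reached.
fn poll_until<T>(
    mut check: impl FnMut() -> Option<T>,
    max_attempts: u32,
    interval: Duration,
) -> Option<T> {
    for attempt in 0..max_attempts {
        if let Some(value) = check() {
            return Some(value);
        }
        // Do not sleep after the final attempt.
        if attempt + 1 < max_attempts {
            sleep(interval);
        }
    }
    None
}

fn main() {
    // Stand-in for a status call: reports "completed" on the third poll.
    let mut calls = 0;
    let outcome = poll_until(
        || {
            calls += 1;
            if calls >= 3 { Some("completed") } else { None }
        },
        10,
        Duration::from_millis(10),
    );
    println!("{:?} after {} polls", outcome, calls);
}
```

A `None` return means the attempt budget ran out, at which point a caller might stop the job or report a timeout.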
WASM Client
import { WasmSupervisorClient, WasmJob } from 'hero-supervisor-openrpc-client';
// Create client
const client = new WasmSupervisorClient('http://localhost:3030');
// Create a job
const job = new WasmJob('job-id', 'print("Hello")', 'osis', 'osis_runner_1');
const result = await client.create_job('user-secret', job);
Security
Authentication Levels
- Admin Secrets: Full system access
  - All runner management operations
  - All job operations
  - Secret management
  - System information access
- User Secrets: Job operations only
  - Create, run, start jobs
  - Get job status and results
  - No runner or secret management
- Register Secrets: Runner registration only
  - Register new runners
  - No other operations
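The three levels above amount to a small permission table. The sketch below encodes it as a pure function. `SecretKind`, `Operation` and `is_allowed` are hypothetical names, and the grouping of operations is an assumption drawn from the list; in particular, system information is treated as admin-only because it is listed only under admin secrets.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SecretKind {
    Admin,
    User,
    Register,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Operation {
    RunnerManagement,
    RunnerRegistration,
    JobOperation,
    SecretManagement,
    SystemInfo,
}

/// Return true if a secret of the given kind may perform the operation.
fn is_allowed(kind: SecretKind, op: Operation) -> bool {
    match (kind, op) {
        // Admin secrets grant full system access.
        (SecretKind::Admin, _) => true,
        // User secrets cover job operations only.
        (SecretKind::User, Operation::JobOperation) => true,
        // Register secrets cover runner registration only.
        (SecretKind::Register, Operation::RunnerRegistration) => true,
        _ => false,
    }
}

fn main() {
    let checks = [
        (SecretKind::Admin, Operation::SecretManagement),
        (SecretKind::User, Operation::JobOperation),
        (SecretKind::User, Operation::RunnerManagement),
        (SecretKind::Register, Operation::SystemInfo),
    ];
    for (kind, op) in checks {
        println!("{kind:?} -> {op:?}: {}", is_allowed(kind, op));
    }
}
```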
Best Practices
- Use different secret types for different access levels
- Rotate secrets regularly
- Store secrets securely (environment variables, secret management systems)
- Use HTTPS in production environments
- Implement proper logging and monitoring
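As one way to keep secrets out of source files, a client application could read them from the environment. The sketch below parses a comma-separated list; the variable name `HERO_ADMIN_SECRETS` and both helper functions are hypothetical, and nothing here implies the supervisor itself reads this variable.

```rust
use std::env;

/// Split a comma-separated string into trimmed, non-empty secrets.
fn parse_secret_list(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Read secrets from an environment variable; an unset variable yields an empty list.
fn secrets_from_env(var: &str) -> Vec<String> {
    env::var(var)
        .map(|raw| parse_secret_list(&raw))
        .unwrap_or_default()
}

fn main() {
    // HERO_ADMIN_SECRETS is an illustrative name, not a documented setting.
    let secrets = secrets_from_env("HERO_ADMIN_SECRETS");
    // Print only the count so secret values never reach the logs.
    println!("loaded {} admin secret(s)", secrets.len());
}
```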
Development
Building
# Build supervisor binary
cargo build --release
# Build with OpenRPC feature
cargo build --release --features openrpc
# Build client library
cd clients/openrpc
cargo build --release
Testing
# Run tests
cargo test
# Run with Redis (requires Redis server)
docker run -d -p 6379:6379 redis:alpine
cargo test -- --ignored
Examples
See the examples/ directory for:
- Basic supervisor setup
- Mock runner implementation
- Comprehensive OpenRPC client usage
- Configuration examples
Troubleshooting
Common Issues
- Redis Connection Failed
  - Ensure Redis server is running
  - Check Redis URL in configuration
  - Verify network connectivity
- Runner Registration Failed
  - Check register secret validity
  - Verify runner binary path exists
  - Ensure runner has proper permissions
- Job Execution Timeout
  - Increase job timeout value
  - Check runner resource availability
  - Monitor runner logs for issues
- OpenRPC Method Not Found
  - Verify method name spelling
  - Check OpenRPC specification
  - Ensure server supports the method
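For the Redis connection case, a quick TCP probe can separate "Redis is not reachable" from "the URL in the configuration is wrong". The sketch below is a standalone diagnostic using only the standard library, not part of the supervisor. It handles simple `redis://host:port` URLs, assumes the default Redis port 6379 when none is given, and does not cover bracketed IPv6 hosts. Both function names are hypothetical.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Extract "host:port" from a URL of the form redis://[user:pass@]host[:port][/db].
fn redis_host_port(url: &str) -> Option<String> {
    let rest = url.strip_prefix("redis://")?;
    let authority = rest.split('/').next()?;
    // Drop any "user:password@" prefix.
    let host_port = authority.rsplit('@').next()?;
    if host_port.is_empty() {
        return None;
    }
    if host_port.contains(':') {
        Some(host_port.to_string())
    } else {
        // 6379 is the default Redis port.
        Some(format!("{host_port}:6379"))
    }
}

/// Return true if a TCP connection to the Redis address succeeds within `timeout`.
fn redis_reachable(url: &str, timeout: Duration) -> bool {
    let Some(host_port) = redis_host_port(url) else {
        return false;
    };
    let Ok(mut addrs) = host_port.to_socket_addrs() else {
        return false;
    };
    addrs.any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok())
}

fn main() {
    let url = "redis://localhost:6379";
    let ok = redis_reachable(url, Duration::from_secs(2));
    println!("{url} reachable: {ok}");
}
```

A successful TCP connection only shows that something is listening on the port; it does not verify authentication or that the listener is actually Redis.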
Logging
Enable debug logging:
RUST_LOG=debug ./supervisor --config config.toml
Monitoring
Monitor key metrics:
- Runner status and health
- Job queue lengths
- Job success/failure rates
- Response times
- Redis connection status
Contributing
- Fork the repository
- Create a feature branch
- Make changes with tests
- Update documentation
- Submit a pull request
License
[License information here]