Hero Supervisor Documentation

Overview

Hero Supervisor is a distributed job execution system that manages runners and coordinates job processing across multiple worker nodes. It exposes an OpenRPC (JSON-RPC) API for job management and runner administration.

Architecture

The supervisor consists of several key components:

  • Supervisor Core: Central coordinator that manages runners and job dispatch
  • OpenRPC Server: JSON-RPC API server for remote management
  • Redis Backend: Job queue and state management
  • Process Manager: Runner lifecycle management (Simple or Tmux)
  • Client Libraries: Native Rust and WASM clients for integration

Quick Start

Starting the Supervisor

# With default configuration
./supervisor

# With custom configuration file
./supervisor --config /path/to/config.toml

Example Configuration

# config.toml
redis_url = "redis://localhost:6379"
namespace = "hero"
bind_address = "127.0.0.1"
port = 3030

# Admin secrets for full access
admin_secrets = ["admin-secret-123"]

# User secrets for job operations
user_secrets = ["user-secret-456"]

# Register secrets for runner registration
register_secrets = ["register-secret-789"]

[[actors]]
id = "sal_runner_1"
name = "sal_runner_1"
binary_path = "/path/to/sal_runner"
db_path = "/tmp/sal_db"
redis_url = "redis://localhost:6379"
process_manager = "simple"

[[actors]]
id = "osis_runner_1"
name = "osis_runner_1"
binary_path = "/path/to/osis_runner"
db_path = "/tmp/osis_db"
redis_url = "redis://localhost:6379"
process_manager = "tmux:osis_session"
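
The process_manager values above take two shapes: simple and tmux:<session>. As a minimal sketch of how such a value could be interpreted, assuming the part after the colon is a tmux session name, the following parses both forms. The ProcessManager enum and parse_process_manager function are hypothetical names for illustration, not the supervisor's actual types.

```rust
/// Hypothetical representation of the `process_manager` setting.
#[derive(Debug, PartialEq)]
enum ProcessManager {
    Simple,
    Tmux { session: String },
}

/// Parses "simple" or "tmux:<session>"; any other value is rejected.
fn parse_process_manager(value: &str) -> Result<ProcessManager, String> {
    match value.split_once(':') {
        // No colon: only the literal "simple" is accepted.
        None if value == "simple" => Ok(ProcessManager::Simple),
        // "tmux:<session>" with a non-empty session name (assumed meaning).
        Some(("tmux", session)) if !session.is_empty() => Ok(ProcessManager::Tmux {
            session: session.to_string(),
        }),
        _ => Err(format!("unrecognized process_manager value: {value:?}")),
    }
}
```

Rejecting unknown values early, rather than falling back to a default, makes a typo in config.toml visible at startup.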

API Documentation

Job API Convention

The Hero Supervisor follows a consistent naming convention for job operations:

  • jobs.* - Operations on the job collection (create, list)
  • job.* - Operations on a specific job (run, start, stop, delete, status, result)

See Job API Convention for detailed documentation.
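
To illustrate the convention, here is a small sketch that classifies a method name by its prefix. MethodScope and method_scope are hypothetical helpers written for this example; the supervisor may route methods differently.

```rust
/// Which group a method name falls into under the naming convention.
#[derive(Debug, PartialEq)]
enum MethodScope {
    /// `jobs.` prefix: general operations such as create and list.
    Collection,
    /// `job.` prefix: operations on one specific job.
    SingleJob,
    /// Anything else, for example runner or administration methods.
    Other,
}

/// Classifies a method name purely by its prefix.
fn method_scope(method: &str) -> MethodScope {
    if method.starts_with("jobs.") {
        MethodScope::Collection
    } else if method.starts_with("job.") {
        MethodScope::SingleJob
    } else {
        MethodScope::Other
    }
}
```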

Core Methods

Runner Management

  • register_runner - Register a new runner
  • list_runners - List all registered runners
  • start_runner / stop_runner - Control runner lifecycle
  • get_runner_status - Get runner status
  • get_runner_logs - Retrieve runner logs

Job Management

  • jobs.create - Create a job without queuing
  • jobs.list - List all jobs with full details
  • job.run - Run a job and return result
  • job.start - Start a created job
  • job.stop - Stop a running job
  • job.delete - Delete a job from the system
  • job.status - Get job status (non-blocking)
  • job.result - Get job result (blocking)

Administration

  • add_secret / remove_secret - Manage authentication secrets
  • get_supervisor_info - Get system information
  • rpc.discover - OpenRPC specification discovery
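
Since the API is JSON-RPC, each of the methods above is called by sending a JSON-RPC 2.0 request object. The sketch below builds such a request body by hand, using rpc.discover with empty parameters as the example. The json_rpc_request helper is hypothetical, and the parameter shapes of the other methods are not specified in this document, so check the OpenRPC specification returned by rpc.discover before relying on them.

```rust
/// Builds a JSON-RPC 2.0 request body.
/// `params_json` must already be valid JSON (for example "[]"), and
/// `method` is inserted as-is, so it must not contain quotes or backslashes.
fn json_rpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params_json}}}"#
    )
}
```

For anything beyond a quick experiment, a JSON library is the safer choice because it handles escaping.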

Client Usage

Rust Client

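// JobResult is assumed to be exported alongside SupervisorClient and JobBuilder
use hero_supervisor_openrpc_client::{SupervisorClient, JobBuilder, JobResult};
use std::time::Duration;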

// Create client
let client = SupervisorClient::new("http://localhost:3030")?;

// Create a job
let job = JobBuilder::new()
    .caller_id("my_client")
    .context_id("my_context")
    .payload("print('Hello World')")
    .executor("osis")
    .runner("osis_runner_1")
    .timeout(60)
    .build()?;

// Option 1: Synchronous execution (job.run waits for and returns the result)
let result = client.job_run("user-secret", job.clone()).await?;
match result {
    JobResult::Success { success } => println!("Output: {}", success),
    JobResult::Error { error } => println!("Error: {}", error),
}

// Option 2: Asynchronous execution
let job_id = client.jobs_create("user-secret", job).await?;
client.job_start("user-secret", &job_id).await?;

// Poll for completion
loop {
    let status = client.job_status(&job_id).await?;
    if status.status == "completed" || status.status == "failed" {
        break;
    }
    tokio::time::sleep(Duration::from_secs(1)).await;
}

let result = client.job_result(&job_id).await?;

// Option 3: Job management
// Stop a running job
client.job_stop("user-secret", &job_id).await?;

// Delete a job
client.job_delete("user-secret", &job_id).await?;

// List all jobs (returns full Job objects)
let jobs = client.jobs_list("user-secret").await?;
for job in jobs {
    println!("Job {}: {} ({})", job.id, job.executor, job.payload);
}
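
The polling loop in Option 2 runs until the job reaches a terminal state, so it never returns if the job stalls. As a hedged sketch of one way to bound the wait, the helper below polls a status source until it reports "completed" or "failed" or until a deadline passes. It is a blocking, std-only stand-in for the async client: wait_for_terminal is a hypothetical name, and the closure stands in for a call such as client.job_status.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `fetch_status` until it reports a terminal state ("completed" or
/// "failed") or until `max_wait` has elapsed.
/// Returns the terminal status, or None if the deadline passed first.
fn wait_for_terminal<F>(
    mut fetch_status: F,
    interval: Duration,
    max_wait: Duration,
) -> Option<String>
where
    F: FnMut() -> String,
{
    let deadline = Instant::now() + max_wait;
    loop {
        let status = fetch_status();
        if status == "completed" || status == "failed" {
            return Some(status);
        }
        // Give up if another sleep would overshoot the deadline.
        if Instant::now() + interval > deadline {
            return None;
        }
        sleep(interval);
    }
}
```

In async code the same structure applies, with tokio::time::sleep in place of the blocking sleep.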

WASM Client

import { WasmSupervisorClient, WasmJob } from 'hero-supervisor-openrpc-client';

// Create client
const client = new WasmSupervisorClient('http://localhost:3030');

// Build a job and submit it with create_job
const job = new WasmJob('job-id', 'print("Hello")', 'osis', 'osis_runner_1');
const result = await client.create_job('user-secret', job);

Security

Authentication Levels

  1. Admin Secrets: Full system access

    • All runner management operations
    • All job operations
    • Secret management
    • System information access
  2. User Secrets: Job operations only

    • Create, run, start, stop, delete, and list jobs
    • Get job status and results
    • No runner or secret management
  3. Register Secrets: Runner registration only

    • Register new runners
    • No other operations
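
The three levels above can be read as a simple permission table. The sketch below encodes that reading, assuming job methods are the ones prefixed job. or jobs. and that register_runner is the only method a register secret may call. SecretKind and is_allowed are hypothetical names, and how the supervisor treats rpc.discover is not stated here, so this sketch handles it like any other non-job method.

```rust
/// The three secret types described in the configuration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SecretKind {
    Admin,
    User,
    Register,
}

/// Returns true if a secret of the given kind may call `method`,
/// following the access levels listed above.
fn is_allowed(kind: SecretKind, method: &str) -> bool {
    match kind {
        // Admin secrets have full system access.
        SecretKind::Admin => true,
        // User secrets are limited to job operations.
        SecretKind::User => method.starts_with("job.") || method.starts_with("jobs."),
        // Register secrets may only register runners.
        SecretKind::Register => method == "register_runner",
    }
}
```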

Best Practices

  • Use different secret types for different access levels
  • Rotate secrets regularly
  • Store secrets securely (environment variables, secret management systems)
  • Use HTTPS in production environments
  • Implement proper logging and monitoring
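
As one way to keep secrets out of source code, a client can read them from the environment at startup. This is a minimal sketch under that assumption: the helper names are hypothetical, and the variable name is chosen by the caller, since this document does not define any environment variables.

```rust
use std::env;

/// Trims surrounding whitespace and rejects empty secrets.
fn validate_secret(raw: &str) -> Result<String, String> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        Err("secret is empty".to_string())
    } else {
        Ok(trimmed.to_string())
    }
}

/// Reads a secret from the named environment variable.
/// Fails if the variable is unset, not valid Unicode, or blank.
fn secret_from_env(var_name: &str) -> Result<String, String> {
    let raw = env::var(var_name).map_err(|e| format!("{var_name}: {e}"))?;
    validate_secret(&raw)
}
```

Failing fast on a missing or blank secret avoids sending requests that the supervisor would reject anyway.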

Development

Building

# Build supervisor binary
cargo build --release

# Build with OpenRPC feature
cargo build --release --features openrpc

# Build client library
cd clients/openrpc
cargo build --release

Testing

# Run tests
cargo test

# Run the ignored tests, which need a running Redis server
docker run -d -p 6379:6379 redis:alpine
cargo test -- --ignored

Examples

See the examples/ directory for:

  • Basic supervisor setup
  • Mock runner implementation
  • Comprehensive OpenRPC client usage
  • Configuration examples

Troubleshooting

Common Issues

  1. Redis Connection Failed

    • Ensure Redis server is running
    • Check Redis URL in configuration
    • Verify network connectivity
  2. Runner Registration Failed

    • Check register secret validity
    • Verify runner binary path exists
    • Ensure runner has proper permissions
  3. Job Execution Timeout

    • Increase job timeout value
    • Check runner resource availability
    • Monitor runner logs for issues
  4. OpenRPC Method Not Found

    • Verify method name spelling
    • Check OpenRPC specification
    • Ensure server supports the method
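
For the Redis connection issue, a quick TCP check can separate network problems from configuration problems. The sketch below extracts host and port from a redis:// URL and attempts a connection with a timeout. It only confirms that something accepts connections at that address; it does not speak the Redis protocol. Both function names are hypothetical, and URLs with credentials, TLS schemes, or bare IPv6 hosts are not handled.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Extracts "host:port" from a URL such as "redis://localhost:6379".
/// Falls back to the default Redis port 6379 when no port is given.
fn redis_host_port(url: &str) -> Option<String> {
    let rest = url.strip_prefix("redis://")?;
    // Drop any path component such as "/0".
    let authority = rest.split('/').next()?;
    if authority.is_empty() {
        return None;
    }
    if authority.contains(':') {
        Some(authority.to_string())
    } else {
        Some(format!("{authority}:6379"))
    }
}

/// Returns true if a TCP connection to the address in `url` succeeds
/// within `timeout`. A non-zero timeout is required.
fn redis_reachable(url: &str, timeout: Duration) -> bool {
    let Some(host_port) = redis_host_port(url) else {
        return false;
    };
    let Ok(mut addrs) = host_port.to_socket_addrs() else {
        return false;
    };
    addrs.any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok())
}
```

For example, redis_reachable("redis://localhost:6379", Duration::from_secs(2)) returning false points at the server or network rather than at the supervisor.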

Logging

Enable debug logging:

RUST_LOG=debug ./supervisor --config config.toml

Monitoring

Monitor key metrics:

  • Runner status and health
  • Job queue lengths
  • Job success/failure rates
  • Response times
  • Redis connection status

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests
  4. Update documentation
  5. Submit a pull request

License

[License information here]