add some documentation for blue book

Timur Gordon
2025-11-14 11:00:26 +01:00
parent 75e62f4730
commit f67296cd25
11 changed files with 1275 additions and 8 deletions

docs/.collection

docs/README.md

@@ -0,0 +1,67 @@
# Horus Documentation
**Hierarchical Orchestration Runtime for Universal Scripts**
Horus is a distributed job execution system with three layers: Coordinator, Supervisor, and Runner.
## Quick Links
- **[Getting Started](./getting-started.md)** - Install and run your first job
- **[Architecture](./architecture.md)** - System design and components
- **[Etymology](./ethymology.md)** - The meaning behind the name
## Components
### Coordinator
Workflow orchestration engine for DAG-based execution.
- [Overview](./coordinator/overview.md)
### Supervisor
Job dispatcher with authentication and routing.
- [Overview](./supervisor/overview.md)
- [Authentication](./supervisor/auth.md)
- [OpenRPC API](./supervisor/openrpc.json)
### Runners
Job executors for different workload types.
- [Runner Overview](./runner/overview.md)
- [Hero Runner](./runner/hero.md) - Heroscript execution
- [SAL Runner](./runner/sal.md) - System operations
- [Osiris Runner](./runner/osiris.md) - Database operations
## Core Concepts
### Jobs
Units of work executed by runners. Each job contains:
- Target runner ID
- Payload (script/command)
- Cryptographic signature
- Optional timeout and environment variables
### Workflows
Multi-step DAGs executed by the Coordinator. Steps can:
- Run in parallel or sequence
- Pass data between steps
- Target different runners
- Handle errors and retries
### Signatures
All jobs must be cryptographically signed:
- Ensures job authenticity
- Prevents tampering
- Enables authorization
## Use Cases
- **Automation**: Execute system tasks and scripts
- **Data Pipelines**: Multi-step ETL workflows
- **CI/CD**: Build, test, and deployment pipelines
- **Infrastructure**: Manage cloud resources and containers
- **Integration**: Connect systems via scripted workflows
## Repository
[git.ourworld.tf/herocode/horus](https://git.ourworld.tf/herocode/horus)

docs/architecture.md

@@ -1,15 +1,185 @@
# Architecture
Horus is a hierarchical orchestration runtime with three layers:
1. Coordinator: A workflow engine that executes DAG-based flows by sending ready job steps to the targeted supervisors.
2. Supervisor: A job dispatcher that routes jobs to the appropriate runners.
3. Runner: A job executor that runs the actual job steps.
## Overview
```
┌─────────────────────────────────────────────────────────┐
│ Coordinator │
│ (Workflow Engine - DAG Execution) │
│ │
│ • Parses workflow definitions │
│ • Resolves dependencies │
│ • Dispatches ready steps │
│ • Tracks workflow state │
└────────────────────┬────────────────────────────────────┘
│ OpenRPC (HTTP/Mycelium)
┌────────────────────▼────────────────────────────────────┐
│ Supervisor │
│ (Job Dispatcher & Authenticator) │
│ │
│ • Verifies job signatures │
│ • Routes jobs to runners │
│ • Manages runner registry │
│ • Tracks job lifecycle │
└────────────────────┬────────────────────────────────────┘
│ Redis Queue Protocol
┌────────────────────▼────────────────────────────────────┐
│ Runners │
│ (Job Executors) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Hero │ │ SAL │ │ Osiris │ │
│ │ Runner │ │ Runner │ │ Runner │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────┘
```
## Networking
- Clients talk to the Coordinator over an OpenRPC interface, using either plain HTTP transport or Mycelium.
- The Coordinator talks to Supervisors over the same OpenRPC interface, again over HTTP or Mycelium.
- The Supervisor talks to Runners over a Redis-based job execution protocol.
## Layers
### 1. Coordinator (Optional)
**Purpose:** Workflow orchestration and DAG execution
**Responsibilities:**
- Parse and validate workflow definitions
- Execute DAG-based flows
- Manage step dependencies
- Route jobs to appropriate supervisors
- Handle multi-step workflows
**Use When:**
- You need multi-step workflows
- Jobs have dependencies
- Parallel execution is required
- You're building complex data pipelines
[→ Coordinator Documentation](./coordinator/overview.md)
### 2. Supervisor (Required)
**Purpose:** Job admission, authentication, and routing
**Responsibilities:**
- Receive jobs via OpenRPC interface
- Verify cryptographic signatures
- Route jobs to appropriate runners
- Manage runner registry
- Track job status and results
**Features:**
- OpenRPC API for job management
- HTTP and Mycelium transport
- Signature-based authentication
- Runner health monitoring
[→ Supervisor Documentation](./supervisor/overview.md)
### 3. Runners (Required)
**Purpose:** Execute actual job workloads
**Available Runners:**
- **Hero Runner**: Executes heroscripts via Hero CLI
- **SAL Runner**: System operations (OS, K8s, cloud, etc.)
- **Osiris Runner**: Database operations with Rhai scripts
**Common Features:**
- Redis queue-based job polling
- Signature verification
- Timeout support
- Environment variable handling
[→ Runner Documentation](./runner/overview.md)
## Communication Protocols
### Client ↔ Coordinator
- **Protocol:** OpenRPC
- **Transport:** HTTP or Mycelium
- **Operations:** Submit workflow, check status, retrieve results
### Coordinator ↔ Supervisor
- **Protocol:** OpenRPC
- **Transport:** HTTP or Mycelium
- **Operations:** Create job, get status, retrieve logs
### Supervisor ↔ Runner
- **Protocol:** Redis Queue
- **Transport:** Redis pub/sub and lists
- **Operations:** Push job, poll queue, store result
## Job Flow
### Simple Job (No Coordinator)
```
1. Client → Supervisor: create_job()
2. Supervisor: Verify signature
3. Supervisor → Redis: Push to runner queue
4. Runner ← Redis: Pop job
5. Runner: Execute job
6. Runner → Redis: Store result
7. Client ← Supervisor: get_job_result()
```
### Workflow (With Coordinator)
```
1. Client → Coordinator: submit_workflow()
2. Coordinator: Parse DAG
3. Coordinator: Identify ready steps
4. Coordinator → Supervisor: create_job() for each ready step
5. Supervisor → Runner: Route via Redis
6. Runner: Execute and return result
7. Coordinator: Update workflow state
8. Coordinator: Dispatch next ready steps
9. Repeat until workflow complete
```
## Security Model
### Authentication
- Jobs must be cryptographically signed
- Signatures verified at Supervisor layer
- Public key infrastructure for identity
### Authorization
- Runners only execute signed jobs
- Signature verification before execution
- Untrusted jobs rejected
### Transport Security
- Optional TLS for HTTP transport
- End-to-end encryption via Mycelium
- No plaintext credentials
[→ Authentication Details](./supervisor/auth.md)
## Deployment Patterns
### Minimal Setup
```
Redis + Supervisor + Runner(s)
```
Single machine, simple job execution.
### Distributed Setup
```
Redis Cluster + Multiple Supervisors + Runner Pool
```
High availability, load balancing.
### Full Orchestration
```
Coordinator + Multiple Supervisors + Runner Pool
```
Complex workflows, multi-step pipelines.
## Design Principles
1. **Hierarchical**: Clear separation of concerns across layers
2. **Secure**: Signature-based authentication throughout
3. **Scalable**: Horizontal scaling at each layer
4. **Observable**: Comprehensive logging and status tracking
5. **Flexible**: Multiple runners for different workload types

docs/coordinator/overview.md

@@ -0,0 +1,145 @@
# Coordinator Overview
The Coordinator is the workflow orchestration layer in Horus. It executes DAG-based flows by managing job dependencies and dispatching ready steps to supervisors.
## Architecture
```
Client → Coordinator → Supervisor(s) → Runner(s)
```
## Responsibilities
### 1. **Workflow Management**
- Parse and validate DAG workflow definitions
- Track workflow execution state
- Manage step dependencies
### 2. **Job Orchestration**
- Determine which steps are ready to execute
- Dispatch jobs to appropriate supervisors
- Handle step failures and retries
### 3. **Dependency Resolution**
- Track step completion
- Resolve data dependencies between steps
- Pass outputs from completed steps to dependent steps
### 4. **Multi-Supervisor Coordination**
- Route jobs to specific supervisors
- Handle supervisor failures
- Load balance across supervisors
## Workflow Definition
Workflows are defined as Directed Acyclic Graphs (DAGs):
```yaml
workflow:
  name: "data-pipeline"
  steps:
    - id: "fetch"
      runner: "hero"
      payload: "!!http.get url:'https://api.example.com/data'"
    - id: "process"
      runner: "sal"
      depends_on: ["fetch"]
      payload: |
        let data = input.fetch;
        let processed = process_data(data);
        processed
    - id: "store"
      runner: "osiris"
      depends_on: ["process"]
      payload: |
        let model = osiris.model("results");
        model.create(input.process);
```
## Features
### DAG Execution
- Parallel execution of independent steps
- Sequential execution of dependent steps
- Automatic dependency resolution
### Error Handling
- Step-level retry policies
- Workflow-level error handlers
- Partial workflow recovery
### Data Flow
- Pass outputs between steps
- Transform data between steps
- Aggregate results from parallel steps
### Monitoring
- Real-time workflow status
- Step-level progress tracking
- Execution metrics and logs
## Workflow Lifecycle
1. **Submission**: Client submits workflow definition
2. **Validation**: Coordinator validates DAG structure
3. **Scheduling**: Determine ready steps (no pending dependencies)
4. **Dispatch**: Send jobs to supervisors
5. **Tracking**: Monitor step completion
6. **Progression**: Execute next ready steps
7. **Completion**: Workflow finishes when all steps complete
## Use Cases
### Data Pipelines
```
Extract → Transform → Load
```
### CI/CD Workflows
```
Build → Test → Deploy
```
### Multi-Stage Processing
```
Fetch Data → Process → Validate → Store → Notify
```
### Parallel Execution
```
        ┌─ Task A ─┐
Start ──┼─ Task B ─┼── Aggregate → Finish
        └─ Task C ─┘
```
## Configuration
```bash
# Start coordinator
coordinator --port 9090 --redis-url redis://localhost:6379
# With multiple supervisors
coordinator --port 9090 \
  --supervisor http://supervisor1:8080 \
  --supervisor http://supervisor2:8080
```
## API
The Coordinator exposes an OpenRPC API:
- `submit_workflow`: Submit a new workflow
- `get_workflow_status`: Check workflow progress
- `list_workflows`: List all workflows
- `cancel_workflow`: Stop a running workflow
- `get_workflow_logs`: Retrieve execution logs
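Since the API is OpenRPC, any JSON-RPC 2.0 client can drive it. A hedged sketch in Rust, assuming `reqwest` (with the `json` feature), `serde_json`, and `tokio`; only the method names come from the list above, while the endpoint path and parameter shape are assumptions:
```rust
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    // JSON-RPC 2.0 envelope; the "params" layout is illustrative only.
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "submit_workflow",
        "params": { "workflow": { "name": "data-pipeline", "steps": [] } }
    });
    let response: serde_json::Value = client
        .post("http://localhost:9090") // assumed endpoint
        .json(&request)
        .send()
        .await?
        .json()
        .await?;
    println!("result: {}", response["result"]);
    Ok(())
}
```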
## Advantages
- **Declarative**: Define what to do, not how
- **Scalable**: Parallel execution across multiple supervisors
- **Resilient**: Automatic retry and error handling
- **Observable**: Real-time status and logging
- **Composable**: Reuse workflows as steps in larger workflows

docs/getting-started.md

@@ -0,0 +1,186 @@
# Getting Started with Horus
Quick start guide to running your first Horus job.
## Prerequisites
- Redis server running
- Rust toolchain installed
- Horus repository cloned
## Installation
### Build from Source
```bash
# Clone repository
git clone https://git.ourworld.tf/herocode/horus
cd horus
# Build all components
cargo build --release
# Binaries will be in target/release/
```
## Quick Start
### 1. Start Redis
```bash
# Using Docker
docker run -d -p 6379:6379 redis:latest
# Or install locally
redis-server
```
### 2. Start a Runner
```bash
# Start Hero runner
./target/release/herorunner my-runner
# Or SAL runner
./target/release/runner_sal my-sal-runner
# Or Osiris runner
./target/release/runner_osiris my-osiris-runner
```
### 3. Start the Supervisor
```bash
./target/release/supervisor --port 8080
```
### 4. Submit a Job
Using the Supervisor client:
```rust
use hero_supervisor_client::SupervisorClient;
use hero_job::Job;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = SupervisorClient::new("http://localhost:8080")?;

    let job = Job::new(
        "my-runner",
        "print('Hello from Horus!')".to_string(),
    );

    let result = client.create_job(job).await?;
    println!("Job ID: {}", result.id);

    Ok(())
}
```
## Example Workflows
### Simple Heroscript Execution
```heroscript
print("Hello World")
!!git.list
```
### SAL System Operation
```rhai
// List files in directory
let files = os.list_dir("/tmp");
for file in files {
    print(file);
}
```
### Osiris Data Storage
```rhai
// Store user data
let users = osiris.model("users");
let user = users.create(#{
name: "Alice",
email: "alice@example.com"
});
print(`Created user: ${user.id}`);
```
## Architecture Overview
```
┌──────────────┐
│ Coordinator │ (Optional: For workflows)
└──────┬───────┘
┌──────▼───────┐
│ Supervisor │ (Job dispatcher)
└──────┬───────┘
│ Redis
┌──────▼───────┐
│ Runners │ (Job executors)
│ - Hero │
│ - SAL │
│ - Osiris │
└──────────────┘
```
## Next Steps
- [Architecture Details](./architecture.md)
- [Runner Documentation](./runner/overview.md)
- [Supervisor API](./supervisor/overview.md)
- [Coordinator Workflows](./coordinator/overview.md)
- [Authentication](./supervisor/auth.md)
## Common Issues
### Runner Not Receiving Jobs
1. Check Redis connection
2. Verify runner ID matches job target
3. Check supervisor logs
### Job Signature Verification Failed
1. Ensure job is properly signed
2. Verify public key is registered
3. Check signature format
### Timeout Errors
1. Increase job timeout value
2. Check runner resource availability
3. Optimize job payload
## Development
### Running Tests
```bash
# All tests
cargo test
# Specific component
cargo test -p hero-supervisor
cargo test -p runner-hero
```
### Debug Mode
```bash
# Enable debug logging
RUST_LOG=debug ./target/release/supervisor --port 8080
```
## Support
- Documentation: [docs.ourworld.tf/horus](https://docs.ourworld.tf/horus)
- Repository: [git.ourworld.tf/herocode/horus](https://git.ourworld.tf/herocode/horus)
- Issues: Report on the repository

docs/job-format.md

@@ -0,0 +1,179 @@
# Job Format
Jobs are the fundamental unit of work in Horus.
## Structure
```rust
pub struct Job {
    pub id: String,                        // Unique job identifier
    pub runner_id: String,                 // Target runner ID
    pub payload: String,                   // Job payload (script/command)
    pub timeout: Option<u64>,              // Timeout in seconds
    pub env_vars: HashMap<String, String>, // Environment variables
    pub signatures: Vec<Signature>,        // Cryptographic signatures
    pub created_at: i64,                   // Creation timestamp
    pub status: JobStatus,                 // Current status
}
```
## Job Status
```rust
pub enum JobStatus {
    Pending,   // Queued, not yet started
    Running,   // Currently executing
    Completed, // Finished successfully
    Failed,    // Execution failed
    Timeout,   // Exceeded timeout
    Cancelled, // Manually cancelled
}
```
## Signature Format
```rust
pub struct Signature {
    pub public_key: String, // Signer's public key
    pub signature: String,  // Cryptographic signature
    pub algorithm: String,  // Signature algorithm (e.g., "ed25519")
}
```
## Creating a Job
### Minimal Job
```rust
use hero_job::Job;
let job = Job::new(
    "my-runner",
    "print('Hello World')".to_string(),
);
```
### With Timeout
```rust
let job = Job::builder()
    .runner_id("my-runner")
    .payload("long_running_task()")
    .timeout(300) // 5 minutes
    .build();
```
### With Environment Variables
```rust
use std::collections::HashMap;
let mut env_vars = HashMap::new();
env_vars.insert("API_KEY".to_string(), "secret".to_string());
env_vars.insert("ENV".to_string(), "production".to_string());
let job = Job::builder()
    .runner_id("my-runner")
    .payload("deploy_app()")
    .env_vars(env_vars)
    .build();
```
### With Signature
```rust
use hero_job::{Job, Signature};
let job = Job::builder()
    .runner_id("my-runner")
    .payload("important_task()")
    .signature(Signature {
        public_key: "ed25519:abc123...".to_string(),
        signature: "sig:xyz789...".to_string(),
        algorithm: "ed25519".to_string(),
    })
    .build();
```
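How such a signature might be produced is sketched below with the `ed25519-dalek` (with the `rand_core` feature), `rand`, and `hex` crates; whether Horus signs the raw payload or a canonical encoding of the whole job is not specified here, so treat this purely as an illustration of the mechanics:
```rust
use ed25519_dalek::{Signer, SigningKey};
use rand::rngs::OsRng;

fn main() {
    // Generate a keypair (in practice, load a securely stored private key).
    let signing_key = SigningKey::generate(&mut OsRng);

    // Sign the job payload (the exact message Horus signs is an assumption).
    let payload = b"important_task()";
    let signature = signing_key.sign(payload);

    // Hex-encode the values for the Signature fields shown above.
    println!("public_key: ed25519:{}", hex::encode(signing_key.verifying_key().to_bytes()));
    println!("signature:  sig:{}", hex::encode(signature.to_bytes()));
}
```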
## Payload Format
The payload format depends on the target runner:
### Hero Runner
Heroscript content:
```heroscript
!!git.list
print("Repositories listed")
!!docker.ps
```
### SAL Runner
Rhai script with SAL modules:
```rhai
let files = os.list_dir("/tmp");
for file in files {
    print(file);
}
```
### Osiris Runner
Rhai script with Osiris database:
```rhai
let users = osiris.model("users");
let user = users.create(#{
name: "Alice",
email: "alice@example.com"
});
```
## Job Result
```rust
pub struct JobResult {
    pub job_id: String,
    pub status: JobStatus,
    pub output: String,        // Stdout
    pub error: Option<String>, // Stderr or error message
    pub exit_code: Option<i32>,
    pub started_at: Option<i64>,
    pub completed_at: Option<i64>,
}
```
## Best Practices
### Timeouts
- Always set timeouts for jobs
- Default: 60 seconds
- Long-running jobs: Set appropriate timeout
- Infinite jobs: Use separate monitoring
### Environment Variables
- Don't store secrets in env vars in production
- Use vault/secret management instead
- Keep env vars minimal
- Document required variables
### Signatures
- Always sign jobs in production
- Use strong algorithms (ed25519)
- Rotate keys regularly
- Store private keys securely
### Payloads
- Keep payloads concise
- Validate input data
- Handle errors gracefully
- Log important operations
## Validation
Jobs are validated before execution:
1. **Structure**: All required fields present
2. **Signature**: Valid cryptographic signature
3. **Runner**: Target runner exists and available
4. **Payload**: Non-empty payload
5. **Timeout**: Reasonable timeout value
Invalid jobs are rejected before execution.
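As a sketch, the five checks could look like this in Rust against the `Job` struct defined above (`validate`, its thresholds, and its error strings are hypothetical, not the supervisor's actual code):
```rust
use hero_job::Job;

// Hypothetical validation pass mirroring the five checks above.
fn validate(job: &Job, known_runners: &[String]) -> Result<(), String> {
    if job.runner_id.is_empty() {
        return Err("missing runner_id".into());     // 1. Structure
    }
    if job.signatures.is_empty() {
        // 2. Signature present (cryptographic verification happens separately)
        return Err("no signature attached".into());
    }
    if !known_runners.contains(&job.runner_id) {
        return Err("unknown runner".into());        // 3. Runner
    }
    if job.payload.trim().is_empty() {
        return Err("empty payload".into());         // 4. Payload
    }
    if let Some(t) = job.timeout {
        if t == 0 || t > 86_400 {
            return Err("unreasonable timeout".into()); // 5. Timeout
        }
    }
    Ok(())
}
```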

docs/runner/hero.md

@@ -0,0 +1,71 @@
# Hero Runner
Executes heroscripts using the Hero CLI tool.
## Overview
The Hero runner pipes job payloads directly to `hero run -s` via stdin, making it ideal for executing Hero automation tasks and heroscripts.
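A minimal sketch of that stdin piping, using only the standard library (`run_heroscript` is a hypothetical helper, not the runner's actual code; signature checks, timeouts, and env vars are omitted):
```rust
use std::io::Write;
use std::process::{Command, Stdio};

fn run_heroscript(payload: &str) -> std::io::Result<String> {
    let mut child = Command::new("hero")
        .args(["run", "-s"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Pipe the payload straight to `hero run -s` (no temp files on disk).
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(payload.as_bytes())?;

    // Dropping the pipe above closes stdin; hero runs the script and exits.
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```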
## Features
- **Heroscript Execution**: Direct stdin piping to `hero run -s`
- **No Temp Files**: Secure execution without filesystem artifacts
- **Environment Variables**: Full environment variable support
- **Timeout Support**: Respects job timeout settings
- **Signature Verification**: Cryptographic job verification
## Usage
```bash
# Start the runner
herorunner my-hero-runner
# With custom Redis
herorunner my-hero-runner --redis-url redis://custom:6379
```
## Job Payload
The payload should contain the heroscript content:
```heroscript
!!git.list
print("Repositories listed")
!!docker.ps
```
## Examples
### Simple Print
```heroscript
print("Hello from heroscript!")
```
### Hero Actions
```heroscript
!!git.list
!!docker.start name:"myapp"
```
### With Environment Variables
```json
{
  "payload": "print(env.MY_VAR)",
  "env_vars": {
    "MY_VAR": "Hello World"
  }
}
```
## Requirements
- `hero` CLI must be installed and in PATH
- Redis server accessible
- Valid job signatures
## Error Handling
- **Hero CLI Not Found**: Returns error if `hero` command unavailable
- **Timeout**: Kills process if timeout exceeded
- **Non-zero Exit**: Returns error with hero CLI output
- **Invalid Signature**: Rejects job before execution

docs/runner/osiris.md

@@ -0,0 +1,142 @@
# Osiris Runner
Database-backed runner for structured data storage and retrieval.
## Overview
The Osiris runner executes Rhai scripts with access to a model-based database system, enabling structured data operations and persistence.
## Features
- **Rhai Scripting**: Execute Rhai scripts with Osiris database access
- **Model-Based Storage**: Define and use data models
- **CRUD Operations**: Create, read, update, delete records
- **Query Support**: Search and filter data
- **Schema Validation**: Type-safe data operations
- **Transaction Support**: Atomic database operations
## Usage
```bash
# Start the runner
runner_osiris my-osiris-runner
# With custom Redis
runner_osiris my-osiris-runner --redis-url redis://custom:6379
```
## Job Payload
The payload should contain a Rhai script using Osiris operations:
```rhai
// Example: Store data
let model = osiris.model("users");
let user = model.create(#{
name: "Alice",
email: "alice@example.com",
age: 30
});
print(user.id);
// Example: Retrieve data
let found = model.get(user.id);
print(found.name);
```
## Examples
### Create Model and Store Data
```rhai
// Define model
let posts = osiris.model("posts");
// Create record
let post = posts.create(#{
title: "Hello World",
content: "First post",
author: "Alice",
published: true
});
print(`Created post with ID: ${post.id}`);
```
### Query Data
```rhai
let posts = osiris.model("posts");
// Find by field
let published = posts.find(#{
    published: true
});
for post in published {
    print(post.title);
}
```
### Update Records
```rhai
let posts = osiris.model("posts");
// Get record
let post = posts.get("post-123");
// Update fields
post.content = "Updated content";
posts.update(post);
```
### Delete Records
```rhai
let posts = osiris.model("posts");
// Delete by ID
posts.delete("post-123");
```
### Transactions
```rhai
osiris.transaction(|| {
    let users = osiris.model("users");
    let posts = osiris.model("posts");

    let user = users.create(#{ name: "Bob" });
    let post = posts.create(#{
        title: "Bob's Post",
        author_id: user.id
    });

    // Both operations commit together
});
```
## Data Models
Models are defined dynamically through Rhai scripts:
```rhai
let model = osiris.model("products");
// Model automatically handles:
// - ID generation
// - Timestamps (created_at, updated_at)
// - Schema validation
// - Indexing
```
## Requirements
- Redis server accessible
- Osiris database configured
- Valid job signatures
- Sufficient storage for data operations
## Use Cases
- **Configuration Storage**: Store application configs
- **User Data**: Manage user profiles and preferences
- **Workflow State**: Persist workflow execution state
- **Metrics & Logs**: Store structured logs and metrics
- **Cache Management**: Persistent caching layer

docs/runner/overview.md

@@ -0,0 +1,96 @@
# Runners Overview
Runners are the execution layer in the Horus architecture. They receive jobs from the Supervisor via Redis queues and execute the actual workload.
## Architecture
```
Supervisor → Redis Queue → Runner → Execute Job → Return Result
```
## Available Runners
Horus provides three specialized runners:
### 1. **Hero Runner**
Executes heroscripts using the Hero CLI ecosystem.
**Use Cases:**
- Running Hero automation tasks
- Executing heroscripts from job payloads
- Integration with Hero CLI tools
**Binary:** `herorunner`
[→ Hero Runner Documentation](./hero.md)
### 2. **SAL Runner**
System Abstraction Layer runner for system-level operations.
**Use Cases:**
- OS operations (file, process, network)
- Infrastructure management (Kubernetes, VMs)
- Cloud provider operations (Hetzner)
- Database operations (Redis, Postgres)
**Binary:** `runner_sal`
[→ SAL Runner Documentation](./sal.md)
### 3. **Osiris Runner**
Database-backed runner for data storage and retrieval using Rhai scripts.
**Use Cases:**
- Structured data storage
- Model-based data operations
- Rhai script execution with database access
**Binary:** `runner_osiris`
[→ Osiris Runner Documentation](./osiris.md)
## Common Features
All runners implement the `Runner` trait and provide:
- **Job Execution**: Process jobs from Redis queues
- **Signature Verification**: Verify job signatures before execution
- **Timeout Support**: Respect job timeout settings
- **Environment Variables**: Pass environment variables to jobs
- **Error Handling**: Comprehensive error reporting
- **Logging**: Structured logging for debugging
## Runner Protocol
Runners communicate with the Supervisor using a Redis-based protocol:
1. **Job Queue**: Supervisor pushes jobs to `runner:{runner_id}:jobs`
2. **Job Processing**: Runner pops job, validates signature, executes
3. **Result Storage**: Runner stores result in `job:{job_id}:result`
4. **Status Updates**: Runner updates job status throughout execution
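A sketch of this exchange with the `redis` crate (assuming version ≥0.23, where `brpop` takes a float timeout); the key names follow the patterns above, while the JSON bodies are illustrative:
```rust
use redis::Commands;

fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1:6379/")?;
    let mut con = client.get_connection()?;

    // Supervisor side: push a serialized job onto the runner's queue.
    let _: () = con.lpush("runner:my-runner:jobs", r#"{"id":"job-1","payload":"print(1)"}"#)?;

    // Runner side: block until a job arrives (timeout in seconds).
    let popped: Option<(String, String)> = con.brpop("runner:my-runner:jobs", 5.0)?;
    if let Some((_queue, job_json)) = popped {
        println!("got job: {job_json}");
        // ...execute, then store the result where the supervisor will read it.
        let _: () = con.set("job:job-1:result", r#"{"status":"Completed","output":"1"}"#)?;
    }
    Ok(())
}
```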
## Starting a Runner
```bash
# Hero Runner
herorunner <runner_id> [--redis-url <url>]
# SAL Runner
runner_sal <runner_id> [--redis-url <url>]
# Osiris Runner
runner_osiris <runner_id> [--redis-url <url>]
```
## Configuration
All runners accept:
- `runner_id`: Unique identifier for the runner (required)
- `--redis-url`: Redis connection URL (default: `redis://localhost:6379`)
## Security
- Jobs must be cryptographically signed
- Runners verify signatures before execution
- Untrusted jobs are rejected
- Environment variables should not contain sensitive data in production

docs/runner/sal.md

@@ -0,0 +1,123 @@
# SAL Runner
System Abstraction Layer runner for system-level operations.
## Overview
The SAL runner executes Rhai scripts with access to system abstraction modules for OS operations, infrastructure management, and cloud provider interactions.
## Features
- **Rhai Scripting**: Execute Rhai scripts with SAL modules
- **System Operations**: File, process, and network management
- **Infrastructure**: Kubernetes, VM, and container operations
- **Cloud Providers**: Hetzner and other cloud integrations
- **Database Access**: Redis and Postgres client operations
- **Networking**: Mycelium and network configuration
## Available SAL Modules
### Core Modules
- **sal-os**: Operating system operations
- **sal-process**: Process management
- **sal-text**: Text processing utilities
- **sal-net**: Network operations
### Infrastructure
- **sal-virt**: Virtualization management
- **sal-kubernetes**: Kubernetes cluster operations
- **sal-zinit-client**: Zinit process manager
### Storage & Data
- **sal-redisclient**: Redis operations
- **sal-postgresclient**: PostgreSQL operations
- **sal-vault**: Secret management
### Networking
- **sal-mycelium**: Mycelium network integration
### Cloud Providers
- **sal-hetzner**: Hetzner cloud operations
### Version Control
- **sal-git**: Git repository operations
## Usage
```bash
# Start the runner
runner_sal my-sal-runner
# With custom Redis
runner_sal my-sal-runner --redis-url redis://custom:6379
```
## Job Payload
The payload should contain a Rhai script using SAL modules:
```rhai
// Example: List files
let files = os.list_dir("/tmp");
print(files);
// Example: Process management
let pid = process.spawn("ls", ["-la"]);
let output = process.wait(pid);
print(output);
```
## Examples
### File Operations
```rhai
// Read file
let content = os.read_file("/path/to/file");
print(content);
// Write file
os.write_file("/path/to/output", "Hello World");
```
### Kubernetes Operations
```rhai
// List pods
let pods = k8s.list_pods("default");
for pod in pods {
    print(pod.name);
}
```
### Redis Operations
```rhai
// Set value
redis.set("key", "value");
// Get value
let val = redis.get("key");
print(val);
```
### Git Operations
```rhai
// Clone repository
git.clone("https://github.com/user/repo", "/tmp/repo");
// Get status
let status = git.status("/tmp/repo");
print(status);
```
## Requirements
- Redis server accessible
- System permissions for requested operations
- Valid job signatures
- SAL modules available in runtime
## Security Considerations
- SAL operations have system-level access
- Jobs must be from trusted sources
- Signature verification is mandatory
- Limit runner permissions in production

docs/supervisor/overview.md

@@ -0,0 +1,88 @@
# Supervisor Overview
The Supervisor is the job dispatcher layer in Horus. It receives jobs, verifies signatures, and routes them to appropriate runners.
## Architecture
```
Client → Supervisor → Redis Queue → Runner
```
## Responsibilities
### 1. **Job Admission**
- Receive jobs via OpenRPC interface
- Validate job structure and required fields
- Verify cryptographic signatures
### 2. **Authentication & Authorization**
- Verify job signatures using public keys
- Ensure jobs are from authorized sources
- Reject unsigned or invalid jobs
### 3. **Job Routing**
- Route jobs to appropriate runner queues
- Maintain runner registry
- Load balance across available runners
### 4. **Job Management**
- Track job status and lifecycle
- Provide job query and listing APIs
- Store job results and logs
### 5. **Runner Management**
- Register and track available runners
- Monitor runner health and availability
- Handle runner disconnections
## OpenRPC Interface
The Supervisor exposes an OpenRPC API for job management:
### Job Operations
- `create_job`: Submit a new job
- `get_job`: Retrieve job details
- `list_jobs`: List all jobs
- `delete_job`: Remove a job
- `get_job_logs`: Retrieve job execution logs
### Runner Operations
- `register_runner`: Register a new runner
- `list_runners`: List available runners
- `get_runner_status`: Check runner health
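Using the Rust client from the getting-started guide, a typical round trip might look as follows (only `create_job` appears in that guide; `get_job` and the field names here are assumptions mirroring the RPC list above):
```rust
use hero_supervisor_client::SupervisorClient;
use hero_job::Job;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = SupervisorClient::new("http://localhost:8080")?;

    // create_job: submit a job targeting runner "my-runner".
    let job = Job::new("my-runner", "print('ping')".to_string());
    let created = client.create_job(job).await?;

    // get_job: query the job back to inspect its status (assumed signature).
    let fetched = client.get_job(&created.id).await?;
    println!("status: {:?}", fetched.status);
    Ok(())
}
```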
## Job Lifecycle
1. **Submission**: Client submits job via OpenRPC
2. **Validation**: Supervisor validates structure and signature
3. **Queueing**: Job pushed to runner's Redis queue
4. **Execution**: Runner processes job
5. **Completion**: Result stored in Redis
6. **Retrieval**: Client retrieves result via OpenRPC
## Transport Options
The Supervisor supports multiple transport layers:
- **HTTP**: Standard HTTP/HTTPS transport
- **Mycelium**: Peer-to-peer encrypted transport
## Configuration
```bash
# Start supervisor
supervisor --port 8080 --redis-url redis://localhost:6379
# With Mycelium
supervisor --port 8080 --mycelium --redis-url redis://localhost:6379
```
## Security
- All jobs must be cryptographically signed
- Signatures verified before job admission
- Public key infrastructure for identity
- Optional TLS for HTTP transport
- End-to-end encryption via Mycelium
[→ Authentication Documentation](./auth.md)