revert: rename back to docs directory
docs/.collection
Normal file
@@ -0,0 +1 @@
horus
docs/README.md
Normal file
@@ -0,0 +1,67 @@

# Horus Documentation

**Hierarchical Orchestration Runtime for Universal Scripts**

Horus is a distributed job execution system with three layers: Coordinator, Supervisor, and Runner.

## Quick Links

- **[Getting Started](./getting-started.md)** - Install and run your first job
- **[Architecture](./architecture.md)** - System design and components
- **[Etymology](./ethymology.md)** - The meaning behind the name

## Components

### Coordinator
Workflow orchestration engine for DAG-based execution.

- [Overview](./coordinator/coordinator.md)

### Supervisor
Job dispatcher with authentication and routing.

- [Overview](./supervisor/supervisor.md)
- [Authentication](./supervisor/auth.md)
- [OpenRPC API](./supervisor/openrpc.json)

### Runners
Job executors for different workload types.

- [Runner Overview](./runner/runners.md)
- [Hero Runner](./runner/hero.md) - Heroscript execution
- [SAL Runner](./runner/sal.md) - System operations
- [Osiris Runner](./runner/osiris.md) - Database operations

## Core Concepts

### Jobs
Units of work executed by runners. Each job contains:
- Target runner ID
- Payload (script/command)
- Cryptographic signature
- Optional timeout and environment variables
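
For illustration, here is a job assembled with the `hero_job` builder described in [Job Format](./job-format.md) (a sketch reusing that document's API):

```rust
use hero_job::Job;

// Target runner, payload, and an explicit timeout; a signature is
// attached before submission (see ./job-format.md).
let job = Job::builder()
    .runner_id("my-runner")
    .payload("print('hello')")
    .timeout(60)
    .build();
```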

### Workflows
Multi-step DAGs executed by the Coordinator. Steps can:
- Run in parallel or sequence
- Pass data between steps
- Target different runners
- Handle errors and retries

### Signatures
All jobs must be cryptographically signed:
- Ensures job authenticity
- Prevents tampering
- Enables authorization

## Use Cases

- **Automation**: Execute system tasks and scripts
- **Data Pipelines**: Multi-step ETL workflows
- **CI/CD**: Build, test, and deployment pipelines
- **Infrastructure**: Manage cloud resources and containers
- **Integration**: Connect systems via scripted workflows

## Repository

[git.ourworld.tf/herocode/horus](https://git.ourworld.tf/herocode/horus)

docs/architecture.md
Normal file
@@ -0,0 +1,185 @@

# Architecture

Horus is a hierarchical orchestration runtime with three layers: Coordinator, Supervisor, and Runner.

## Overview

```
┌─────────────────────────────────────────────────────────┐
│                       Coordinator                       │
│            (Workflow Engine - DAG Execution)            │
│                                                         │
│  • Parses workflow definitions                          │
│  • Resolves dependencies                                │
│  • Dispatches ready steps                               │
│  • Tracks workflow state                                │
└────────────────────┬────────────────────────────────────┘
                     │ OpenRPC (HTTP/Mycelium)
                     │
┌────────────────────▼────────────────────────────────────┐
│                       Supervisor                         │
│             (Job Dispatcher & Authenticator)             │
│                                                          │
│  • Verifies job signatures                               │
│  • Routes jobs to runners                                │
│  • Manages runner registry                               │
│  • Tracks job lifecycle                                  │
└────────────────────┬─────────────────────────────────────┘
                     │ Redis Queue Protocol
                     │
┌────────────────────▼─────────────────────────────────────┐
│                        Runners                           │
│                    (Job Executors)                       │
│                                                          │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│   │   Hero   │  │   SAL    │  │  Osiris  │               │
│   │  Runner  │  │  Runner  │  │  Runner  │               │
│   └──────────┘  └──────────┘  └──────────┘               │
└──────────────────────────────────────────────────────────┘
```

## Layers

### 1. Coordinator (Optional)

**Purpose:** Workflow orchestration and DAG execution

**Responsibilities:**
- Parse and validate workflow definitions
- Execute DAG-based flows
- Manage step dependencies
- Route jobs to appropriate supervisors
- Handle multi-step workflows

**Use When:**
- You need multi-step workflows
- Jobs have dependencies
- Parallel execution is required
- Complex data pipelines

[→ Coordinator Documentation](./coordinator/coordinator.md)

### 2. Supervisor (Required)

**Purpose:** Job admission, authentication, and routing

**Responsibilities:**
- Receive jobs via OpenRPC interface
- Verify cryptographic signatures
- Route jobs to appropriate runners
- Manage runner registry
- Track job status and results

**Features:**
- OpenRPC API for job management
- HTTP and Mycelium transport
- Signature-based authentication
- Runner health monitoring

[→ Supervisor Documentation](./supervisor/supervisor.md)

### 3. Runners (Required)

**Purpose:** Execute actual job workloads

**Available Runners:**
- **Hero Runner**: Executes heroscripts via Hero CLI
- **SAL Runner**: System operations (OS, K8s, cloud, etc.)
- **Osiris Runner**: Database operations with Rhai scripts

**Common Features:**
- Redis queue-based job polling
- Signature verification
- Timeout support
- Environment variable handling

[→ Runner Documentation](./runner/runners.md)

## Communication Protocols

### Client ↔ Coordinator
- **Protocol:** OpenRPC
- **Transport:** HTTP or Mycelium
- **Operations:** Submit workflow, check status, retrieve results

### Coordinator ↔ Supervisor
- **Protocol:** OpenRPC
- **Transport:** HTTP or Mycelium
- **Operations:** Create job, get status, retrieve logs

### Supervisor ↔ Runner
- **Protocol:** Redis Queue
- **Transport:** Redis pub/sub and lists
- **Operations:** Push job, poll queue, store result
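
To make the OpenRPC hops concrete, here is a sketch of a `job.run` call from a client to the Supervisor over HTTP. The method name and parameter shape come from [the OpenRPC spec](./supervisor/openrpc.json); the endpoint URL and the use of `reqwest` are illustrative assumptions:

```rust
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // JSON-RPC request; `job.run` takes { secret, job } per openrpc.json.
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "job.run",
        "params": [{
            "secret": "<api key>",
            "job": {} // a full Job object in practice, see docs/job-format.md
        }]
    });

    let response: serde_json::Value = reqwest::Client::new()
        .post("http://localhost:8080") // supervisor address: an assumption
        .json(&request)
        .send()
        .await?
        .json()
        .await?;
    println!("{response}");
    Ok(())
}
```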

## Job Flow

### Simple Job (No Coordinator)
```
1. Client → Supervisor: create_job()
2. Supervisor: Verify signature
3. Supervisor → Redis: Push to runner queue
4. Runner ← Redis: Pop job
5. Runner: Execute job
6. Runner → Redis: Store result
7. Client ← Supervisor: get_job_result()
```

### Workflow (With Coordinator)
```
1. Client → Coordinator: submit_workflow()
2. Coordinator: Parse DAG
3. Coordinator: Identify ready steps
4. Coordinator → Supervisor: create_job() for each ready step
5. Supervisor → Runner: Route via Redis
6. Runner: Execute and return result
7. Coordinator: Update workflow state
8. Coordinator: Dispatch next ready steps
9. Repeat until workflow complete
```

## Security Model

### Authentication
- Jobs must be cryptographically signed
- Signatures verified at Supervisor layer
- Public key infrastructure for identity

### Authorization
- Runners only execute signed jobs
- Signature verification before execution
- Untrusted jobs rejected

### Transport Security
- Optional TLS for HTTP transport
- End-to-end encryption via Mycelium
- No plaintext credentials

[→ Authentication Details](./supervisor/auth.md)

## Deployment Patterns

### Minimal Setup
```
Redis + Supervisor + Runner(s)
```
Single machine, simple job execution.

### Distributed Setup
```
Redis Cluster + Multiple Supervisors + Runner Pool
```
High availability, load balancing.

### Full Orchestration
```
Coordinator + Multiple Supervisors + Runner Pool
```
Complex workflows, multi-step pipelines.

## Design Principles

1. **Hierarchical**: Clear separation of concerns across layers
2. **Secure**: Signature-based authentication throughout
3. **Scalable**: Horizontal scaling at each layer
4. **Observable**: Comprehensive logging and status tracking
5. **Flexible**: Multiple runners for different workload types

docs/coordinator/coordinator.md
Normal file
@@ -0,0 +1,145 @@

# Coordinator Overview

The Coordinator is the workflow orchestration layer in Horus. It executes DAG-based flows by managing job dependencies and dispatching ready steps to supervisors.

## Architecture

```
Client → Coordinator → Supervisor(s) → Runner(s)
```

## Responsibilities

### 1. **Workflow Management**
- Parse and validate DAG workflow definitions
- Track workflow execution state
- Manage step dependencies

### 2. **Job Orchestration**
- Determine which steps are ready to execute
- Dispatch jobs to appropriate supervisors
- Handle step failures and retries

### 3. **Dependency Resolution**
- Track step completion
- Resolve data dependencies between steps
- Pass outputs from completed steps to dependent steps

### 4. **Multi-Supervisor Coordination**
- Route jobs to specific supervisors
- Handle supervisor failures
- Load balance across supervisors

## Workflow Definition

Workflows are defined as Directed Acyclic Graphs (DAGs):

```yaml
workflow:
  name: "data-pipeline"
  steps:
    - id: "fetch"
      runner: "hero"
      payload: "!!http.get url:'https://api.example.com/data'"

    - id: "process"
      runner: "sal"
      depends_on: ["fetch"]
      payload: |
        let data = input.fetch;
        let processed = process_data(data);
        processed

    - id: "store"
      runner: "osiris"
      depends_on: ["process"]
      payload: |
        let model = osiris.model("results");
        model.create(input.process);
```

## Features

### DAG Execution
- Parallel execution of independent steps
- Sequential execution of dependent steps
- Automatic dependency resolution

### Error Handling
- Step-level retry policies
- Workflow-level error handlers
- Partial workflow recovery

### Data Flow
- Pass outputs between steps
- Transform data between steps
- Aggregate results from parallel steps

### Monitoring
- Real-time workflow status
- Step-level progress tracking
- Execution metrics and logs

## Workflow Lifecycle

1. **Submission**: Client submits workflow definition
2. **Validation**: Coordinator validates DAG structure
3. **Scheduling**: Determine ready steps (no pending dependencies; a sketch follows this list)
4. **Dispatch**: Send jobs to supervisors
5. **Tracking**: Monitor step completion
6. **Progression**: Execute next ready steps
7. **Completion**: Workflow finishes when all steps complete
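
To make the scheduling step concrete, here is a minimal sketch of ready-step selection (illustrative only; the Coordinator's internal types are not part of this document):

```rust
use std::collections::HashSet;

struct Step {
    id: String,
    depends_on: Vec<String>,
}

/// A step is ready when it has not completed yet and every one of its
/// dependencies has. (A real scheduler would also skip steps that are
/// already dispatched or running.)
fn ready_steps<'a>(steps: &'a [Step], completed: &HashSet<String>) -> Vec<&'a Step> {
    steps
        .iter()
        .filter(|s| !completed.contains(&s.id))
        .filter(|s| s.depends_on.iter().all(|d| completed.contains(d)))
        .collect()
}
```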

## Use Cases

### Data Pipelines
```
Extract → Transform → Load
```

### CI/CD Workflows
```
Build → Test → Deploy
```

### Multi-Stage Processing
```
Fetch Data → Process → Validate → Store → Notify
```

### Parallel Execution
```
        ┌─ Task A ─┐
Start ──┼─ Task B ─┼── Aggregate → Finish
        └─ Task C ─┘
```

## Configuration

```bash
# Start coordinator
coordinator --port 9090 --redis-url redis://localhost:6379

# With multiple supervisors
coordinator --port 9090 \
  --supervisor http://supervisor1:8080 \
  --supervisor http://supervisor2:8080
```

## API

The Coordinator exposes an OpenRPC API:

- `submit_workflow`: Submit a new workflow
- `get_workflow_status`: Check workflow progress
- `list_workflows`: List all workflows
- `cancel_workflow`: Stop a running workflow
- `get_workflow_logs`: Retrieve execution logs
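
As a sketch, a `submit_workflow` request as it might look on the wire; the JSON-RPC envelope is standard, but the exact parameter shape is an assumption, since this document does not pin it down:

```rust
use serde_json::json;

// Hypothetical request body; the workflow object mirrors the YAML
// definition above. Verify the parameter shape against the running API.
let request = json!({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "submit_workflow",
    "params": [{
        "workflow": {
            "name": "data-pipeline",
            "steps": [
                { "id": "fetch", "runner": "hero",
                  "payload": "!!http.get url:'https://api.example.com/data'" }
            ]
        }
    }]
});
```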

## Advantages

- **Declarative**: Define what to do, not how
- **Scalable**: Parallel execution across multiple supervisors
- **Resilient**: Automatic retry and error handling
- **Observable**: Real-time status and logging
- **Composable**: Reuse workflows as steps in larger workflows

docs/ethymology.md
Normal file
@@ -0,0 +1,91 @@

# HORUS — The Meaning Behind the Name

*Hierarchical Orchestration Runtime for Universal Scripts*

---

## 1. Why “Horus”?

**Horus** is one of the oldest and most symbolic deities of ancient Egypt:
a god of the **sky, perception, order, and dominion**.

In mythology, Horus *is* the sky itself;
his **right eye is the sun** (clarity, authority),
his **left eye the moon** (rhythm, balance).

This symbolism aligns perfectly with a system built to supervise, coordinate, and execute distributed workloads.

---

## 2. Symbolic Mapping to the Architecture

- **Sky** → the compute fabric itself
- **Solar eye (sun)** → supervisor layer (visibility, authentication, authority)
- **Lunar eye (moon)** → coordinator layer (workflow rhythms, stepwise order)
- **Falcon wings** → runners (swift execution of tasks)
- **Battle against chaos** → ordering and normalizing raw jobs

Horus is an archetype of **oversight**, **correct action**, and **restoring balance**—all fundamental qualities of an agentic execution system.

---

## 3. The Name as a Backronym

**H O R U S**

**H**ierarchical
**O**rchestration
**R**untime for
**U**niversal
**S**cripts

This describes the system exactly:
a runtime that receives jobs, authenticates them, orchestrates workflows, and executes scripts across distributed runners.

---

## 4. Why It Fits This Stack

The stack consists of:

- **Job** – the incoming intent
- **Supervisor** – verifies, authenticates, admits
- **Coordinator** – plans, arranges, sequences
- **Runner** – executes scripts
- **SAL** – system-level script engine
- **Osiris** – object-level storage & retrieval engine

All of this is unified by the central logic of *oversight, orchestration, and action*.

Horus expresses these ideas precisely:

- Observation → validation & monitoring
- Order → workflow coordination
- Action → script execution
- Sky → the domain that contains all processes beneath it

---

## 5. Visual & Conceptual Identity

**Themes:**
- The Eye of Horus → observability, correctness, safety
- Falcon → agile execution
- Sky → the domain of computation
- Light (sun/moon) → insight, clarity, cycle

**Palette concepts:**
- Gold + deep blue
- Light on dark (sun in sky)
- Single-line geometric Eye (modernized)

The name offers both deep mythic roots and clean, modern branding potential.

---

## 6. Narrative Summary

**HORUS** is the execution sky:
the domain where jobs arrive, gain form, and become actions.
It brings clarity to chaos, structure to tasks, and order to distributed systems.

It is not just a name.
It is the story of a system that sees clearly, acts decisively, and orchestrates wisely.

---

docs/getting-started.md
Normal file
@@ -0,0 +1,186 @@

# Getting Started with Horus

Quick start guide to running your first Horus job.

## Prerequisites

- Redis server running
- Rust toolchain installed
- Horus repository cloned

## Installation

### Build from Source

```bash
# Clone repository
git clone https://git.ourworld.tf/herocode/horus
cd horus

# Build all components
cargo build --release

# Binaries will be in target/release/
```

## Quick Start

### 1. Start Redis

```bash
# Using Docker
docker run -d -p 6379:6379 redis:latest

# Or install locally
redis-server
```

### 2. Start a Runner

```bash
# Start Hero runner
./target/release/herorunner my-runner

# Or SAL runner
./target/release/runner_sal my-sal-runner

# Or Osiris runner
./target/release/runner_osiris my-osiris-runner
```

### 3. Start the Supervisor

```bash
./target/release/supervisor --port 8080
```

### 4. Submit a Job

Using the Supervisor client:

```rust
use hero_supervisor_client::SupervisorClient;
use hero_job::Job;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = SupervisorClient::new("http://localhost:8080")?;

    let job = Job::new(
        "my-runner",
        "print('Hello from Horus!')".to_string(),
    );

    let result = client.create_job(job).await?;
    println!("Job ID: {}", result.id);

    Ok(())
}
```

## Example Workflows

### Simple Heroscript Execution

```heroscript
print("Hello World")
!!git.list
```

### SAL System Operation

```rhai
// List files in directory
let files = os.list_dir("/tmp");
for file in files {
    print(file);
}
```

### Osiris Data Storage

```rhai
// Store user data
let users = osiris.model("users");
let user = users.create(#{
    name: "Alice",
    email: "alice@example.com"
});
print(`Created user: ${user.id}`);
```

## Architecture Overview

```
┌──────────────┐
│ Coordinator  │  (Optional: For workflows)
└──────┬───────┘
       │
┌──────▼───────┐
│  Supervisor  │  (Job dispatcher)
└──────┬───────┘
       │
       │ Redis
       │
┌──────▼───────┐
│   Runners    │  (Job executors)
│  - Hero      │
│  - SAL       │
│  - Osiris    │
└──────────────┘
```

## Next Steps

- [Architecture Details](./architecture.md)
- [Runner Documentation](./runner/runners.md)
- [Supervisor API](./supervisor/supervisor.md)
- [Coordinator Workflows](./coordinator/coordinator.md)
- [Authentication](./supervisor/auth.md)

## Common Issues

### Runner Not Receiving Jobs

1. Check Redis connection
2. Verify runner ID matches job target
3. Check supervisor logs

### Job Signature Verification Failed

1. Ensure job is properly signed
2. Verify public key is registered
3. Check signature format

### Timeout Errors

1. Increase job timeout value
2. Check runner resource availability
3. Optimize job payload

## Development

### Running Tests

```bash
# All tests
cargo test

# Specific component
cargo test -p hero-supervisor
cargo test -p runner-hero
```

### Debug Mode

```bash
# Enable debug logging
RUST_LOG=debug ./target/release/supervisor --port 8080
```

## Support

- Documentation: [docs.ourworld.tf/horus](https://docs.ourworld.tf/horus)
- Repository: [git.ourworld.tf/herocode/horus](https://git.ourworld.tf/herocode/horus)
- Issues: Report on the repository
docs/glossary.md
Normal file
@@ -0,0 +1,6 @@

# Terminology

- Flow: A workflow that is executed by the coordinator.
- Job: A unit of work that is executed by a runner.
- Supervisor: A job dispatcher that routes jobs to the appropriate runners.
- Runner: A job executor that runs the actual job steps.

docs/job-format.md
Normal file
@@ -0,0 +1,179 @@

# Job Format

Jobs are the fundamental unit of work in Horus.

## Structure

```rust
pub struct Job {
    pub id: String,                        // Unique job identifier
    pub runner_id: String,                 // Target runner ID
    pub payload: String,                   // Job payload (script/command)
    pub timeout: Option<u64>,              // Timeout in seconds
    pub env_vars: HashMap<String, String>, // Environment variables
    pub signatures: Vec<Signature>,        // Cryptographic signatures
    pub created_at: i64,                   // Creation timestamp
    pub status: JobStatus,                 // Current status
}
```

## Job Status

```rust
pub enum JobStatus {
    Pending,   // Queued, not yet started
    Running,   // Currently executing
    Completed, // Finished successfully
    Failed,    // Execution failed
    Timeout,   // Exceeded timeout
    Cancelled, // Manually cancelled
}
```

## Signature Format

```rust
pub struct Signature {
    pub public_key: String, // Signer's public key
    pub signature: String,  // Cryptographic signature
    pub algorithm: String,  // Signature algorithm (e.g., "ed25519")
}
```
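
For illustration, producing such a signature with the `ed25519-dalek` and `hex` crates (a sketch: the canonical byte representation that gets signed is defined by the supervisor, so `canonical_bytes` below is a placeholder):

```rust
use ed25519_dalek::{Signer, SigningKey};
use rand::rngs::OsRng;

// Generate a keypair and sign the job's canonical bytes.
let signing_key = SigningKey::generate(&mut OsRng);
let canonical_bytes = b"...canonical job representation...";
let signature = signing_key.sign(canonical_bytes);

// Hex-encode for embedding in the `Signature` struct above.
let signature_hex = hex::encode(signature.to_bytes());
let public_key_hex = hex::encode(signing_key.verifying_key().to_bytes());
```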

## Creating a Job

### Minimal Job

```rust
use hero_job::Job;

let job = Job::new(
    "my-runner",
    "print('Hello World')".to_string(),
);
```

### With Timeout

```rust
let job = Job::builder()
    .runner_id("my-runner")
    .payload("long_running_task()")
    .timeout(300) // 5 minutes
    .build();
```

### With Environment Variables

```rust
use std::collections::HashMap;

let mut env_vars = HashMap::new();
env_vars.insert("API_KEY".to_string(), "secret".to_string());
env_vars.insert("ENV".to_string(), "production".to_string());

let job = Job::builder()
    .runner_id("my-runner")
    .payload("deploy_app()")
    .env_vars(env_vars)
    .build();
```

### With Signature

```rust
use hero_job::{Job, Signature};

let job = Job::builder()
    .runner_id("my-runner")
    .payload("important_task()")
    .signature(Signature {
        public_key: "ed25519:abc123...".to_string(),
        signature: "sig:xyz789...".to_string(),
        algorithm: "ed25519".to_string(),
    })
    .build();
```

## Payload Format

The payload format depends on the target runner:

### Hero Runner
Heroscript content:
```heroscript
!!git.list
print("Repositories listed")
!!docker.ps
```

### SAL Runner
Rhai script with SAL modules:
```rhai
let files = os.list_dir("/tmp");
for file in files {
    print(file);
}
```

### Osiris Runner
Rhai script with Osiris database:
```rhai
let users = osiris.model("users");
let user = users.create(#{
    name: "Alice",
    email: "alice@example.com"
});
```

## Job Result

```rust
pub struct JobResult {
    pub job_id: String,
    pub status: JobStatus,
    pub output: String,          // Stdout
    pub error: Option<String>,   // Stderr or error message
    pub exit_code: Option<i32>,
    pub started_at: Option<i64>,
    pub completed_at: Option<i64>,
}
```

## Best Practices

### Timeouts
- Always set timeouts for jobs
- Default: 60 seconds
- Long-running jobs: Set an appropriate timeout
- Infinite jobs: Use separate monitoring

### Environment Variables
- Don't store secrets in env vars in production
- Use vault/secret management instead
- Keep env vars minimal
- Document required variables

### Signatures
- Always sign jobs in production
- Use strong algorithms (ed25519)
- Rotate keys regularly
- Store private keys securely

### Payloads
- Keep payloads concise
- Validate input data
- Handle errors gracefully
- Log important operations

## Validation

Jobs are validated before execution:

1. **Structure**: All required fields present
2. **Signature**: Valid cryptographic signature
3. **Runner**: Target runner exists and is available
4. **Payload**: Non-empty payload
5. **Timeout**: Reasonable timeout value

Invalid jobs are rejected before execution.
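
A minimal sketch of such a pre-execution check, written against the `Job` struct above (illustrative; the supervisor's actual validation code is not shown here):

```rust
use std::collections::HashSet;

fn validate(job: &Job, known_runners: &HashSet<String>) -> Result<(), String> {
    if job.payload.is_empty() {
        return Err("empty payload".into());
    }
    if job.signatures.is_empty() {
        return Err("job is unsigned".into()); // signature bytes are checked separately
    }
    if !known_runners.contains(&job.runner_id) {
        return Err(format!("unknown runner: {}", job.runner_id));
    }
    if let Some(t) = job.timeout {
        if t == 0 || t > 24 * 60 * 60 {
            return Err(format!("unreasonable timeout: {t}s"));
        }
    }
    Ok(())
}
```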

docs/runner/hero.md
Normal file
@@ -0,0 +1,71 @@

# Hero Runner

Executes heroscripts using the Hero CLI tool.

## Overview

The Hero runner pipes job payloads directly to `hero run -s` via stdin, making it ideal for executing Hero automation tasks and heroscripts.
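
A sketch of that mechanism using `std::process` (illustrative; the actual runner additionally enforces timeouts and injects environment variables):

```rust
use std::io::Write;
use std::process::{Command, Stdio};

fn run_heroscript(payload: &str) -> std::io::Result<std::process::Output> {
    // Spawn `hero run -s` and pipe the payload via stdin: no temp files.
    let mut child = Command::new("hero")
        .args(["run", "-s"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    child
        .stdin
        .as_mut()
        .expect("stdin was piped")
        .write_all(payload.as_bytes())?;
    child.wait_with_output()
}
```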

## Features

- **Heroscript Execution**: Direct stdin piping to `hero run -s`
- **No Temp Files**: Secure execution without filesystem artifacts
- **Environment Variables**: Full environment variable support
- **Timeout Support**: Respects job timeout settings
- **Signature Verification**: Cryptographic job verification

## Usage

```bash
# Start the runner
herorunner my-hero-runner

# With custom Redis
herorunner my-hero-runner --redis-url redis://custom:6379
```

## Job Payload

The payload should contain the heroscript content:

```heroscript
!!git.list
print("Repositories listed")
!!docker.ps
```

## Examples

### Simple Print
```heroscript
print("Hello from heroscript!")
```

### Hero Actions
```heroscript
!!git.list
!!docker.start name:"myapp"
```

### With Environment Variables
```json
{
  "payload": "print(env.MY_VAR)",
  "env_vars": {
    "MY_VAR": "Hello World"
  }
}
```

## Requirements

- `hero` CLI must be installed and in PATH
- Redis server accessible
- Valid job signatures

## Error Handling

- **Hero CLI Not Found**: Returns error if `hero` command unavailable
- **Timeout**: Kills process if timeout exceeded
- **Non-zero Exit**: Returns error with hero CLI output
- **Invalid Signature**: Rejects job before execution

docs/runner/osiris.md
Normal file
@@ -0,0 +1,142 @@

# Osiris Runner

Database-backed runner for structured data storage and retrieval.

## Overview

The Osiris runner executes Rhai scripts with access to a model-based database system, enabling structured data operations and persistence.

## Features

- **Rhai Scripting**: Execute Rhai scripts with Osiris database access
- **Model-Based Storage**: Define and use data models
- **CRUD Operations**: Create, read, update, delete records
- **Query Support**: Search and filter data
- **Schema Validation**: Type-safe data operations
- **Transaction Support**: Atomic database operations

## Usage

```bash
# Start the runner
runner_osiris my-osiris-runner

# With custom Redis
runner_osiris my-osiris-runner --redis-url redis://custom:6379
```

## Job Payload

The payload should contain a Rhai script using Osiris operations:

```rhai
// Example: Store data
let model = osiris.model("users");
let user = model.create(#{
    name: "Alice",
    email: "alice@example.com",
    age: 30
});
print(user.id);

// Example: Retrieve data
let found = model.get(user.id);
print(found.name);
```

## Examples

### Create Model and Store Data
```rhai
// Define model
let posts = osiris.model("posts");

// Create record
let post = posts.create(#{
    title: "Hello World",
    content: "First post",
    author: "Alice",
    published: true
});

print(`Created post with ID: ${post.id}`);
```

### Query Data
```rhai
let posts = osiris.model("posts");

// Find by field
let published = posts.find(#{
    published: true
});

for post in published {
    print(post.title);
}
```

### Update Records
```rhai
let posts = osiris.model("posts");

// Get record
let post = posts.get("post-123");

// Update fields
post.content = "Updated content";
posts.update(post);
```

### Delete Records
```rhai
let posts = osiris.model("posts");

// Delete by ID
posts.delete("post-123");
```

### Transactions
```rhai
osiris.transaction(|| {
    let users = osiris.model("users");
    let posts = osiris.model("posts");

    let user = users.create(#{ name: "Bob" });
    let post = posts.create(#{
        title: "Bob's Post",
        author_id: user.id
    });

    // Both operations commit together
});
```

## Data Models

Models are defined dynamically through Rhai scripts:

```rhai
let model = osiris.model("products");

// Model automatically handles:
// - ID generation
// - Timestamps (created_at, updated_at)
// - Schema validation
// - Indexing
```

## Requirements

- Redis server accessible
- Osiris database configured
- Valid job signatures
- Sufficient storage for data operations

## Use Cases

- **Configuration Storage**: Store application configs
- **User Data**: Manage user profiles and preferences
- **Workflow State**: Persist workflow execution state
- **Metrics & Logs**: Store structured logs and metrics
- **Cache Management**: Persistent caching layer

docs/runner/runners.md
Normal file
@@ -0,0 +1,96 @@

# Runners Overview

Runners are the execution layer in the Horus architecture. They receive jobs from the Supervisor via Redis queues and execute the actual workload.

## Architecture

```
Supervisor → Redis Queue → Runner → Execute Job → Return Result
```

## Available Runners

Horus provides three specialized runners:

### 1. **Hero Runner**
Executes heroscripts using the Hero CLI ecosystem.

**Use Cases:**
- Running Hero automation tasks
- Executing heroscripts from job payloads
- Integration with Hero CLI tools

**Binary:** `herorunner`

[→ Hero Runner Documentation](./hero.md)

### 2. **SAL Runner**
System Abstraction Layer runner for system-level operations.

**Use Cases:**
- OS operations (file, process, network)
- Infrastructure management (Kubernetes, VMs)
- Cloud provider operations (Hetzner)
- Database operations (Redis, Postgres)

**Binary:** `runner_sal`

[→ SAL Runner Documentation](./sal.md)

### 3. **Osiris Runner**
Database-backed runner for data storage and retrieval using Rhai scripts.

**Use Cases:**
- Structured data storage
- Model-based data operations
- Rhai script execution with database access

**Binary:** `runner_osiris`

[→ Osiris Runner Documentation](./osiris.md)

## Common Features

All runners implement the `Runner` trait and provide:

- **Job Execution**: Process jobs from Redis queues
- **Signature Verification**: Verify job signatures before execution
- **Timeout Support**: Respect job timeout settings
- **Environment Variables**: Pass environment variables to jobs
- **Error Handling**: Comprehensive error reporting
- **Logging**: Structured logging for debugging

## Runner Protocol

Runners communicate with the Supervisor using a Redis-based protocol (a minimal sketch follows the list):

1. **Job Queue**: Supervisor pushes jobs to `runner:{runner_id}:jobs`
2. **Job Processing**: Runner pops job, validates signature, executes
3. **Result Storage**: Runner stores result in `job:{job_id}:result`
4. **Status Updates**: Runner updates job status throughout execution
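
The sketch below shows that loop with the `redis` crate, using the queue and result keys named above (illustrative; a real runner also verifies signatures, enforces timeouts, and publishes status updates):

```rust
fn poll_loop(runner_id: &str) -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1:6379")?;
    let mut con = client.get_connection()?;
    loop {
        // Block until a job lands on runner:{runner_id}:jobs.
        let (_queue, job_json): (String, String) = redis::cmd("BRPOP")
            .arg(format!("runner:{runner_id}:jobs"))
            .arg(0) // 0 = block indefinitely
            .query(&mut con)?;

        // ... verify signature, execute payload, capture output ...
        let job_id = "parsed-from-job_json"; // placeholder
        let result_json = r#"{"status":"completed","output":""}"#;

        // Store the result where the supervisor expects it.
        redis::cmd("SET")
            .arg(format!("job:{job_id}:result"))
            .arg(result_json)
            .query::<()>(&mut con)?;
    }
}
```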

## Starting a Runner

```bash
# Hero Runner
herorunner <runner_id> [--redis-url <url>]

# SAL Runner
runner_sal <runner_id> [--redis-url <url>]

# Osiris Runner
runner_osiris <runner_id> [--redis-url <url>]
```

## Configuration

All runners accept:
- `runner_id`: Unique identifier for the runner (required)
- `--redis-url`: Redis connection URL (default: `redis://localhost:6379`)

## Security

- Jobs must be cryptographically signed
- Runners verify signatures before execution
- Untrusted jobs are rejected
- Environment variables should not contain sensitive data in production

docs/runner/sal.md
Normal file
@@ -0,0 +1,123 @@

# SAL Runner

System Abstraction Layer runner for system-level operations.

## Overview

The SAL runner executes Rhai scripts with access to system abstraction modules for OS operations, infrastructure management, and cloud provider interactions.

## Features

- **Rhai Scripting**: Execute Rhai scripts with SAL modules
- **System Operations**: File, process, and network management
- **Infrastructure**: Kubernetes, VM, and container operations
- **Cloud Providers**: Hetzner and other cloud integrations
- **Database Access**: Redis and Postgres client operations
- **Networking**: Mycelium and network configuration

## Available SAL Modules

### Core Modules
- **sal-os**: Operating system operations
- **sal-process**: Process management
- **sal-text**: Text processing utilities
- **sal-net**: Network operations

### Infrastructure
- **sal-virt**: Virtualization management
- **sal-kubernetes**: Kubernetes cluster operations
- **sal-zinit-client**: Zinit process manager

### Storage & Data
- **sal-redisclient**: Redis operations
- **sal-postgresclient**: PostgreSQL operations
- **sal-vault**: Secret management

### Networking
- **sal-mycelium**: Mycelium network integration

### Cloud Providers
- **sal-hetzner**: Hetzner cloud operations

### Version Control
- **sal-git**: Git repository operations

## Usage

```bash
# Start the runner
runner_sal my-sal-runner

# With custom Redis
runner_sal my-sal-runner --redis-url redis://custom:6379
```

## Job Payload

The payload should contain a Rhai script using SAL modules:

```rhai
// Example: List files
let files = os.list_dir("/tmp");
print(files);

// Example: Process management
let pid = process.spawn("ls", ["-la"]);
let output = process.wait(pid);
print(output);
```

## Examples

### File Operations
```rhai
// Read file
let content = os.read_file("/path/to/file");
print(content);

// Write file
os.write_file("/path/to/output", "Hello World");
```

### Kubernetes Operations
```rhai
// List pods
let pods = k8s.list_pods("default");
for pod in pods {
    print(pod.name);
}
```

### Redis Operations
```rhai
// Set value
redis.set("key", "value");

// Get value
let val = redis.get("key");
print(val);
```

### Git Operations
```rhai
// Clone repository
git.clone("https://github.com/user/repo", "/tmp/repo");

// Get status
let status = git.status("/tmp/repo");
print(status);
```

## Requirements

- Redis server accessible
- System permissions for requested operations
- Valid job signatures
- SAL modules available in runtime

## Security Considerations

- SAL operations have system-level access
- Jobs must be from trusted sources
- Signature verification is mandatory
- Limit runner permissions in production

docs/supervisor/auth.md
Normal file
@@ -0,0 +1,28 @@

# Supervisor Authentication

The supervisor has two authentication systems:

1. An authentication system based on scoped symmetric API keys.
2. Verification of the signatures over a job's canonical representation.

The first controls access to the supervisor API; the second authenticates the signatories of a job, so that runners can implement access control based on those signatories.

## API Key Management

API keys are used to authenticate requests to the supervisor. They are created using the `auth.key.create` method and can be listed using the `key.list` method.

## API Key Scopes

API keys have a scope that determines what actions they can perform. The following scopes are available:

- `admin`: Full access to all supervisor methods.
- `registrar`: Access to methods related to job registration and management.
- `user`: Access to methods related to job execution and management.

## API Key Usage

API keys are passed in the `Authorization` header of the request, in the format `Bearer <key>`.
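
For example, attaching the key with an HTTP client (a sketch; the `jobs.list` method comes from the OpenRPC spec in these docs, while the supervisor address and the use of `reqwest` are assumptions):

```rust
use serde_json::json;

async fn list_jobs(api_key: &str) -> Result<(), reqwest::Error> {
    // The API key travels in the Authorization header as `Bearer <key>`.
    let body = json!({ "jsonrpc": "2.0", "id": 1, "method": "jobs.list", "params": [] });
    let resp = reqwest::Client::new()
        .post("http://localhost:8080") // supervisor address: an assumption
        .bearer_auth(api_key)
        .json(&body)
        .send()
        .await?;
    println!("status: {}", resp.status());
    Ok(())
}
```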

## API Key Rotation

API keys can be rotated using the `key.remove` method. This will invalidate the old key and create a new one.

docs/supervisor/openrpc.json
Normal file
@@ -0,0 +1,391 @@

{
  "openrpc": "1.3.2",
  "info": {
    "title": "Hero Supervisor OpenRPC API",
    "version": "1.0.0",
    "description": "OpenRPC API for managing Hero Supervisor runners and jobs. Job operations follow the convention: 'jobs.' for general operations and 'job.' for specific job operations."
  },
  "components": {
    "schemas": {
      "Job": {
        "type": "object",
        "properties": {
          "id": { "type": "string" },
          "caller_id": { "type": "string" },
          "context_id": { "type": "string" },
          "payload": { "type": "string" },
          "runner": { "type": "string" },
          "executor": { "type": "string" },
          "timeout": { "type": "number" },
          "env_vars": { "type": "object" },
          "created_at": { "type": "string" },
          "updated_at": { "type": "string" }
        },
        "required": ["id", "caller_id", "context_id", "payload", "runner", "executor", "timeout", "env_vars", "created_at", "updated_at"]
      }
    }
  },
  "methods": [
    {
      "name": "list_runners",
      "description": "List all registered runners",
      "params": [],
      "result": {
        "name": "runners",
        "schema": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    },
    {
      "name": "register_runner",
      "description": "Register a new runner to the supervisor with secret authentication",
      "params": [
        {
          "name": "params",
          "schema": {
            "type": "object",
            "properties": {
              "secret": { "type": "string" },
              "name": { "type": "string" },
              "queue": { "type": "string" }
            },
            "required": ["secret", "name", "queue"]
          }
        }
      ],
      "result": {
        "name": "result",
        "schema": { "type": "null" }
      }
    },
    {
      "name": "jobs.create",
      "description": "Create a new job without queuing it to a runner",
      "params": [
        {
          "name": "params",
          "schema": {
            "type": "object",
            "properties": {
              "secret": { "type": "string" },
              "job": { "$ref": "#/components/schemas/Job" }
            },
            "required": ["secret", "job"]
          }
        }
      ],
      "result": {
        "name": "job_id",
        "schema": { "type": "string" }
      }
    },
    {
      "name": "jobs.list",
      "description": "List all jobs",
      "params": [],
      "result": {
        "name": "jobs",
        "schema": {
          "type": "array",
          "items": { "$ref": "#/components/schemas/Job" }
        }
      }
    },
    {
      "name": "job.run",
      "description": "Run a job on the appropriate runner and return the result",
      "params": [
        {
          "name": "params",
          "schema": {
            "type": "object",
            "properties": {
              "secret": { "type": "string" },
              "job": { "$ref": "#/components/schemas/Job" }
            },
            "required": ["secret", "job"]
          }
        }
      ],
      "result": {
        "name": "result",
        "schema": {
          "oneOf": [
            {
              "type": "object",
              "properties": {
                "success": { "type": "string" }
              },
              "required": ["success"]
            },
            {
              "type": "object",
              "properties": {
                "error": { "type": "string" }
              },
              "required": ["error"]
            }
          ]
        }
      }
    },
    {
      "name": "job.start",
      "description": "Start a previously created job by queuing it to its assigned runner",
      "params": [
        {
          "name": "params",
          "schema": {
            "type": "object",
            "properties": {
              "secret": { "type": "string" },
              "job_id": { "type": "string" }
            },
            "required": ["secret", "job_id"]
          }
        }
      ],
      "result": {
        "name": "result",
        "schema": { "type": "null" }
      }
    },
    {
      "name": "job.status",
      "description": "Get the current status of a job",
      "params": [
        {
          "name": "job_id",
          "schema": { "type": "string" }
        }
      ],
      "result": {
        "name": "status",
        "schema": {
          "type": "object",
          "properties": {
            "job_id": { "type": "string" },
            "status": {
              "type": "string",
              "enum": ["created", "queued", "running", "completed", "failed", "timeout"]
            },
            "created_at": { "type": "string" },
            "started_at": { "type": ["string", "null"] },
            "completed_at": { "type": ["string", "null"] }
          },
          "required": ["job_id", "status", "created_at"]
        }
      }
    },
    {
      "name": "job.result",
      "description": "Get the result of a completed job (blocks until result is available)",
      "params": [
        {
          "name": "job_id",
          "schema": { "type": "string" }
        }
      ],
      "result": {
        "name": "result",
        "schema": {
          "oneOf": [
            {
              "type": "object",
              "properties": {
                "success": { "type": "string" }
              },
              "required": ["success"]
            },
            {
              "type": "object",
              "properties": {
                "error": { "type": "string" }
              },
              "required": ["error"]
            }
          ]
        }
      }
    },
    {
      "name": "remove_runner",
      "description": "Remove a runner from the supervisor",
      "params": [
        {
          "name": "actor_id",
          "schema": { "type": "string" }
        }
      ],
      "result": {
        "name": "result",
        "schema": { "type": "null" }
      }
    },
    {
      "name": "start_runner",
      "description": "Start a specific runner",
      "params": [
        {
          "name": "actor_id",
          "schema": { "type": "string" }
        }
      ],
      "result": {
        "name": "result",
        "schema": { "type": "null" }
      }
    },
    {
      "name": "stop_runner",
      "description": "Stop a specific runner",
      "params": [
        {
          "name": "actor_id",
          "schema": { "type": "string" }
        },
        {
          "name": "force",
          "schema": { "type": "boolean" }
        }
      ],
      "result": {
        "name": "result",
        "schema": { "type": "null" }
      }
    },
    {
      "name": "get_runner_status",
      "description": "Get the status of a specific runner",
      "params": [
        {
          "name": "actor_id",
          "schema": { "type": "string" }
        }
      ],
      "result": {
        "name": "status",
        "schema": { "type": "object" }
      }
    },
    {
      "name": "get_all_runner_status",
      "description": "Get status of all runners",
      "params": [],
      "result": {
        "name": "statuses",
        "schema": {
          "type": "array",
          "items": { "type": "object" }
        }
      }
    },
    {
      "name": "start_all",
      "description": "Start all runners",
      "params": [],
      "result": {
        "name": "results",
        "schema": {
          "type": "array",
          "items": {
            "type": "array",
            "items": { "type": "string" }
          }
        }
      }
    },
    {
      "name": "stop_all",
      "description": "Stop all runners",
      "params": [
        {
          "name": "force",
          "schema": { "type": "boolean" }
        }
      ],
      "result": {
        "name": "results",
        "schema": {
          "type": "array",
          "items": {
            "type": "array",
            "items": { "type": "string" }
          }
        }
      }
    },
    {
      "name": "get_all_status",
      "description": "Get status of all runners (alternative format)",
      "params": [],
      "result": {
        "name": "statuses",
        "schema": {
          "type": "array",
          "items": {
            "type": "array",
            "items": { "type": "string" }
          }
        }
      }
    },
    {
      "name": "job.stop",
      "description": "Stop a running job",
      "params": [
        {
          "name": "params",
          "schema": {
            "type": "object",
            "properties": {
              "secret": { "type": "string" },
              "job_id": { "type": "string" }
            },
            "required": ["secret", "job_id"]
          }
        }
      ],
      "result": {
        "name": "result",
        "schema": { "type": "null" }
      }
    },
    {
      "name": "job.delete",
      "description": "Delete a job from the system",
      "params": [
        {
          "name": "params",
          "schema": {
            "type": "object",
            "properties": {
              "secret": { "type": "string" },
              "job_id": { "type": "string" }
            },
            "required": ["secret", "job_id"]
          }
        }
      ],
      "result": {
        "name": "result",
        "schema": { "type": "null" }
      }
    },
    {
      "name": "rpc.discover",
      "description": "OpenRPC discovery method - returns the OpenRPC document describing this API",
      "params": [],
      "result": {
        "name": "openrpc_document",
        "schema": { "type": "object" }
      }
    }
  ]
}

docs/supervisor/supervisor.md
Normal file
@@ -0,0 +1,88 @@

# Supervisor Overview

The Supervisor is the job dispatcher layer in Horus. It receives jobs, verifies signatures, and routes them to appropriate runners.

## Architecture

```
Client → Supervisor → Redis Queue → Runner
```

## Responsibilities

### 1. **Job Admission**
- Receive jobs via OpenRPC interface
- Validate job structure and required fields
- Verify cryptographic signatures

### 2. **Authentication & Authorization**
- Verify job signatures using public keys
- Ensure jobs are from authorized sources
- Reject unsigned or invalid jobs

### 3. **Job Routing**
- Route jobs to appropriate runner queues
- Maintain runner registry
- Load balance across available runners

### 4. **Job Management**
- Track job status and lifecycle
- Provide job query and listing APIs
- Store job results and logs

### 5. **Runner Management**
- Register and track available runners
- Monitor runner health and availability
- Handle runner disconnections

## OpenRPC Interface

The Supervisor exposes an OpenRPC API for job management:

### Job Operations
- `create_job`: Submit a new job
- `get_job`: Retrieve job details
- `list_jobs`: List all jobs
- `delete_job`: Remove a job
- `get_job_logs`: Retrieve job execution logs

### Runner Operations
- `register_runner`: Register a new runner
- `list_runners`: List available runners
- `get_runner_status`: Check runner health
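
For instance, a `register_runner` request body following the parameter schema in [openrpc.json](./openrpc.json) (a sketch of the wire format; the queue name shown follows the `runner:{runner_id}:jobs` convention from the runner docs):

```rust
use serde_json::json;

// Parameters follow the `register_runner` schema: secret, name, queue.
let request = json!({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "register_runner",
    "params": [{
        "secret": "<api key>",
        "name": "my-sal-runner",
        "queue": "runner:my-sal-runner:jobs"
    }]
});
```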

## Job Lifecycle

1. **Submission**: Client submits job via OpenRPC
2. **Validation**: Supervisor validates structure and signature
3. **Queueing**: Job pushed to runner's Redis queue
4. **Execution**: Runner processes job
5. **Completion**: Result stored in Redis
6. **Retrieval**: Client retrieves result via OpenRPC

## Transport Options

The Supervisor supports multiple transport layers:

- **HTTP**: Standard HTTP/HTTPS transport
- **Mycelium**: Peer-to-peer encrypted transport

## Configuration

```bash
# Start supervisor
supervisor --port 8080 --redis-url redis://localhost:6379

# With Mycelium
supervisor --port 8080 --mycelium --redis-url redis://localhost:6379
```

## Security

- All jobs must be cryptographically signed
- Signatures verified before job admission
- Public key infrastructure for identity
- Optional TLS for HTTP transport
- End-to-end encryption via Mycelium

[→ Authentication Documentation](./auth.md)