add some documentation for blue book

docs/.collection (new file, 0 lines)
docs/README.md (new file, 67 lines)

@@ -0,0 +1,67 @@
# Horus Documentation

**Hierarchical Orchestration Runtime for Universal Scripts**

Horus is a distributed job execution system with three layers: Coordinator, Supervisor, and Runner.

## Quick Links

- **[Getting Started](./getting-started.md)** - Install and run your first job
- **[Architecture](./architecture.md)** - System design and components
- **[Etymology](./ethymology.md)** - The meaning behind the name

## Components

### Coordinator
Workflow orchestration engine for DAG-based execution.

- [Overview](./coordinator/overview.md)

### Supervisor
Job dispatcher with authentication and routing.

- [Overview](./supervisor/overview.md)
- [Authentication](./supervisor/auth.md)
- [OpenRPC API](./supervisor/openrpc.json)

### Runners
Job executors for different workload types.

- [Runner Overview](./runner/overview.md)
- [Hero Runner](./runner/hero.md) - Heroscript execution
- [SAL Runner](./runner/sal.md) - System operations
- [Osiris Runner](./runner/osiris.md) - Database operations

## Core Concepts

### Jobs
Units of work executed by runners. Each job contains:
- Target runner ID
- Payload (script/command)
- Cryptographic signature
- Optional timeout and environment variables

### Workflows
Multi-step DAGs executed by the Coordinator. Steps can:
- Run in parallel or sequence
- Pass data between steps
- Target different runners
- Handle errors and retries

### Signatures
All jobs must be cryptographically signed:
- Ensures job authenticity
- Prevents tampering
- Enables authorization

## Use Cases

- **Automation**: Execute system tasks and scripts
- **Data Pipelines**: Multi-step ETL workflows
- **CI/CD**: Build, test, and deployment pipelines
- **Infrastructure**: Manage cloud resources and containers
- **Integration**: Connect systems via scripted workflows

## Repository

[git.ourworld.tf/herocode/horus](https://git.ourworld.tf/herocode/horus)
@@ -1,15 +1,185 @@

# Architecture

Horus is a hierarchical orchestration runtime with three layers: Coordinator, Supervisor, and Runner.

## Overview

```
┌─────────────────────────────────────────────────────────┐
│                       Coordinator                       │
│            (Workflow Engine - DAG Execution)            │
│                                                         │
│  • Parses workflow definitions                          │
│  • Resolves dependencies                                │
│  • Dispatches ready steps                               │
│  • Tracks workflow state                                │
└────────────────────┬────────────────────────────────────┘
                     │ OpenRPC (HTTP/Mycelium)
                     │
┌────────────────────▼────────────────────────────────────┐
│                       Supervisor                        │
│            (Job Dispatcher & Authenticator)             │
│                                                         │
│  • Verifies job signatures                              │
│  • Routes jobs to runners                               │
│  • Manages runner registry                              │
│  • Tracks job lifecycle                                 │
└────────────────────┬────────────────────────────────────┘
                     │ Redis Queue Protocol
                     │
┌────────────────────▼────────────────────────────────────┐
│                        Runners                          │
│                    (Job Executors)                      │
│                                                         │
│   ┌──────────┐      ┌──────────┐      ┌──────────┐      │
│   │   Hero   │      │   SAL    │      │  Osiris  │      │
│   │  Runner  │      │  Runner  │      │  Runner  │      │
│   └──────────┘      └──────────┘      └──────────┘      │
└─────────────────────────────────────────────────────────┘
```

- The user / client talks to the Coordinator over an OpenRPC interface, using either regular HTTP transport or Mycelium.
- The Coordinator talks to the Supervisor over an OpenRPC interface, using either regular HTTP transport or Mycelium.
- The Supervisor talks to runners over a Redis-based job execution protocol.

## Layers

### 1. Coordinator (Optional)
**Purpose:** Workflow orchestration and DAG execution

**Responsibilities:**
- Parse and validate workflow definitions
- Execute DAG-based flows
- Manage step dependencies
- Route jobs to appropriate supervisors
- Handle multi-step workflows

**Use When:**
- You need multi-step workflows
- Jobs have dependencies
- Parallel execution is required
- Complex data pipelines

[→ Coordinator Documentation](./coordinator/overview.md)

### 2. Supervisor (Required)
**Purpose:** Job admission, authentication, and routing

**Responsibilities:**
- Receive jobs via OpenRPC interface
- Verify cryptographic signatures
- Route jobs to appropriate runners
- Manage runner registry
- Track job status and results

**Features:**
- OpenRPC API for job management
- HTTP and Mycelium transport
- Signature-based authentication
- Runner health monitoring

[→ Supervisor Documentation](./supervisor/overview.md)

### 3. Runners (Required)
**Purpose:** Execute actual job workloads

**Available Runners:**
- **Hero Runner**: Executes heroscripts via the Hero CLI
- **SAL Runner**: System operations (OS, K8s, cloud, etc.)
- **Osiris Runner**: Database operations with Rhai scripts

**Common Features:**
- Redis queue-based job polling
- Signature verification
- Timeout support
- Environment variable handling

[→ Runner Documentation](./runner/overview.md)

## Communication Protocols

### Client ↔ Coordinator
- **Protocol:** OpenRPC
- **Transport:** HTTP or Mycelium
- **Operations:** Submit workflow, check status, retrieve results

### Coordinator ↔ Supervisor
- **Protocol:** OpenRPC
- **Transport:** HTTP or Mycelium
- **Operations:** Create job, get status, retrieve logs

### Supervisor ↔ Runner
- **Protocol:** Redis Queue
- **Transport:** Redis pub/sub and lists
- **Operations:** Push job, poll queue, store result
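
As a concrete illustration of this last hop, here is a minimal sketch of the Supervisor's side of the Redis exchange, using the queue and result keys documented in the [runner overview](./runner/overview.md) (`runner:{runner_id}:jobs` and `job:{job_id}:result`). It assumes jobs are serialized to JSON and uses the `redis` crate purely for illustration; the actual wire format may differ.

```rust
use redis::Commands;

// Sketch of the Supervisor -> Runner hop over Redis lists, using the
// key names from the runner overview. JSON serialization and the exact
// payload shape are assumptions made for illustration.
fn dispatch_job(
    conn: &mut redis::Connection,
    runner_id: &str,
    job_id: &str,
    job_json: &str,
) -> redis::RedisResult<Option<String>> {
    // Push the serialized job onto the target runner's queue.
    let queue = format!("runner:{}:jobs", runner_id);
    conn.lpush::<_, _, ()>(&queue, job_json)?;

    // Later, read back whatever result the runner stored for this job.
    let result_key = format!("job:{}:result", job_id);
    conn.get(result_key)
}
```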

## Job Flow

### Simple Job (No Coordinator)
```
1. Client → Supervisor: create_job()
2. Supervisor: Verify signature
3. Supervisor → Redis: Push to runner queue
4. Runner ← Redis: Pop job
5. Runner: Execute job
6. Runner → Redis: Store result
7. Client ← Supervisor: get_job_result()
```

### Workflow (With Coordinator)
```
1. Client → Coordinator: submit_workflow()
2. Coordinator: Parse DAG
3. Coordinator: Identify ready steps
4. Coordinator → Supervisor: create_job() for each ready step
5. Supervisor → Runner: Route via Redis
6. Runner: Execute and return result
7. Coordinator: Update workflow state
8. Coordinator: Dispatch next ready steps
9. Repeat until workflow complete
```

## Security Model

### Authentication
- Jobs must be cryptographically signed
- Signatures verified at the Supervisor layer
- Public key infrastructure for identity

### Authorization
- Runners only execute signed jobs
- Signature verification before execution
- Untrusted jobs rejected

### Transport Security
- Optional TLS for HTTP transport
- End-to-end encryption via Mycelium
- No plaintext credentials

[→ Authentication Details](./supervisor/auth.md)

## Deployment Patterns

### Minimal Setup
```
Redis + Supervisor + Runner(s)
```
Single machine, simple job execution.

### Distributed Setup
```
Redis Cluster + Multiple Supervisors + Runner Pool
```
High availability, load balancing.

### Full Orchestration
```
Coordinator + Multiple Supervisors + Runner Pool
```
Complex workflows, multi-step pipelines.

## Design Principles

1. **Hierarchical**: Clear separation of concerns across layers
2. **Secure**: Signature-based authentication throughout
3. **Scalable**: Horizontal scaling at each layer
4. **Observable**: Comprehensive logging and status tracking
5. **Flexible**: Multiple runners for different workload types
docs/coordinator/overview.md (new file, 145 lines)

@@ -0,0 +1,145 @@

# Coordinator Overview

The Coordinator is the workflow orchestration layer in Horus. It executes DAG-based flows by managing job dependencies and dispatching ready steps to supervisors.

## Architecture

```
Client → Coordinator → Supervisor(s) → Runner(s)
```

## Responsibilities

### 1. **Workflow Management**
- Parse and validate DAG workflow definitions
- Track workflow execution state
- Manage step dependencies

### 2. **Job Orchestration**
- Determine which steps are ready to execute
- Dispatch jobs to appropriate supervisors
- Handle step failures and retries

### 3. **Dependency Resolution**
- Track step completion
- Resolve data dependencies between steps
- Pass outputs from completed steps to dependent steps

### 4. **Multi-Supervisor Coordination**
- Route jobs to specific supervisors
- Handle supervisor failures
- Load balance across supervisors

## Workflow Definition

Workflows are defined as Directed Acyclic Graphs (DAGs):

```yaml
workflow:
  name: "data-pipeline"
  steps:
    - id: "fetch"
      runner: "hero"
      payload: "!!http.get url:'https://api.example.com/data'"

    - id: "process"
      runner: "sal"
      depends_on: ["fetch"]
      payload: |
        let data = input.fetch;
        let processed = process_data(data);
        processed

    - id: "store"
      runner: "osiris"
      depends_on: ["process"]
      payload: |
        let model = osiris.model("results");
        model.create(input.process);
```

## Features

### DAG Execution
- Parallel execution of independent steps
- Sequential execution of dependent steps
- Automatic dependency resolution

### Error Handling
- Step-level retry policies
- Workflow-level error handlers
- Partial workflow recovery

### Data Flow
- Pass outputs between steps
- Transform data between steps
- Aggregate results from parallel steps

### Monitoring
- Real-time workflow status
- Step-level progress tracking
- Execution metrics and logs

## Workflow Lifecycle

1. **Submission**: Client submits workflow definition
2. **Validation**: Coordinator validates DAG structure
3. **Scheduling**: Determine ready steps (no pending dependencies; see the sketch after this list)
4. **Dispatch**: Send jobs to supervisors
5. **Tracking**: Monitor step completion
6. **Progression**: Execute next ready steps
7. **Completion**: Workflow finishes when all steps complete
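
The scheduling step boils down to a simple invariant: given each step's `depends_on` list and the set of completed step IDs, the ready steps are those whose dependencies have all completed. The sketch below illustrates that idea only; it is not the Coordinator's actual implementation, and the `Step` type is a simplified stand-in.

```rust
use std::collections::HashSet;

// Simplified stand-in for a workflow step (illustration only).
struct Step {
    id: String,
    depends_on: Vec<String>,
}

// A step is "ready" when it has not run yet and every dependency has
// completed. This mirrors the DAG scheduling described above.
fn ready_steps<'a>(steps: &'a [Step], completed: &HashSet<String>) -> Vec<&'a Step> {
    steps
        .iter()
        .filter(|s| !completed.contains(&s.id))
        .filter(|s| s.depends_on.iter().all(|d| completed.contains(d)))
        .collect()
}
```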

## Use Cases

### Data Pipelines
```
Extract → Transform → Load
```

### CI/CD Workflows
```
Build → Test → Deploy
```

### Multi-Stage Processing
```
Fetch Data → Process → Validate → Store → Notify
```

### Parallel Execution
```
        ┌─ Task A ─┐
Start ──┼─ Task B ─┼── Aggregate → Finish
        └─ Task C ─┘
```

## Configuration

```bash
# Start coordinator
coordinator --port 9090 --redis-url redis://localhost:6379

# With multiple supervisors
coordinator --port 9090 \
  --supervisor http://supervisor1:8080 \
  --supervisor http://supervisor2:8080
```

## API

The Coordinator exposes an OpenRPC API (an example request is sketched after this list):

- `submit_workflow`: Submit a new workflow
- `get_workflow_status`: Check workflow progress
- `list_workflows`: List all workflows
- `cancel_workflow`: Stop a running workflow
- `get_workflow_logs`: Retrieve execution logs
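
A `submit_workflow` call over HTTP might look like the following sketch, assuming standard JSON-RPC 2.0 framing. The method name comes from the list above; the parameter shape is abbreviated and assumed, since the authoritative schema lives in the OpenRPC spec.

```rust
use serde_json::json;

// Hypothetical JSON-RPC 2.0 request to the Coordinator. The endpoint,
// framing, and parameter shape are assumptions for illustration.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "submit_workflow",
        "params": { "workflow": { "name": "data-pipeline", "steps": [] } }
    });

    let response = reqwest::Client::new()
        .post("http://localhost:9090")
        .json(&request)
        .send()
        .await?
        .text()
        .await?;

    println!("{}", response);
    Ok(())
}
```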

## Advantages

- **Declarative**: Define what to do, not how
- **Scalable**: Parallel execution across multiple supervisors
- **Resilient**: Automatic retry and error handling
- **Observable**: Real-time status and logging
- **Composable**: Reuse workflows as steps in larger workflows

docs/getting-started.md (new file, 186 lines)

@@ -0,0 +1,186 @@

# Getting Started with Horus

Quick start guide to running your first Horus job.

## Prerequisites

- Redis server running
- Rust toolchain installed
- Horus repository cloned

## Installation

### Build from Source

```bash
# Clone repository
git clone https://git.ourworld.tf/herocode/horus
cd horus

# Build all components
cargo build --release

# Binaries will be in target/release/
```

## Quick Start

### 1. Start Redis

```bash
# Using Docker
docker run -d -p 6379:6379 redis:latest

# Or install locally
redis-server
```

### 2. Start a Runner

```bash
# Start Hero runner
./target/release/herorunner my-runner

# Or SAL runner
./target/release/runner_sal my-sal-runner

# Or Osiris runner
./target/release/runner_osiris my-osiris-runner
```

### 3. Start the Supervisor

```bash
./target/release/supervisor --port 8080
```

### 4. Submit a Job

Using the Supervisor client:

```rust
use hero_supervisor_client::SupervisorClient;
use hero_job::Job;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = SupervisorClient::new("http://localhost:8080")?;

    let job = Job::new(
        "my-runner",
        "print('Hello from Horus!')".to_string(),
    );

    let result = client.create_job(job).await?;
    println!("Job ID: {}", result.id);

    Ok(())
}
```
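
To fetch the output once the job has run, the job flow in [architecture.md](./architecture.md) names a `get_job_result()` call; continuing the example above, a follow-up might look like this. The exact client method and result fields are assumptions based on the `JobResult` structure in [job-format.md](./job-format.md).

```rust
// Assumed client method and field names; see the job flow in
// architecture.md and the JobResult struct in job-format.md.
let job_result = client.get_job_result(&result.id).await?;
println!("Output: {}", job_result.output);
```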

## Example Workflows

### Simple Heroscript Execution

```heroscript
// Job payload
print("Hello World")
!!git.list
```

### SAL System Operation

```rhai
// List files in directory
let files = os.list_dir("/tmp");
for file in files {
    print(file);
}
```

### Osiris Data Storage

```rhai
// Store user data
let users = osiris.model("users");
let user = users.create(#{
    name: "Alice",
    email: "alice@example.com"
});
print(`Created user: ${user.id}`);
```

## Architecture Overview

```
┌──────────────┐
│ Coordinator  │  (Optional: for workflows)
└──────┬───────┘
       │
┌──────▼───────┐
│  Supervisor  │  (Job dispatcher)
└──────┬───────┘
       │
       │ Redis
       │
┌──────▼───────┐
│   Runners    │  (Job executors)
│  - Hero      │
│  - SAL       │
│  - Osiris    │
└──────────────┘
```

## Next Steps

- [Architecture Details](./architecture.md)
- [Runner Documentation](./runner/overview.md)
- [Supervisor API](./supervisor/overview.md)
- [Coordinator Workflows](./coordinator/overview.md)
- [Authentication](./supervisor/auth.md)

## Common Issues

### Runner Not Receiving Jobs

1. Check the Redis connection
2. Verify the runner ID matches the job target
3. Check the supervisor logs

### Job Signature Verification Failed

1. Ensure the job is properly signed
2. Verify the public key is registered
3. Check the signature format

### Timeout Errors

1. Increase the job timeout value
2. Check runner resource availability
3. Optimize the job payload

## Development

### Running Tests

```bash
# All tests
cargo test

# Specific component
cargo test -p hero-supervisor
cargo test -p runner-hero
```

### Debug Mode

```bash
# Enable debug logging
RUST_LOG=debug ./target/release/supervisor --port 8080
```

## Support

- Documentation: [docs.ourworld.tf/horus](https://docs.ourworld.tf/horus)
- Repository: [git.ourworld.tf/herocode/horus](https://git.ourworld.tf/herocode/horus)
- Issues: Report on the repository

docs/job-format.md (new file, 179 lines)

@@ -0,0 +1,179 @@

# Job Format

Jobs are the fundamental unit of work in Horus.

## Structure

```rust
pub struct Job {
    pub id: String,                        // Unique job identifier
    pub runner_id: String,                 // Target runner ID
    pub payload: String,                   // Job payload (script/command)
    pub timeout: Option<u64>,              // Timeout in seconds
    pub env_vars: HashMap<String, String>, // Environment variables
    pub signatures: Vec<Signature>,        // Cryptographic signatures
    pub created_at: i64,                   // Creation timestamp
    pub status: JobStatus,                 // Current status
}
```

## Job Status

```rust
pub enum JobStatus {
    Pending,   // Queued, not yet started
    Running,   // Currently executing
    Completed, // Finished successfully
    Failed,    // Execution failed
    Timeout,   // Exceeded timeout
    Cancelled, // Manually cancelled
}
```

## Signature Format

```rust
pub struct Signature {
    pub public_key: String, // Signer's public key
    pub signature: String,  // Cryptographic signature
    pub algorithm: String,  // Signature algorithm (e.g., "ed25519")
}
```

## Creating a Job

### Minimal Job

```rust
use hero_job::Job;

let job = Job::new(
    "my-runner",
    "print('Hello World')".to_string(),
);
```

### With Timeout

```rust
let job = Job::builder()
    .runner_id("my-runner")
    .payload("long_running_task()")
    .timeout(300) // 5 minutes
    .build();
```

### With Environment Variables

```rust
use std::collections::HashMap;

let mut env_vars = HashMap::new();
env_vars.insert("API_KEY".to_string(), "secret".to_string());
env_vars.insert("ENV".to_string(), "production".to_string());

let job = Job::builder()
    .runner_id("my-runner")
    .payload("deploy_app()")
    .env_vars(env_vars)
    .build();
```

### With Signature

```rust
use hero_job::{Job, Signature};

let job = Job::builder()
    .runner_id("my-runner")
    .payload("important_task()")
    .signature(Signature {
        public_key: "ed25519:abc123...".to_string(),
        signature: "sig:xyz789...".to_string(),
        algorithm: "ed25519".to_string(),
    })
    .build();
```
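
For reference, here is a minimal sketch of how a signature like the one above could be produced with the `ed25519-dalek` crate, since ed25519 is the algorithm these docs recommend. The key encoding and the exact bytes Horus signs (shown here as the raw payload) are assumptions for illustration, not the project's actual signing code.

```rust
use ed25519_dalek::{Signer, SigningKey, Verifier, VerifyingKey};
use rand::rngs::OsRng;

// Illustrative ed25519 signing of a job payload (ed25519-dalek v2 API).
// Exactly which bytes Horus signs, and how keys are encoded, is assumed.
fn main() {
    let signing_key: SigningKey = SigningKey::generate(&mut OsRng);
    let payload = b"important_task()";

    let signature = signing_key.sign(payload);

    // The supervisor-side check: verify against the signer's public key.
    let verifying_key: VerifyingKey = signing_key.verifying_key();
    assert!(verifying_key.verify(payload, &signature).is_ok());

    println!("signature: {}", hex::encode(signature.to_bytes()));
}
```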

## Payload Format

The payload format depends on the target runner:

### Hero Runner
Heroscript content:
```heroscript
!!git.list
print("Repositories listed")
!!docker.ps
```

### SAL Runner
Rhai script with SAL modules:
```rhai
let files = os.list_dir("/tmp");
for file in files {
    print(file);
}
```

### Osiris Runner
Rhai script with Osiris database:
```rhai
let users = osiris.model("users");
let user = users.create(#{
    name: "Alice",
    email: "alice@example.com"
});
```

## Job Result

```rust
pub struct JobResult {
    pub job_id: String,
    pub status: JobStatus,
    pub output: String,        // Stdout
    pub error: Option<String>, // Stderr or error message
    pub exit_code: Option<i32>,
    pub started_at: Option<i64>,
    pub completed_at: Option<i64>,
}
```

## Best Practices

### Timeouts
- Always set timeouts for jobs
- Default: 60 seconds
- Long-running jobs: Set an appropriate timeout
- Infinite jobs: Use separate monitoring

### Environment Variables
- Don't store secrets in env vars in production
- Use vault/secret management instead
- Keep env vars minimal
- Document required variables

### Signatures
- Always sign jobs in production
- Use strong algorithms (ed25519)
- Rotate keys regularly
- Store private keys securely

### Payloads
- Keep payloads concise
- Validate input data
- Handle errors gracefully
- Log important operations

## Validation

Jobs are validated before execution:

1. **Structure**: All required fields present
2. **Signature**: Valid cryptographic signature
3. **Runner**: Target runner exists and is available
4. **Payload**: Non-empty payload
5. **Timeout**: Reasonable timeout value

Invalid jobs are rejected before execution.
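
A minimal sketch of what such pre-execution validation could look like, mirroring the five checks above. Types are simplified stand-ins for the `Job` struct shown earlier, and the signature check is a stub (the real check is cryptographic).

```rust
// Mirrors the validation checks listed above; illustration only.
fn validate(
    signatures_valid: bool, // stub for the cryptographic signature check
    runner_exists: bool,
    payload: &str,
    timeout: Option<u64>,
) -> Result<(), String> {
    if !signatures_valid {
        return Err("invalid or missing signature".into());
    }
    if !runner_exists {
        return Err("target runner not available".into());
    }
    if payload.trim().is_empty() {
        return Err("empty payload".into());
    }
    if let Some(t) = timeout {
        if t == 0 || t > 86_400 {
            return Err("unreasonable timeout".into());
        }
    }
    Ok(())
}
```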

docs/runner/hero.md (new file, 71 lines)

@@ -0,0 +1,71 @@

# Hero Runner

Executes heroscripts using the Hero CLI tool.

## Overview

The Hero runner pipes job payloads directly to `hero run -s` via stdin, making it ideal for executing Hero automation tasks and heroscripts.
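
In outline, that stdin piping could look like the sketch below. It is illustrative only: the actual runner also applies timeouts, environment variables, and signature checks before spawning the process.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Sketch of piping a heroscript payload to `hero run -s` via stdin,
// as described above. Error handling and timeouts are omitted.
fn run_heroscript(payload: &str) -> std::io::Result<std::process::Output> {
    let mut child = Command::new("hero")
        .args(["run", "-s"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    if let Some(mut stdin) = child.stdin.take() {
        stdin.write_all(payload.as_bytes())?;
    } // stdin is dropped here, closing the pipe so `hero` sees EOF

    child.wait_with_output()
}
```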

## Features

- **Heroscript Execution**: Direct stdin piping to `hero run -s`
- **No Temp Files**: Secure execution without filesystem artifacts
- **Environment Variables**: Full environment variable support
- **Timeout Support**: Respects job timeout settings
- **Signature Verification**: Cryptographic job verification

## Usage

```bash
# Start the runner
herorunner my-hero-runner

# With custom Redis
herorunner my-hero-runner --redis-url redis://custom:6379
```

## Job Payload

The payload should contain the heroscript content:

```heroscript
!!git.list
print("Repositories listed")
!!docker.ps
```

## Examples

### Simple Print
```heroscript
print("Hello from heroscript!")
```

### Hero Actions
```heroscript
!!git.list
!!docker.start name:"myapp"
```

### With Environment Variables
```json
{
  "payload": "print(env.MY_VAR)",
  "env_vars": {
    "MY_VAR": "Hello World"
  }
}
```

## Requirements

- `hero` CLI must be installed and in PATH
- Redis server accessible
- Valid job signatures

## Error Handling

- **Hero CLI Not Found**: Returns an error if the `hero` command is unavailable
- **Timeout**: Kills the process if the timeout is exceeded
- **Non-zero Exit**: Returns an error with the hero CLI output
- **Invalid Signature**: Rejects the job before execution
docs/runner/osiris.md (new file, 142 lines)

@@ -0,0 +1,142 @@

# Osiris Runner

Database-backed runner for structured data storage and retrieval.

## Overview

The Osiris runner executes Rhai scripts with access to a model-based database system, enabling structured data operations and persistence.

## Features

- **Rhai Scripting**: Execute Rhai scripts with Osiris database access
- **Model-Based Storage**: Define and use data models
- **CRUD Operations**: Create, read, update, delete records
- **Query Support**: Search and filter data
- **Schema Validation**: Type-safe data operations
- **Transaction Support**: Atomic database operations

## Usage

```bash
# Start the runner
runner_osiris my-osiris-runner

# With custom Redis
runner_osiris my-osiris-runner --redis-url redis://custom:6379
```

## Job Payload

The payload should contain a Rhai script using Osiris operations:

```rhai
// Example: Store data
let model = osiris.model("users");
let user = model.create(#{
    name: "Alice",
    email: "alice@example.com",
    age: 30
});
print(user.id);

// Example: Retrieve data
let found = model.get(user.id);
print(found.name);
```

## Examples

### Create Model and Store Data
```rhai
// Define model
let posts = osiris.model("posts");

// Create record
let post = posts.create(#{
    title: "Hello World",
    content: "First post",
    author: "Alice",
    published: true
});

print(`Created post with ID: ${post.id}`);
```

### Query Data
```rhai
let posts = osiris.model("posts");

// Find by field
let published = posts.find(#{
    published: true
});

for post in published {
    print(post.title);
}
```

### Update Records
```rhai
let posts = osiris.model("posts");

// Get record
let post = posts.get("post-123");

// Update fields
post.content = "Updated content";
posts.update(post);
```

### Delete Records
```rhai
let posts = osiris.model("posts");

// Delete by ID
posts.delete("post-123");
```

### Transactions
```rhai
osiris.transaction(|| {
    let users = osiris.model("users");
    let posts = osiris.model("posts");

    let user = users.create(#{ name: "Bob" });
    let post = posts.create(#{
        title: "Bob's Post",
        author_id: user.id
    });

    // Both operations commit together
});
```

## Data Models

Models are defined dynamically through Rhai scripts:

```rhai
let model = osiris.model("products");

// The model automatically handles:
// - ID generation
// - Timestamps (created_at, updated_at)
// - Schema validation
// - Indexing
```

## Requirements

- Redis server accessible
- Osiris database configured
- Valid job signatures
- Sufficient storage for data operations

## Use Cases

- **Configuration Storage**: Store application configs
- **User Data**: Manage user profiles and preferences
- **Workflow State**: Persist workflow execution state
- **Metrics & Logs**: Store structured logs and metrics
- **Cache Management**: Persistent caching layer
docs/runner/overview.md (new file, 96 lines)

@@ -0,0 +1,96 @@

# Runners Overview

Runners are the execution layer in the Horus architecture. They receive jobs from the Supervisor via Redis queues and execute the actual workload.

## Architecture

```
Supervisor → Redis Queue → Runner → Execute Job → Return Result
```

## Available Runners

Horus provides three specialized runners:

### 1. **Hero Runner**
Executes heroscripts using the Hero CLI ecosystem.

**Use Cases:**
- Running Hero automation tasks
- Executing heroscripts from job payloads
- Integration with Hero CLI tools

**Binary:** `herorunner`

[→ Hero Runner Documentation](./hero.md)

### 2. **SAL Runner**
System Abstraction Layer runner for system-level operations.

**Use Cases:**
- OS operations (file, process, network)
- Infrastructure management (Kubernetes, VMs)
- Cloud provider operations (Hetzner)
- Database operations (Redis, Postgres)

**Binary:** `runner_sal`

[→ SAL Runner Documentation](./sal.md)

### 3. **Osiris Runner**
Database-backed runner for data storage and retrieval using Rhai scripts.

**Use Cases:**
- Structured data storage
- Model-based data operations
- Rhai script execution with database access

**Binary:** `runner_osiris`

[→ Osiris Runner Documentation](./osiris.md)

## Common Features

All runners implement the `Runner` trait and provide:

- **Job Execution**: Process jobs from Redis queues
- **Signature Verification**: Verify job signatures before execution
- **Timeout Support**: Respect job timeout settings
- **Environment Variables**: Pass environment variables to jobs
- **Error Handling**: Comprehensive error reporting
- **Logging**: Structured logging for debugging

## Runner Protocol

Runners communicate with the Supervisor using a Redis-based protocol (the runner's side of the loop is sketched after this list):

1. **Job Queue**: Supervisor pushes jobs to `runner:{runner_id}:jobs`
2. **Job Processing**: Runner pops job, validates signature, executes
3. **Result Storage**: Runner stores result in `job:{job_id}:result`
4. **Status Updates**: Runner updates job status throughout execution
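
A minimal sketch of that loop from the runner's side, using the key names above and the `redis` crate's blocking pop. Job parsing and actual execution are stubbed, since they depend on the runner type; the real runners also verify signatures and enforce timeouts before executing.

```rust
use redis::Commands;

// Illustrative runner loop: block on the job queue, "execute" the job,
// and store the result under the documented result key.
fn run_loop(conn: &mut redis::Connection, runner_id: &str) -> redis::RedisResult<()> {
    let queue = format!("runner:{}:jobs", runner_id);
    loop {
        // BRPOP returns (key, value); a timeout of 0 blocks indefinitely.
        let (_key, job_json): (String, String) = conn.brpop(&queue, 0.0)?;

        // Parse and execute the job (stubbed; depends on the runner type).
        let job_id = "job-id-from-payload"; // placeholder: parsed from job_json
        let output = format!("executed: {}", job_json);

        conn.set::<_, _, ()>(format!("job:{}:result", job_id), output)?;
    }
}
```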

## Starting a Runner

```bash
# Hero Runner
herorunner <runner_id> [--redis-url <url>]

# SAL Runner
runner_sal <runner_id> [--redis-url <url>]

# Osiris Runner
runner_osiris <runner_id> [--redis-url <url>]
```

## Configuration

All runners accept:

- `runner_id`: Unique identifier for the runner (required)
- `--redis-url`: Redis connection URL (default: `redis://localhost:6379`)

## Security

- Jobs must be cryptographically signed
- Runners verify signatures before execution
- Untrusted jobs are rejected
- Environment variables should not contain sensitive data in production
docs/runner/sal.md (new file, 123 lines)

@@ -0,0 +1,123 @@

# SAL Runner

System Abstraction Layer runner for system-level operations.

## Overview

The SAL runner executes Rhai scripts with access to system abstraction modules for OS operations, infrastructure management, and cloud provider interactions.

## Features

- **Rhai Scripting**: Execute Rhai scripts with SAL modules
- **System Operations**: File, process, and network management
- **Infrastructure**: Kubernetes, VM, and container operations
- **Cloud Providers**: Hetzner and other cloud integrations
- **Database Access**: Redis and Postgres client operations
- **Networking**: Mycelium and network configuration

## Available SAL Modules

### Core Modules
- **sal-os**: Operating system operations
- **sal-process**: Process management
- **sal-text**: Text processing utilities
- **sal-net**: Network operations

### Infrastructure
- **sal-virt**: Virtualization management
- **sal-kubernetes**: Kubernetes cluster operations
- **sal-zinit-client**: Zinit process manager

### Storage & Data
- **sal-redisclient**: Redis operations
- **sal-postgresclient**: PostgreSQL operations
- **sal-vault**: Secret management

### Networking
- **sal-mycelium**: Mycelium network integration

### Cloud Providers
- **sal-hetzner**: Hetzner cloud operations

### Version Control
- **sal-git**: Git repository operations

## Usage

```bash
# Start the runner
runner_sal my-sal-runner

# With custom Redis
runner_sal my-sal-runner --redis-url redis://custom:6379
```

## Job Payload

The payload should contain a Rhai script using SAL modules:

```rhai
// Example: List files
let files = os.list_dir("/tmp");
print(files);

// Example: Process management
let pid = process.spawn("ls", ["-la"]);
let output = process.wait(pid);
print(output);
```

## Examples

### File Operations
```rhai
// Read file
let content = os.read_file("/path/to/file");
print(content);

// Write file
os.write_file("/path/to/output", "Hello World");
```

### Kubernetes Operations
```rhai
// List pods
let pods = k8s.list_pods("default");
for pod in pods {
    print(pod.name);
}
```

### Redis Operations
```rhai
// Set value
redis.set("key", "value");

// Get value
let val = redis.get("key");
print(val);
```

### Git Operations
```rhai
// Clone repository
git.clone("https://github.com/user/repo", "/tmp/repo");

// Get status
let status = git.status("/tmp/repo");
print(status);
```

## Requirements

- Redis server accessible
- System permissions for requested operations
- Valid job signatures
- SAL modules available in runtime

## Security Considerations

- SAL operations have system-level access
- Jobs must be from trusted sources
- Signature verification is mandatory
- Limit runner permissions in production
docs/supervisor/overview.md (new file, 88 lines)

@@ -0,0 +1,88 @@

# Supervisor Overview

The Supervisor is the job dispatcher layer in Horus. It receives jobs, verifies signatures, and routes them to appropriate runners.

## Architecture

```
Client → Supervisor → Redis Queue → Runner
```

## Responsibilities

### 1. **Job Admission**
- Receive jobs via OpenRPC interface
- Validate job structure and required fields
- Verify cryptographic signatures

### 2. **Authentication & Authorization**
- Verify job signatures using public keys
- Ensure jobs are from authorized sources
- Reject unsigned or invalid jobs

### 3. **Job Routing**
- Route jobs to appropriate runner queues
- Maintain runner registry
- Load balance across available runners

### 4. **Job Management**
- Track job status and lifecycle
- Provide job query and listing APIs
- Store job results and logs

### 5. **Runner Management**
- Register and track available runners
- Monitor runner health and availability
- Handle runner disconnections

## OpenRPC Interface

The Supervisor exposes an OpenRPC API for job management (a client example follows the method lists):

### Job Operations
- `create_job`: Submit a new job
- `get_job`: Retrieve job details
- `list_jobs`: List all jobs
- `delete_job`: Remove a job
- `get_job_logs`: Retrieve job execution logs

### Runner Operations
- `register_runner`: Register a new runner
- `list_runners`: List available runners
- `get_runner_status`: Check runner health
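
For example, `create_job` is the call the [getting started guide](../getting-started.md) makes through the `hero_supervisor_client` crate; a compact version is repeated here for reference. Only `SupervisorClient::new` and `create_job` appear in these docs, so treat anything beyond that as an assumption.

```rust
use hero_supervisor_client::SupervisorClient;
use hero_job::Job;

// create_job via the client crate, as shown in getting-started.md.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = SupervisorClient::new("http://localhost:8080")?;
    let job = Job::new("my-runner", "print('hi')".to_string());
    let result = client.create_job(job).await?;
    println!("submitted job {}", result.id);
    Ok(())
}
```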

## Job Lifecycle

1. **Submission**: Client submits job via OpenRPC
2. **Validation**: Supervisor validates structure and signature
3. **Queueing**: Job pushed to runner's Redis queue
4. **Execution**: Runner processes job
5. **Completion**: Result stored in Redis
6. **Retrieval**: Client retrieves result via OpenRPC

## Transport Options

The Supervisor supports multiple transport layers:

- **HTTP**: Standard HTTP/HTTPS transport
- **Mycelium**: Peer-to-peer encrypted transport

## Configuration

```bash
# Start supervisor
supervisor --port 8080 --redis-url redis://localhost:6379

# With Mycelium
supervisor --port 8080 --mycelium --redis-url redis://localhost:6379
```

## Security

- All jobs must be cryptographically signed
- Signatures verified before job admission
- Public key infrastructure for identity
- Optional TLS for HTTP transport
- End-to-end encryption via Mycelium

[→ Authentication Documentation](./auth.md)