From f67296cd25362c112e1f40812a57eec45b944965 Mon Sep 17 00:00:00 2001 From: Timur Gordon <31495328+timurgordon@users.noreply.github.com> Date: Fri, 14 Nov 2025 11:00:26 +0100 Subject: [PATCH] add some documentation for blue book --- docs/.collection | 0 docs/README.md | 67 +++++++++++++ docs/architecture.md | 186 +++++++++++++++++++++++++++++++++-- docs/coordinator/overview.md | 145 +++++++++++++++++++++++++++ docs/getting-started.md | 186 +++++++++++++++++++++++++++++++++++ docs/job-format.md | 179 +++++++++++++++++++++++++++++++++ docs/runner/hero.md | 71 +++++++++++++ docs/runner/osiris.md | 142 ++++++++++++++++++++++++++ docs/runner/overview.md | 96 ++++++++++++++++++ docs/runner/sal.md | 123 +++++++++++++++++++++++ docs/supervisor/overview.md | 88 +++++++++++++++++ 11 files changed, 1275 insertions(+), 8 deletions(-) create mode 100644 docs/.collection create mode 100644 docs/README.md create mode 100644 docs/coordinator/overview.md create mode 100644 docs/getting-started.md create mode 100644 docs/job-format.md create mode 100644 docs/runner/hero.md create mode 100644 docs/runner/osiris.md create mode 100644 docs/runner/overview.md create mode 100644 docs/runner/sal.md create mode 100644 docs/supervisor/overview.md diff --git a/docs/.collection b/docs/.collection new file mode 100644 index 0000000..e69de29 diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..91965cd --- /dev/null +++ b/docs/README.md @@ -0,0 +1,67 @@ +# Horus Documentation + +**Hierarchical Orchestration Runtime for Universal Scripts** + +Horus is a distributed job execution system with three layers: Coordinator, Supervisor, and Runner. + +## Quick Links + +- **[Getting Started](./getting-started.md)** - Install and run your first job +- **[Architecture](./architecture.md)** - System design and components +- **[Etymology](./ethymology.md)** - The meaning behind the name + +## Components + +### Coordinator +Workflow orchestration engine for DAG-based execution. 
+ +- [Overview](./coordinator/overview.md) + +### Supervisor +Job dispatcher with authentication and routing. + +- [Overview](./supervisor/overview.md) +- [Authentication](./supervisor/auth.md) +- [OpenRPC API](./supervisor/openrpc.json) + +### Runners +Job executors for different workload types. + +- [Runner Overview](./runner/overview.md) +- [Hero Runner](./runner/hero.md) - Heroscript execution +- [SAL Runner](./runner/sal.md) - System operations +- [Osiris Runner](./runner/osiris.md) - Database operations + +## Core Concepts + +### Jobs +Units of work executed by runners. Each job contains: +- Target runner ID +- Payload (script/command) +- Cryptographic signature +- Optional timeout and environment variables + +### Workflows +Multi-step DAGs executed by the Coordinator. Steps can: +- Run in parallel or sequence +- Pass data between steps +- Target different runners +- Handle errors and retries + +### Signatures +All jobs must be cryptographically signed: +- Ensures job authenticity +- Prevents tampering +- Enables authorization + +## Use Cases + +- **Automation**: Execute system tasks and scripts +- **Data Pipelines**: Multi-step ETL workflows +- **CI/CD**: Build, test, and deployment pipelines +- **Infrastructure**: Manage cloud resources and containers +- **Integration**: Connect systems via scripted workflows + +## Repository + +[git.ourworld.tf/herocode/horus](https://git.ourworld.tf/herocode/horus) diff --git a/docs/architecture.md b/docs/architecture.md index afd8c5d..c3bbf99 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1,15 +1,185 @@ # Architecture -The Horus architecture consists of three layers: +Horus is a hierarchical orchestration runtime with three layers: Coordinator, Supervisor, and Runner. -1. Coordinator: A workflow engine that executes DAG-based flows by sending ready job steps to the targeted supervisors. -2. Supervisor: A job dispatcher that routes jobs to the appropriate runners. -3. 
Runner: A job executor that runs the actual job steps. +## Overview -## Networking +``` +┌─────────────────────────────────────────────────────────┐ +│ Coordinator │ +│ (Workflow Engine - DAG Execution) │ +│ │ +│ • Parses workflow definitions │ +│ • Resolves dependencies │ +│ • Dispatches ready steps │ +│ • Tracks workflow state │ +└────────────────────┬────────────────────────────────────┘ + │ OpenRPC (HTTP/Mycelium) + │ +┌────────────────────▼────────────────────────────────────┐ +│ Supervisor │ +│ (Job Dispatcher & Authenticator) │ +│ │ +│ • Verifies job signatures │ +│ • Routes jobs to runners │ +│ • Manages runner registry │ +│ • Tracks job lifecycle │ +└────────────────────┬────────────────────────────────────┘ + │ Redis Queue Protocol + │ +┌────────────────────▼────────────────────────────────────┐ +│ Runners │ +│ (Job Executors) │ +│ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ +│ │ Hero │ │ SAL │ │ Osiris │ │ +│ │ Runner │ │ Runner │ │ Runner │ │ +│ └──────────┘ └──────────┘ └──────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` -- The user / client talks to the coordinator over an OpenRPC interface, using either regular HTTP transport or Mycelium. -- The coordinator talks to the supervisor over an OpenRPC interface, using either regular HTTP transport or Mycelium. -- The supervisor talks to runners over a Redis based job execution protocol. +## Layers + +### 1. Coordinator (Optional) +**Purpose:** Workflow orchestration and DAG execution + +**Responsibilities:** +- Parse and validate workflow definitions +- Execute DAG-based flows +- Manage step dependencies +- Route jobs to appropriate supervisors +- Handle multi-step workflows + +**Use When:** +- You need multi-step workflows +- Jobs have dependencies +- Parallel execution is required +- Complex data pipelines + +[→ Coordinator Documentation](./coordinator/overview.md) + +### 2. 
Supervisor (Required) +**Purpose:** Job admission, authentication, and routing + +**Responsibilities:** +- Receive jobs via OpenRPC interface +- Verify cryptographic signatures +- Route jobs to appropriate runners +- Manage runner registry +- Track job status and results + +**Features:** +- OpenRPC API for job management +- HTTP and Mycelium transport +- Signature-based authentication +- Runner health monitoring + +[→ Supervisor Documentation](./supervisor/overview.md) + +### 3. Runners (Required) +**Purpose:** Execute actual job workloads + +**Available Runners:** +- **Hero Runner**: Executes heroscripts via Hero CLI +- **SAL Runner**: System operations (OS, K8s, cloud, etc.) +- **Osiris Runner**: Database operations with Rhai scripts + +**Common Features:** +- Redis queue-based job polling +- Signature verification +- Timeout support +- Environment variable handling + +[→ Runner Documentation](./runner/overview.md) + +## Communication Protocols + +### Client ↔ Coordinator +- **Protocol:** OpenRPC +- **Transport:** HTTP or Mycelium +- **Operations:** Submit workflow, check status, retrieve results + +### Coordinator ↔ Supervisor +- **Protocol:** OpenRPC +- **Transport:** HTTP or Mycelium +- **Operations:** Create job, get status, retrieve logs + +### Supervisor ↔ Runner +- **Protocol:** Redis Queue +- **Transport:** Redis pub/sub and lists +- **Operations:** Push job, poll queue, store result + +## Job Flow + +### Simple Job (No Coordinator) +``` +1. Client → Supervisor: create_job() +2. Supervisor: Verify signature +3. Supervisor → Redis: Push to runner queue +4. Runner ← Redis: Pop job +5. Runner: Execute job +6. Runner → Redis: Store result +7. Client ← Supervisor: get_job_result() +``` + +### Workflow (With Coordinator) +``` +1. Client → Coordinator: submit_workflow() +2. Coordinator: Parse DAG +3. Coordinator: Identify ready steps +4. Coordinator → Supervisor: create_job() for each ready step +5. Supervisor → Runner: Route via Redis +6. 
Runner: Execute and return result +7. Coordinator: Update workflow state +8. Coordinator: Dispatch next ready steps +9. Repeat until workflow complete +``` + +## Security Model + +### Authentication +- Jobs must be cryptographically signed +- Signatures verified at Supervisor layer +- Public key infrastructure for identity + +### Authorization +- Runners only execute signed jobs +- Signature verification before execution +- Untrusted jobs rejected + +### Transport Security +- Optional TLS for HTTP transport +- End-to-end encryption via Mycelium +- No plaintext credentials + +[→ Authentication Details](./supervisor/auth.md) + +## Deployment Patterns + +### Minimal Setup +``` +Redis + Supervisor + Runner(s) +``` +Single machine, simple job execution. + +### Distributed Setup +``` +Redis Cluster + Multiple Supervisors + Runner Pool +``` +High availability, load balancing. + +### Full Orchestration +``` +Coordinator + Multiple Supervisors + Runner Pool +``` +Complex workflows, multi-step pipelines. + +## Design Principles + +1. **Hierarchical**: Clear separation of concerns across layers +2. **Secure**: Signature-based authentication throughout +3. **Scalable**: Horizontal scaling at each layer +4. **Observable**: Comprehensive logging and status tracking +5. **Flexible**: Multiple runners for different workload types diff --git a/docs/coordinator/overview.md b/docs/coordinator/overview.md new file mode 100644 index 0000000..a5dc95a --- /dev/null +++ b/docs/coordinator/overview.md @@ -0,0 +1,145 @@ +# Coordinator Overview + +The Coordinator is the workflow orchestration layer in Horus. It executes DAG-based flows by managing job dependencies and dispatching ready steps to supervisors. + +## Architecture + +``` +Client → Coordinator → Supervisor(s) → Runner(s) +``` + +## Responsibilities + +### 1. **Workflow Management** +- Parse and validate DAG workflow definitions +- Track workflow execution state +- Manage step dependencies + +### 2. 
**Job Orchestration** +- Determine which steps are ready to execute +- Dispatch jobs to appropriate supervisors +- Handle step failures and retries + +### 3. **Dependency Resolution** +- Track step completion +- Resolve data dependencies between steps +- Pass outputs from completed steps to dependent steps + +### 4. **Multi-Supervisor Coordination** +- Route jobs to specific supervisors +- Handle supervisor failures +- Load balance across supervisors + +## Workflow Definition + +Workflows are defined as Directed Acyclic Graphs (DAGs): + +```yaml +workflow: + name: "data-pipeline" + steps: + - id: "fetch" + runner: "hero" + payload: "!!http.get url:'https://api.example.com/data'" + + - id: "process" + runner: "sal" + depends_on: ["fetch"] + payload: | + let data = input.fetch; + let processed = process_data(data); + processed + + - id: "store" + runner: "osiris" + depends_on: ["process"] + payload: | + let model = osiris.model("results"); + model.create(input.process); +``` + +## Features + +### DAG Execution +- Parallel execution of independent steps +- Sequential execution of dependent steps +- Automatic dependency resolution + +### Error Handling +- Step-level retry policies +- Workflow-level error handlers +- Partial workflow recovery + +### Data Flow +- Pass outputs between steps +- Transform data between steps +- Aggregate results from parallel steps + +### Monitoring +- Real-time workflow status +- Step-level progress tracking +- Execution metrics and logs + +## Workflow Lifecycle + +1. **Submission**: Client submits workflow definition +2. **Validation**: Coordinator validates DAG structure +3. **Scheduling**: Determine ready steps (no pending dependencies) +4. **Dispatch**: Send jobs to supervisors +5. **Tracking**: Monitor step completion +6. **Progression**: Execute next ready steps +7. 
**Completion**: Workflow finishes when all steps complete + +## Use Cases + +### Data Pipelines +```yaml +Extract → Transform → Load +``` + +### CI/CD Workflows +```yaml +Build → Test → Deploy +``` + +### Multi-Stage Processing +```yaml +Fetch Data → Process → Validate → Store → Notify +``` + +### Parallel Execution +```yaml + ┌─ Task A ─┐ +Start ──┼─ Task B ─┼── Aggregate → Finish + └─ Task C ─┘ +``` + +## Configuration + +```bash +# Start coordinator +coordinator --port 9090 --redis-url redis://localhost:6379 + +# With multiple supervisors +coordinator --port 9090 \ + --supervisor http://supervisor1:8080 \ + --supervisor http://supervisor2:8080 +``` + +## API + +The Coordinator exposes an OpenRPC API: + +- `submit_workflow`: Submit a new workflow +- `get_workflow_status`: Check workflow progress +- `list_workflows`: List all workflows +- `cancel_workflow`: Stop a running workflow +- `get_workflow_logs`: Retrieve execution logs + +## Advantages + +- **Declarative**: Define what to do, not how +- **Scalable**: Parallel execution across multiple supervisors +- **Resilient**: Automatic retry and error handling +- **Observable**: Real-time status and logging +- **Composable**: Reuse workflows as steps in larger workflows diff --git a/docs/getting-started.md b/docs/getting-started.md new file mode 100644 index 0000000..7f4f45a --- /dev/null +++ b/docs/getting-started.md @@ -0,0 +1,186 @@ +# Getting Started with Horus + +Quick start guide to running your first Horus job. + +## Prerequisites + +- Redis server running +- Rust toolchain installed +- Horus repository cloned + +## Installation + +### Build from Source + +```bash +# Clone repository +git clone https://git.ourworld.tf/herocode/horus +cd horus + +# Build all components +cargo build --release + +# Binaries will be in target/release/ +``` + +## Quick Start + +### 1. Start Redis + +```bash +# Using Docker +docker run -d -p 6379:6379 redis:latest + +# Or install locally +redis-server +``` + +### 2. 
Start a Runner
+
+```bash
+# Start Hero runner
+./target/release/herorunner my-runner
+
+# Or SAL runner
+./target/release/runner_sal my-sal-runner
+
+# Or Osiris runner
+./target/release/runner_osiris my-osiris-runner
+```
+
+### 3. Start the Supervisor
+
+```bash
+./target/release/supervisor --port 8080
+```
+
+### 4. Submit a Job
+
+Using the Supervisor client:
+
+```rust
+use hero_supervisor_client::SupervisorClient;
+use hero_job::Job;
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let client = SupervisorClient::new("http://localhost:8080")?;
+
+    let job = Job::new(
+        "my-runner",
+        "print('Hello from Horus!')".to_string(),
+    );
+
+    let result = client.create_job(job).await?;
+    println!("Job ID: {}", result.id);
+
+    Ok(())
+}
+```
+
+## Example Workflows
+
+### Simple Heroscript Execution
+
+```bash
+# Job payload
+print("Hello World")
+!!git.list
+```
+
+### SAL System Operation
+
+```rhai
+// List files in directory
+let files = os.list_dir("/tmp");
+for file in files {
+    print(file);
+}
+```
+
+### Osiris Data Storage
+
+```rhai
+// Store user data
+let users = osiris.model("users");
+let user = users.create(#{
+    name: "Alice",
+    email: "alice@example.com"
+});
+print(`Created user: ${user.id}`);
+```
+
+## Architecture Overview
+
+```
+┌──────────────┐
+│ Coordinator  │  (Optional: For workflows)
+└──────┬───────┘
+       │
+┌──────▼───────┐
+│  Supervisor  │  (Job dispatcher)
+└──────┬───────┘
+       │
+       │ Redis
+       │
+┌──────▼───────┐
+│   Runners    │  (Job executors)
+│  - Hero      │
+│  - SAL       │
+│  - Osiris    │
+└──────────────┘
+```
+
+## Next Steps
+
+- [Architecture Details](./architecture.md)
+- [Runner Documentation](./runner/overview.md)
+- [Supervisor API](./supervisor/overview.md)
+- [Coordinator Workflows](./coordinator/overview.md)
+- [Authentication](./supervisor/auth.md)
+
+## Common Issues
+
+### Runner Not Receiving Jobs
+
+1. Check Redis connection
+2. Verify runner ID matches job target
+3. 
Check supervisor logs
+
+### Job Signature Verification Failed
+
+1. Ensure job is properly signed
+2. Verify public key is registered
+3. Check signature format
+
+### Timeout Errors
+
+1. Increase job timeout value
+2. Check runner resource availability
+3. Optimize job payload
+
+## Development
+
+### Running Tests
+
+```bash
+# All tests
+cargo test
+
+# Specific component
+cargo test -p hero-supervisor
+cargo test -p runner-hero
+```
+
+### Debug Mode
+
+```bash
+# Enable debug logging
+RUST_LOG=debug ./target/release/supervisor --port 8080
+```
+
+## Support
+
+- Documentation: [docs.ourworld.tf/horus](https://docs.ourworld.tf/horus)
+- Repository: [git.ourworld.tf/herocode/horus](https://git.ourworld.tf/herocode/horus)
+- Issues: Report on the repository
diff --git a/docs/job-format.md b/docs/job-format.md
new file mode 100644
index 0000000..ce2b70e
--- /dev/null
+++ b/docs/job-format.md
@@ -0,0 +1,179 @@
+# Job Format
+
+Jobs are the fundamental unit of work in Horus.
+
+## Structure
+
+```rust
+pub struct Job {
+    pub id: String,                        // Unique job identifier
+    pub runner_id: String,                 // Target runner ID
+    pub payload: String,                   // Job payload (script/command)
+    pub timeout: Option<u64>,              // Timeout in seconds
+    pub env_vars: HashMap<String, String>, // Environment variables
+    pub signatures: Vec<Signature>,        // Cryptographic signatures
+    pub created_at: i64,                   // Creation timestamp
+    pub status: JobStatus,                 // Current status
+}
+```
+
+## Job Status
+
+```rust
+pub enum JobStatus {
+    Pending,    // Queued, not yet started
+    Running,    // Currently executing
+    Completed,  // Finished successfully
+    Failed,     // Execution failed
+    Timeout,    // Exceeded timeout
+    Cancelled,  // Manually cancelled
+}
+```
+
+## Signature Format
+
+```rust
+pub struct Signature {
+    pub public_key: String, // Signer's public key
+    pub signature: String,  // Cryptographic signature
+    pub algorithm: String,  // Signature algorithm (e.g., "ed25519")
+}
+```
+
+## Creating a Job
+
+### Minimal Job
+
+```rust
+use hero_job::Job;
+
+let job 
= Job::new(
+    "my-runner",
+    "print('Hello World')".to_string(),
+);
+```
+
+### With Timeout
+
+```rust
+let job = Job::builder()
+    .runner_id("my-runner")
+    .payload("long_running_task()")
+    .timeout(300) // 5 minutes
+    .build();
+```
+
+### With Environment Variables
+
+```rust
+use std::collections::HashMap;
+
+let mut env_vars = HashMap::new();
+env_vars.insert("API_KEY".to_string(), "secret".to_string());
+env_vars.insert("ENV".to_string(), "production".to_string());
+
+let job = Job::builder()
+    .runner_id("my-runner")
+    .payload("deploy_app()")
+    .env_vars(env_vars)
+    .build();
+```
+
+### With Signature
+
+```rust
+use hero_job::{Job, Signature};
+
+let job = Job::builder()
+    .runner_id("my-runner")
+    .payload("important_task()")
+    .signature(Signature {
+        public_key: "ed25519:abc123...".to_string(),
+        signature: "sig:xyz789...".to_string(),
+        algorithm: "ed25519".to_string(),
+    })
+    .build();
+```
+
+## Payload Format
+
+The payload format depends on the target runner:
+
+### Hero Runner
+Heroscript content:
+```heroscript
+!!git.list
+print("Repositories listed")
+!!docker.ps
+```
+
+### SAL Runner
+Rhai script with SAL modules:
+```rhai
+let files = os.list_dir("/tmp");
+for file in files {
+    print(file);
+}
+```
+
+### Osiris Runner
+Rhai script with Osiris database:
+```rhai
+let users = osiris.model("users");
+let user = users.create(#{
+    name: "Alice",
+    email: "alice@example.com"
+});
+```
+
+## Job Result
+
+```rust
+pub struct JobResult {
+    pub job_id: String,
+    pub status: JobStatus,
+    pub output: String,            // Stdout
+    pub error: Option<String>,     // Stderr or error message
+    pub exit_code: Option<i32>,
+    pub started_at: Option<i64>,
+    pub completed_at: Option<i64>,
+}
+```
+
+## Best Practices
+
+### Timeouts
+- Always set timeouts for jobs
+- Default: 60 seconds
+- Long-running jobs: Set appropriate timeout
+- Infinite jobs: Use separate monitoring
+
+### Environment Variables
+- Don't store secrets in env vars in production
+- Use vault/secret management instead
+- 
Keep env vars minimal +- Document required variables + +### Signatures +- Always sign jobs in production +- Use strong algorithms (ed25519) +- Rotate keys regularly +- Store private keys securely + +### Payloads +- Keep payloads concise +- Validate input data +- Handle errors gracefully +- Log important operations + +## Validation + +Jobs are validated before execution: + +1. **Structure**: All required fields present +2. **Signature**: Valid cryptographic signature +3. **Runner**: Target runner exists and available +4. **Payload**: Non-empty payload +5. **Timeout**: Reasonable timeout value + +Invalid jobs are rejected before execution. diff --git a/docs/runner/hero.md b/docs/runner/hero.md new file mode 100644 index 0000000..7c2e9d2 --- /dev/null +++ b/docs/runner/hero.md @@ -0,0 +1,71 @@ +# Hero Runner + +Executes heroscripts using the Hero CLI tool. + +## Overview + +The Hero runner pipes job payloads directly to `hero run -s` via stdin, making it ideal for executing Hero automation tasks and heroscripts. 
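The piping described above can be sketched in plain Rust (an illustration of the stdin-piping pattern only, not the actual runner source; the command is parameterized here so the sketch runs without the `hero` CLI installed):

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Pipe `payload` into a command's stdin and capture its stdout.
/// The Hero runner applies this pattern with `hero run -s`.
fn pipe_payload(cmd: &str, args: &[&str], payload: &str) -> std::io::Result<String> {
    let mut child = Command::new(cmd)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // Write the payload, then drop the handle to close the pipe
    // (no temp files touch the filesystem).
    child.stdin.take().unwrap().write_all(payload.as_bytes())?;
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() -> std::io::Result<()> {
    // Stand-in for `hero run -s`: `cat` echoes the payload back.
    let out = pipe_payload("cat", &[], "print(\"Hello from heroscript!\")\n")?;
    print!("{out}");
    Ok(())
}
```

In the runner itself the call would be along the lines of `pipe_payload("hero", &["run", "-s"], &job.payload)`, with timeout enforcement and environment variables layered on top.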
+ +## Features + +- **Heroscript Execution**: Direct stdin piping to `hero run -s` +- **No Temp Files**: Secure execution without filesystem artifacts +- **Environment Variables**: Full environment variable support +- **Timeout Support**: Respects job timeout settings +- **Signature Verification**: Cryptographic job verification + +## Usage + +```bash +# Start the runner +herorunner my-hero-runner + +# With custom Redis +herorunner my-hero-runner --redis-url redis://custom:6379 +``` + +## Job Payload + +The payload should contain the heroscript content: + +```heroscript +!!git.list +print("Repositories listed") +!!docker.ps +``` + +## Examples + +### Simple Print +```heroscript +print("Hello from heroscript!") +``` + +### Hero Actions +```heroscript +!!git.list +!!docker.start name:"myapp" +``` + +### With Environment Variables +```json +{ + "payload": "print(env.MY_VAR)", + "env_vars": { + "MY_VAR": "Hello World" + } +} +``` + +## Requirements + +- `hero` CLI must be installed and in PATH +- Redis server accessible +- Valid job signatures + +## Error Handling + +- **Hero CLI Not Found**: Returns error if `hero` command unavailable +- **Timeout**: Kills process if timeout exceeded +- **Non-zero Exit**: Returns error with hero CLI output +- **Invalid Signature**: Rejects job before execution diff --git a/docs/runner/osiris.md b/docs/runner/osiris.md new file mode 100644 index 0000000..5115a4d --- /dev/null +++ b/docs/runner/osiris.md @@ -0,0 +1,142 @@ +# Osiris Runner + +Database-backed runner for structured data storage and retrieval. + +## Overview + +The Osiris runner executes Rhai scripts with access to a model-based database system, enabling structured data operations and persistence. 
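The model behavior described on this page (IDs and timestamps filled in automatically on create) can be pictured with a toy in-memory store; this is a sketch of the semantics only, not the Osiris implementation:

```rust
use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};

/// Toy stand-in for `osiris.model(...)`: records are string field maps,
/// and the store assigns `id` and `created_at` automatically.
struct Model {
    next_id: u64,
    records: HashMap<u64, HashMap<String, String>>,
}

impl Model {
    fn new() -> Self {
        Model { next_id: 1, records: HashMap::new() }
    }

    /// Create a record: auto-assign an ID and stamp the creation time.
    fn create(&mut self, mut fields: HashMap<String, String>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
        fields.insert("created_at".to_string(), now.to_string());
        self.records.insert(id, fields);
        id
    }

    fn get(&self, id: u64) -> Option<&HashMap<String, String>> {
        self.records.get(&id)
    }
}

fn main() {
    let mut users = Model::new();
    let id = users.create(HashMap::from([("name".to_string(), "Alice".to_string())]));
    println!("user {id}: {}", users.get(id).unwrap()["name"]);
}
```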
+ +## Features + +- **Rhai Scripting**: Execute Rhai scripts with Osiris database access +- **Model-Based Storage**: Define and use data models +- **CRUD Operations**: Create, read, update, delete records +- **Query Support**: Search and filter data +- **Schema Validation**: Type-safe data operations +- **Transaction Support**: Atomic database operations + +## Usage + +```bash +# Start the runner +runner_osiris my-osiris-runner + +# With custom Redis +runner_osiris my-osiris-runner --redis-url redis://custom:6379 +``` + +## Job Payload + +The payload should contain a Rhai script using Osiris operations: + +```rhai +// Example: Store data +let model = osiris.model("users"); +let user = model.create(#{ + name: "Alice", + email: "alice@example.com", + age: 30 +}); +print(user.id); + +// Example: Retrieve data +let found = model.get(user.id); +print(found.name); +``` + +## Examples + +### Create Model and Store Data +```rhai +// Define model +let posts = osiris.model("posts"); + +// Create record +let post = posts.create(#{ + title: "Hello World", + content: "First post", + author: "Alice", + published: true +}); + +print(`Created post with ID: ${post.id}`); +``` + +### Query Data +```rhai +let posts = osiris.model("posts"); + +// Find by field +let published = posts.find(#{ + published: true +}); + +for post in published { + print(post.title); +} +``` + +### Update Records +```rhai +let posts = osiris.model("posts"); + +// Get record +let post = posts.get("post-123"); + +// Update fields +post.content = "Updated content"; +posts.update(post); +``` + +### Delete Records +```rhai +let posts = osiris.model("posts"); + +// Delete by ID +posts.delete("post-123"); +``` + +### Transactions +```rhai +osiris.transaction(|| { + let users = osiris.model("users"); + let posts = osiris.model("posts"); + + let user = users.create(#{ name: "Bob" }); + let post = posts.create(#{ + title: "Bob's Post", + author_id: user.id + }); + + // Both operations commit together +}); +``` + +## 
Data Models + +Models are defined dynamically through Rhai scripts: + +```rhai +let model = osiris.model("products"); + +// Model automatically handles: +// - ID generation +// - Timestamps (created_at, updated_at) +// - Schema validation +// - Indexing +``` + +## Requirements + +- Redis server accessible +- Osiris database configured +- Valid job signatures +- Sufficient storage for data operations + +## Use Cases + +- **Configuration Storage**: Store application configs +- **User Data**: Manage user profiles and preferences +- **Workflow State**: Persist workflow execution state +- **Metrics & Logs**: Store structured logs and metrics +- **Cache Management**: Persistent caching layer diff --git a/docs/runner/overview.md b/docs/runner/overview.md new file mode 100644 index 0000000..c72e90e --- /dev/null +++ b/docs/runner/overview.md @@ -0,0 +1,96 @@ +# Runners Overview + +Runners are the execution layer in the Horus architecture. They receive jobs from the Supervisor via Redis queues and execute the actual workload. + +## Architecture + +``` +Supervisor → Redis Queue → Runner → Execute Job → Return Result +``` + +## Available Runners + +Horus provides three specialized runners: + +### 1. **Hero Runner** +Executes heroscripts using the Hero CLI ecosystem. + +**Use Cases:** +- Running Hero automation tasks +- Executing heroscripts from job payloads +- Integration with Hero CLI tools + +**Binary:** `herorunner` + +[→ Hero Runner Documentation](./hero.md) + +### 2. **SAL Runner** +System Abstraction Layer runner for system-level operations. + +**Use Cases:** +- OS operations (file, process, network) +- Infrastructure management (Kubernetes, VMs) +- Cloud provider operations (Hetzner) +- Database operations (Redis, Postgres) + +**Binary:** `runner_sal` + +[→ SAL Runner Documentation](./sal.md) + +### 3. **Osiris Runner** +Database-backed runner for data storage and retrieval using Rhai scripts. 
**Use Cases:**
+- Structured data storage
+- Model-based data operations
+- Rhai script execution with database access
+
+**Binary:** `runner_osiris`
+
+[→ Osiris Runner Documentation](./osiris.md)
+
+## Common Features
+
+All runners implement the `Runner` trait and provide:
+
+- **Job Execution**: Process jobs from Redis queues
+- **Signature Verification**: Verify job signatures before execution
+- **Timeout Support**: Respect job timeout settings
+- **Environment Variables**: Pass environment variables to jobs
+- **Error Handling**: Comprehensive error reporting
+- **Logging**: Structured logging for debugging
+
+## Runner Protocol
+
+Runners communicate with the Supervisor using a Redis-based protocol:
+
+1. **Job Queue**: Supervisor pushes jobs to `runner:{runner_id}:jobs`
+2. **Job Processing**: Runner pops job, validates signature, executes
+3. **Result Storage**: Runner stores result in `job:{job_id}:result`
+4. **Status Updates**: Runner updates job status throughout execution
+
+## Starting a Runner
+
+```bash
+# Hero Runner
+herorunner <runner_id> [--redis-url <url>]
+
+# SAL Runner
+runner_sal <runner_id> [--redis-url <url>]
+
+# Osiris Runner
+runner_osiris <runner_id> [--redis-url <url>]
+```
+
+## Configuration
+
+All runners accept:
+- `runner_id`: Unique identifier for the runner (required)
+- `--redis-url`: Redis connection URL (default: `redis://localhost:6379`)
+
+## Security
+
+- Jobs must be cryptographically signed
+- Runners verify signatures before execution
+- Untrusted jobs are rejected
+- Environment variables should not contain sensitive data in production
diff --git a/docs/runner/sal.md b/docs/runner/sal.md
new file mode 100644
index 0000000..f26afa2
--- /dev/null
+++ b/docs/runner/sal.md
@@ -0,0 +1,123 @@
+# SAL Runner
+
+System Abstraction Layer runner for system-level operations.
+
+## Overview
+
+The SAL runner executes Rhai scripts with access to system abstraction modules for OS operations, infrastructure management, and cloud provider interactions. 
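The `os` module calls used in the examples below map onto ordinary filesystem operations; in plain Rust the equivalents are roughly as follows (illustrative only; SAL presumably wraps similar std APIs and exposes them to Rhai scripts):

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("sal_demo.txt");

    // Rhai: os.write_file("/path/to/output", "Hello World");
    fs::write(&path, "Hello World")?;

    // Rhai: let content = os.read_file("/path/to/file");
    let content = fs::read_to_string(&path)?;
    println!("{content}");

    // Rhai: os.list_dir("/tmp"); here we check the file is visible.
    let listed = fs::read_dir(std::env::temp_dir())?
        .filter_map(|e| e.ok())
        .any(|e| e.file_name() == "sal_demo.txt");
    println!("listed: {listed}");

    fs::remove_file(&path)?;
    Ok(())
}
```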
+ +## Features + +- **Rhai Scripting**: Execute Rhai scripts with SAL modules +- **System Operations**: File, process, and network management +- **Infrastructure**: Kubernetes, VM, and container operations +- **Cloud Providers**: Hetzner and other cloud integrations +- **Database Access**: Redis and Postgres client operations +- **Networking**: Mycelium and network configuration + +## Available SAL Modules + +### Core Modules +- **sal-os**: Operating system operations +- **sal-process**: Process management +- **sal-text**: Text processing utilities +- **sal-net**: Network operations + +### Infrastructure +- **sal-virt**: Virtualization management +- **sal-kubernetes**: Kubernetes cluster operations +- **sal-zinit-client**: Zinit process manager + +### Storage & Data +- **sal-redisclient**: Redis operations +- **sal-postgresclient**: PostgreSQL operations +- **sal-vault**: Secret management + +### Networking +- **sal-mycelium**: Mycelium network integration + +### Cloud Providers +- **sal-hetzner**: Hetzner cloud operations + +### Version Control +- **sal-git**: Git repository operations + +## Usage + +```bash +# Start the runner +runner_sal my-sal-runner + +# With custom Redis +runner_sal my-sal-runner --redis-url redis://custom:6379 +``` + +## Job Payload + +The payload should contain a Rhai script using SAL modules: + +```rhai +// Example: List files +let files = os.list_dir("/tmp"); +print(files); + +// Example: Process management +let pid = process.spawn("ls", ["-la"]); +let output = process.wait(pid); +print(output); +``` + +## Examples + +### File Operations +```rhai +// Read file +let content = os.read_file("/path/to/file"); +print(content); + +// Write file +os.write_file("/path/to/output", "Hello World"); +``` + +### Kubernetes Operations +```rhai +// List pods +let pods = k8s.list_pods("default"); +for pod in pods { + print(pod.name); +} +``` + +### Redis Operations +```rhai +// Set value +redis.set("key", "value"); + +// Get value +let val = 
redis.get("key"); +print(val); +``` + +### Git Operations +```rhai +// Clone repository +git.clone("https://github.com/user/repo", "/tmp/repo"); + +// Get status +let status = git.status("/tmp/repo"); +print(status); +``` + +## Requirements + +- Redis server accessible +- System permissions for requested operations +- Valid job signatures +- SAL modules available in runtime + +## Security Considerations + +- SAL operations have system-level access +- Jobs must be from trusted sources +- Signature verification is mandatory +- Limit runner permissions in production diff --git a/docs/supervisor/overview.md b/docs/supervisor/overview.md new file mode 100644 index 0000000..844b9ef --- /dev/null +++ b/docs/supervisor/overview.md @@ -0,0 +1,88 @@ +# Supervisor Overview + +The Supervisor is the job dispatcher layer in Horus. It receives jobs, verifies signatures, and routes them to appropriate runners. + +## Architecture + +``` +Client → Supervisor → Redis Queue → Runner +``` + +## Responsibilities + +### 1. **Job Admission** +- Receive jobs via OpenRPC interface +- Validate job structure and required fields +- Verify cryptographic signatures + +### 2. **Authentication & Authorization** +- Verify job signatures using public keys +- Ensure jobs are from authorized sources +- Reject unsigned or invalid jobs + +### 3. **Job Routing** +- Route jobs to appropriate runner queues +- Maintain runner registry +- Load balance across available runners + +### 4. **Job Management** +- Track job status and lifecycle +- Provide job query and listing APIs +- Store job results and logs + +### 5. 
**Runner Management** +- Register and track available runners +- Monitor runner health and availability +- Handle runner disconnections + +## OpenRPC Interface + +The Supervisor exposes an OpenRPC API for job management: + +### Job Operations +- `create_job`: Submit a new job +- `get_job`: Retrieve job details +- `list_jobs`: List all jobs +- `delete_job`: Remove a job +- `get_job_logs`: Retrieve job execution logs + +### Runner Operations +- `register_runner`: Register a new runner +- `list_runners`: List available runners +- `get_runner_status`: Check runner health + +## Job Lifecycle + +1. **Submission**: Client submits job via OpenRPC +2. **Validation**: Supervisor validates structure and signature +3. **Queueing**: Job pushed to runner's Redis queue +4. **Execution**: Runner processes job +5. **Completion**: Result stored in Redis +6. **Retrieval**: Client retrieves result via OpenRPC + +## Transport Options + +The Supervisor supports multiple transport layers: + +- **HTTP**: Standard HTTP/HTTPS transport +- **Mycelium**: Peer-to-peer encrypted transport + +## Configuration + +```bash +# Start supervisor +supervisor --port 8080 --redis-url redis://localhost:6379 + +# With Mycelium +supervisor --port 8080 --mycelium --redis-url redis://localhost:6379 +``` + +## Security + +- All jobs must be cryptographically signed +- Signatures verified before job admission +- Public key infrastructure for identity +- Optional TLS for HTTP transport +- End-to-end encryption via Mycelium + +[→ Authentication Documentation](./auth.md)
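As a concrete illustration of the OpenRPC interface, a `create_job` request is a JSON-RPC 2.0 envelope. The sketch below builds one by hand; the job field names follow docs/job-format.md, but the authoritative parameter schema is supervisor/openrpc.json, so treat the exact `params` shape as an assumption:

```rust
// Build a JSON-RPC 2.0 request body for the supervisor's `create_job` method.
// Sketch only: the exact params schema is defined in supervisor/openrpc.json.
fn create_job_request(runner_id: &str, payload: &str) -> String {
    format!(
        concat!(
            r#"{{"jsonrpc":"2.0","id":1,"method":"create_job","#,
            r#""params":{{"runner_id":"{}","payload":"{}","signatures":[]}}}}"#
        ),
        runner_id, payload
    )
}

fn main() {
    // POST this body to a running supervisor, e.g. http://localhost:8080.
    println!("{}", create_job_request("my-runner", "print('hi')"));
}
```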