add some documentation for blue book

Timur Gordon
2025-11-14 11:00:26 +01:00
parent 75e62f4730
commit f67296cd25
11 changed files with 1275 additions and 8 deletions

# Coordinator Overview
The Coordinator is the workflow orchestration layer in Horus. It executes DAG-based flows by managing job dependencies and dispatching ready steps to supervisors.
## Architecture
```
Client → Coordinator → Supervisor(s) → Runner(s)
```
## Responsibilities
### 1. **Workflow Management**
- Parse and validate DAG workflow definitions
- Track workflow execution state
- Manage step dependencies
### 2. **Job Orchestration**
- Determine which steps are ready to execute
- Dispatch jobs to appropriate supervisors
- Handle step failures and retries
### 3. **Dependency Resolution**
- Track step completion
- Resolve data dependencies between steps
- Pass outputs from completed steps to dependent steps
### 4. **Multi-Supervisor Coordination**
- Route jobs to specific supervisors (see the sketch after this list)
- Handle supervisor failures
- Load balance across supervisors
## Workflow Definition
Workflows are defined as Directed Acyclic Graphs (DAGs):
```yaml
workflow:
  name: "data-pipeline"
  steps:
    - id: "fetch"
      runner: "hero"
      payload: "!!http.get url:'https://api.example.com/data'"
    - id: "process"
      runner: "sal"
      depends_on: ["fetch"]
      payload: |
        let data = input.fetch;
        let processed = process_data(data);
        processed
    - id: "store"
      runner: "osiris"
      depends_on: ["process"]
      payload: |
        let model = osiris.model("results");
        model.create(input.process);
```
## Features
### DAG Execution
- Parallel execution of independent steps
- Sequential execution of dependent steps
- Automatic dependency resolution
### Error Handling
- Step-level retry policies (sketched below)
- Workflow-level error handlers
- Partial workflow recovery
### Data Flow
- Pass outputs between steps
- Transform data between steps
- Aggregate results from parallel steps
### Monitoring
- Real-time workflow status (example snapshot below)
- Step-level progress tracking
- Execution metrics and logs
## Workflow Lifecycle
1. **Submission**: Client submits workflow definition
2. **Validation**: Coordinator validates DAG structure
3. **Scheduling**: Determine ready steps (no pending dependencies)
4. **Dispatch**: Send jobs to supervisors
5. **Tracking**: Monitor step completion
6. **Progression**: Execute next ready steps
7. **Completion**: Workflow finishes when all steps complete
## Use Cases
### Data Pipelines
```
Extract → Transform → Load
```
### CI/CD Workflows
```
Build → Test → Deploy
```
### Multi-Stage Processing
```
Fetch Data → Process → Validate → Store → Notify
```
### Parallel Execution
```
        ┌─ Task A ─┐
Start ──┼─ Task B ─┼── Aggregate → Finish
        └─ Task C ─┘
```
```
## Configuration
```bash
# Start coordinator
coordinator --port 9090 --redis-url redis://localhost:6379

# With multiple supervisors
coordinator --port 9090 \
  --supervisor http://supervisor1:8080 \
  --supervisor http://supervisor2:8080
```
## API
The Coordinator exposes an OpenRPC API:
- `submit_workflow`: Submit a new workflow (request sketch below)
- `get_workflow_status`: Check workflow progress
- `list_workflows`: List all workflows
- `cancel_workflow`: Stop a running workflow
- `get_workflow_logs`: Retrieve execution logs
## Advantages
- **Declarative**: Define what to do, not how
- **Scalable**: Parallel execution across multiple supervisors
- **Resilient**: Automatic retry and error handling
- **Observable**: Real-time status and logging
- **Composable**: Reuse workflows as steps in larger workflows