# Coordinator Overview

The Coordinator is the workflow orchestration layer in Horus. It executes DAG-based flows by managing job dependencies and dispatching ready steps to supervisors.

## Architecture

```
Client → Coordinator → Supervisor(s) → Runner(s)
```

## Responsibilities

### 1. **Workflow Management**

- Parse and validate DAG workflow definitions
- Track workflow execution state
- Manage step dependencies

### 2. **Job Orchestration**

- Determine which steps are ready to execute
- Dispatch jobs to appropriate supervisors
- Handle step failures and retries

### 3. **Dependency Resolution**

- Track step completion
- Resolve data dependencies between steps
- Pass outputs from completed steps to dependent steps

### 4. **Multi-Supervisor Coordination**

- Route jobs to specific supervisors
- Handle supervisor failures
- Balance load across supervisors

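The dependency tracking described above can be sketched as a small readiness check. This is a minimal illustration, not the Coordinator's actual implementation; the step shape (dicts with `id` and `depends_on` keys, mirroring the YAML definition below) and the function name are assumptions:

```python
def ready_steps(steps, completed):
    """Return steps whose dependencies have all completed
    and which have not yet run themselves."""
    return [
        step for step in steps
        if step["id"] not in completed
        and all(dep in completed for dep in step.get("depends_on", []))
    ]

# A tiny three-step DAG: fetch → process → store
steps = [
    {"id": "fetch"},
    {"id": "process", "depends_on": ["fetch"]},
    {"id": "store", "depends_on": ["process"]},
]

# Before anything runs, only "fetch" is ready.
print([s["id"] for s in ready_steps(steps, completed=set())])      # ['fetch']
# Once "fetch" completes, "process" becomes ready.
print([s["id"] for s in ready_steps(steps, completed={"fetch"})])  # ['process']
```

With independent steps (no shared dependencies), this check naturally returns several steps at once, which is what enables parallel dispatch.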
## Workflow Definition

Workflows are defined as Directed Acyclic Graphs (DAGs):

```yaml
workflow:
  name: "data-pipeline"
  steps:
    - id: "fetch"
      runner: "hero"
      payload: "!!http.get url:'https://api.example.com/data'"

    - id: "process"
      runner: "sal"
      depends_on: ["fetch"]
      payload: |
        let data = input.fetch;
        let processed = process_data(data);
        processed

    - id: "store"
      runner: "osiris"
      depends_on: ["process"]
      payload: |
        let model = osiris.model("results");
        model.create(input.process);
```

## Features

### DAG Execution

- Parallel execution of independent steps
- Sequential execution of dependent steps
- Automatic dependency resolution

### Error Handling

- Step-level retry policies
- Workflow-level error handlers
- Partial workflow recovery

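A step-level retry policy can be sketched as a simple wrapper around step execution. The exponential backoff and the `max_attempts` / `base_delay` knobs here are illustrative assumptions, not the Coordinator's actual configuration surface:

```python
import time

def run_with_retries(step_fn, max_attempts=3, base_delay=0.1):
    """Retry a failing step with exponential backoff,
    re-raising the last error once attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step_fn()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# A flaky step that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_with_retries(flaky)
print(result)  # ok, after two retried failures
```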
### Data Flow

- Pass outputs between steps
- Transform data between steps
- Aggregate results from parallel steps

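Output passing can be sketched as collecting each dependency's result under its step id, mirroring the `input.<step_id>` access pattern used in the payloads above. The function name and dict shapes are illustrative assumptions:

```python
def build_input(step, outputs):
    """Collect the outputs of a step's dependencies, keyed by step id,
    so a payload can read them as input.<step_id>."""
    return {dep: outputs[dep] for dep in step.get("depends_on", [])}

# Output recorded when the "fetch" step completed.
outputs = {"fetch": {"status": 200, "body": "[1, 2, 3]"}}
process = {"id": "process", "depends_on": ["fetch"]}

print(build_input(process, outputs))
# {'fetch': {'status': 200, 'body': '[1, 2, 3]'}}
```

A step with several dependencies gets one entry per dependency, which is also where aggregation of parallel results would hook in.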
### Monitoring

- Real-time workflow status
- Step-level progress tracking
- Execution metrics and logs

## Workflow Lifecycle

1. **Submission**: Client submits workflow definition
2. **Validation**: Coordinator validates DAG structure
3. **Scheduling**: Determine ready steps (no pending dependencies)
4. **Dispatch**: Send jobs to supervisors
5. **Tracking**: Monitor step completion
6. **Progression**: Execute next ready steps
7. **Completion**: Workflow finishes when all steps complete

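The scheduling/dispatch/progression loop above can be condensed into a minimal synchronous sketch. Real dispatch is asynchronous via supervisors; here `dispatch` is a stand-in callable and all names are illustrative:

```python
def execute_workflow(steps, dispatch):
    """Minimal DAG execution loop: repeatedly find ready steps,
    dispatch them, and record completions until all steps finish."""
    completed = {}
    while len(completed) < len(steps):
        ready = [
            s for s in steps
            if s["id"] not in completed
            and all(d in completed for d in s.get("depends_on", []))
        ]
        if not ready:
            # No runnable step but work remains: cycle or bad dependency.
            raise RuntimeError("workflow stuck: cycle or unsatisfiable dependency")
        for step in ready:
            completed[step["id"]] = dispatch(step)
    return completed

steps = [
    {"id": "fetch"},
    {"id": "process", "depends_on": ["fetch"]},
    {"id": "store", "depends_on": ["process"]},
]
order = []
results = execute_workflow(steps, dispatch=lambda s: order.append(s["id"]) or s["id"])
print(order)  # ['fetch', 'process', 'store']
```

The `if not ready` guard is where validation pays off: a DAG check at submission time means this branch should never fire during execution.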
## Use Cases

### Data Pipelines

```
Extract → Transform → Load
```

### CI/CD Workflows

```
Build → Test → Deploy
```

### Multi-Stage Processing

```
Fetch Data → Process → Validate → Store → Notify
```

### Parallel Execution

```
        ┌─ Task A ─┐
Start ──┼─ Task B ─┼── Aggregate → Finish
        └─ Task C ─┘
```

## Configuration

```bash
# Start coordinator
coordinator --port 9090 --redis-url redis://localhost:6379

# With multiple supervisors
coordinator --port 9090 \
  --supervisor http://supervisor1:8080 \
  --supervisor http://supervisor2:8080
```

## API

The Coordinator exposes an OpenRPC API:

- `submit_workflow`: Submit a new workflow
- `get_workflow_status`: Check workflow progress
- `list_workflows`: List all workflows
- `cancel_workflow`: Stop a running workflow
- `get_workflow_logs`: Retrieve execution logs

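Since OpenRPC describes JSON-RPC 2.0 services, a `submit_workflow` call might be built like the sketch below. The envelope follows JSON-RPC 2.0, but the exact `params` shape (a `workflow` object mirroring the YAML definition) is an assumption, not a documented schema:

```python
import json

# Hypothetical JSON-RPC 2.0 request for the Coordinator's
# `submit_workflow` method; the params layout is an assumption.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "submit_workflow",
    "params": {
        "workflow": {
            "name": "data-pipeline",
            "steps": [
                {
                    "id": "fetch",
                    "runner": "hero",
                    "payload": "!!http.get url:'https://api.example.com/data'",
                },
            ],
        }
    },
}

body = json.dumps(request)  # wire-format payload to POST to the coordinator
print(body[:40])
```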
## Advantages

- **Declarative**: Define what to do, not how
- **Scalable**: Parallel execution across multiple supervisors
- **Resilient**: Automatic retry and error handling
- **Observable**: Real-time status and logging
- **Composable**: Reuse workflows as steps in larger workflows