add complete binary and benchmarking

This commit is contained in:
Timur Gordon
2025-11-18 20:39:25 +01:00
parent f66edba1d3
commit 4142f62e54
17 changed files with 2559 additions and 2 deletions

View File

@@ -2,6 +2,7 @@
resolver = "2"
members = [
"bin/coordinator",
"bin/horus",
"bin/osiris",
"bin/runners/osiris",
"bin/runners/sal",
@@ -35,6 +36,26 @@ lazy_static = { workspace = true }
escargot = "0.5"
ctrlc = "3.4"
[dev-dependencies]
criterion = { version = "0.5", features = ["async_tokio", "html_reports"] }
osiris-client = { path = "lib/clients/osiris" }
reqwest = { version = "0.12", features = ["json"] }
serde_json = { workspace = true }
uuid = { workspace = true }
chrono = { workspace = true }
[[bench]]
name = "horus_stack"
harness = false
[[bench]]
name = "stress_test"
harness = false
[[bench]]
name = "memory_usage"
harness = false
[workspace.package]
version = "0.1.0"
edition = "2024"

View File

@@ -4,6 +4,17 @@ Horus is a comprehensive workspace for Hero infrastructure components.
## Structure
In '/bin' you have the binaries for the components of Horus
- [Supervisor](./bin/supervisor)
- [Osiris Runner](./bin/runner)
- [SAL Runner](./bin/runner)
- [Hero Runner](./bin/runner)
In '/lib' you have shared libraries.
- [Clients](./lib/clients)
## Structure
```
horus/
├── bin/
@@ -15,7 +26,7 @@ horus/
## Components
### Hero Supervisor (`bin/supervisor`)
### Supervisor (`bin/supervisor`)
The Hero Supervisor manages job execution across distributed runners with:
- Job lifecycle management (create, start, stop, delete)
@@ -25,6 +36,19 @@ The Hero Supervisor manages job execution across distributed runners with:
- OpenRPC JSON-RPC API with authentication
- CORS-enabled HTTP server
### Coordinator (`bin/supervisor`)
The Hero Supervisor manages job execution across distributed runners with:
- Job lifecycle management (create, start, stop, delete)
- Runner registration and management
- Redis-based job queuing
- Osiris integration for persistent storage
- OpenRPC JSON-RPC API with authentication
- CORS-enabled HTTP server
### Supervisor Client (`lib/clients/supervisor`)
OpenRPC client library for Hero Supervisor with dual-target support:

View File

@@ -0,0 +1,217 @@
# Memory Usage Benchmarks
Benchmarks for measuring memory consumption of the Horus stack components.
## Overview
The memory benchmarks measure heap memory usage for various operations:
- Job creation and storage
- Client instantiation
- Payload size impact
- Memory growth under load
## Benchmarks
### 1. `memory_job_creation`
Measures memory usage when creating multiple Job objects in memory.
**Test sizes**: 10, 50, 100, 200 jobs
**What it measures**:
- Memory allocated per job object
- Heap growth with increasing job count
- Memory efficiency of Job structure
**Expected results**:
- Linear memory growth with job count
- ~1-2 KB per job object (depending on payload)
### 2. `memory_client_creation`
Measures memory overhead of creating multiple Supervisor client instances.
**Test sizes**: 1, 10, 50, 100 clients
**What it measures**:
- Memory per client instance
- Connection pool overhead
- HTTP client memory footprint
**Expected results**:
- ~10-50 KB per client instance
- Includes HTTP client, connection pools, and buffers
### 3. `memory_payload_sizes`
Measures memory usage with different payload sizes.
**Test sizes**: 1KB, 10KB, 100KB, 1MB payloads
**What it measures**:
- Memory overhead of JSON serialization
- String allocation costs
- Payload storage efficiency
**Expected results**:
- Memory usage should scale linearly with payload size
- Small overhead for JSON structure (~5-10%)
## Running Memory Benchmarks
```bash
# Run all memory benchmarks
cargo bench --bench memory_usage
# Run specific memory test
cargo bench --bench memory_usage -- memory_job_creation
# Run with verbose output to see memory deltas
cargo bench --bench memory_usage -- --verbose
```
## Interpreting Results
The benchmarks print memory deltas to stderr during execution:
```
Memory delta for 100 jobs: 156 KB
Memory delta for 50 clients: 2048 KB
Memory delta for 100KB payload: 105 KB
```
### Memory Delta Interpretation
- **Positive delta**: Memory was allocated during the operation
- **Zero delta**: No significant memory change (may be reusing existing allocations)
- **Negative delta**: Memory was freed (garbage collection, deallocations)
### Platform Differences
**macOS**: Uses `ps` command to read RSS (Resident Set Size)
**Linux**: Reads `/proc/self/status` for VmRSS
RSS includes:
- Heap allocations
- Stack memory
- Shared libraries (mapped into process)
- Memory-mapped files
## Limitations
1. **Granularity**: OS-level memory reporting may not capture small allocations
2. **Timing**: Memory measurements happen before/after operations, not continuously
3. **GC effects**: Rust's allocator may not immediately release memory to OS
4. **Shared memory**: RSS includes shared library memory
## Best Practices
### For Accurate Measurements
1. **Run multiple iterations**: Criterion handles this automatically
2. **Warm up**: First iterations may show higher memory due to lazy initialization
3. **Isolate tests**: Run memory benchmarks separately from performance benchmarks
4. **Monitor trends**: Compare results over time, not absolute values
### Memory Optimization Tips
If benchmarks show high memory usage:
1. **Check payload sizes**: Large payloads consume proportional memory
2. **Limit concurrent operations**: Too many simultaneous jobs/clients increase memory
3. **Review data structures**: Ensure efficient serialization
4. **Profile with tools**: Use `heaptrack` (Linux) or `instruments` (macOS) for detailed analysis
## Advanced Profiling
For detailed memory profiling beyond these benchmarks:
### macOS
```bash
# Use Instruments
instruments -t Allocations -D memory_trace.trace ./target/release/horus
# Use heap profiler
cargo install cargo-instruments
cargo instruments --bench memory_usage --template Allocations
```
### Linux
```bash
# Use Valgrind massif
valgrind --tool=massif --massif-out-file=massif.out \
./target/release/deps/memory_usage-*
# Visualize with massif-visualizer
massif-visualizer massif.out
# Use heaptrack
heaptrack ./target/release/deps/memory_usage-*
heaptrack_gui heaptrack.memory_usage.*.gz
```
### Cross-platform
```bash
# Use dhat (heap profiler)
cargo install dhat
# Add dhat to your benchmark and run
cargo bench --bench memory_usage --features dhat-heap
```
## Continuous Monitoring
Integrate memory benchmarks into CI/CD:
```bash
# Run and save baseline
cargo bench --bench memory_usage -- --save-baseline memory-main
# Compare in PR
cargo bench --bench memory_usage -- --baseline memory-main
# Fail if memory usage increases >10%
# (requires custom scripting to parse Criterion output)
```
## Troubleshooting
### "Memory delta is always 0"
- OS may not update RSS immediately
- Allocations might be too small to measure
- Try increasing iteration count or operation size
### "Memory keeps growing"
- Check for memory leaks
- Verify objects are being dropped
- Use `cargo clippy` to find potential issues
### "Results are inconsistent"
- Other processes may be affecting measurements
- Run benchmarks on idle system
- Increase sample size in benchmark code
## Example Output
```
memory_job_creation/10 time: [45.2 µs 46.1 µs 47.3 µs]
Memory delta for 10 jobs: 24 KB
memory_job_creation/50 time: [198.4 µs 201.2 µs 204.8 µs]
Memory delta for 50 jobs: 98 KB
memory_job_creation/100 time: [387.6 µs 392.1 µs 397.4 µs]
Memory delta for 100 jobs: 187 KB
memory_client_creation/1 time: [234.5 µs 238.2 µs 242.6 µs]
Memory delta for 1 clients: 45 KB
memory_payload_sizes/1KB time: [12.3 µs 12.6 µs 13.0 µs]
Memory delta for 1KB payload: 2 KB
memory_payload_sizes/100KB time: [156.7 µs 159.4 µs 162.8 µs]
Memory delta for 100KB payload: 105 KB
```
## Related Documentation
- [Performance Benchmarks](./README.md)
- [Stress Tests](./README.md#stress-tests)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)

129
benches/QUICK_START.md Normal file
View File

@@ -0,0 +1,129 @@
# Horus Benchmarks - Quick Start
## 1. Start the Stack
```bash
# Terminal 1: Start Redis
redis-server
# Terminal 2: Start Horus
cd /Users/timurgordon/code/git.ourworld.tf/herocode/horus
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
```
## 2. Run Benchmarks
### Option A: Use the helper script (recommended)
```bash
./benches/run_benchmarks.sh
```
### Option B: Run directly with cargo
```bash
# All benchmarks
cargo bench
# Specific benchmark suite
cargo bench --bench horus_stack
cargo bench --bench stress_test
# Specific test
cargo bench --bench horus_stack -- supervisor_discovery
# Quick run (fewer samples)
cargo bench -- --quick
```
## 3. View Results
```bash
# Open HTML report in browser
open target/criterion/report/index.html
# Or on Linux
xdg-open target/criterion/report/index.html
```
## Available Benchmark Suites
### `horus_stack` - Standard Performance Tests
- API discovery and metadata
- Runner management
- Job operations
- Concurrency tests
- Health checks
- API latency measurements
### `stress_test` - Load & Stress Tests
- High-frequency job submissions (50-200 jobs)
- Sustained load testing
- Large payload handling (1KB-100KB)
- Rapid API calls (100 calls/test)
- Mixed workload scenarios
- Connection pool exhaustion (10-100 clients)
### `memory_usage` - Memory Profiling
- Job object memory footprint (10-200 jobs)
- Client instance memory overhead (1-100 clients)
- Payload size impact on memory (1KB-1MB)
- Memory growth patterns under load
## Common Commands
```bash
# Run only fast benchmarks
cargo bench -- --quick
# Save baseline for comparison
cargo bench -- --save-baseline main
# Compare against baseline
cargo bench -- --baseline main
# Run with verbose output
cargo bench -- --verbose
# Filter by name
cargo bench -- concurrent
cargo bench -- stress
# Run specific benchmark group
cargo bench --bench horus_stack -- api_latency
# Run memory benchmarks
cargo bench --bench memory_usage
# Run memory benchmarks with verbose output (shows memory deltas)
cargo bench --bench memory_usage -- --verbose
```
## Troubleshooting
**"Connection refused"**
- Make sure Horus stack is running
- Check ports: 3030 (supervisor), 8081 (osiris), 9652/9653 (coordinator)
**"Job timeout"**
- Increase timeout in benchmark code
- Check that runners are registered: `curl http://127.0.0.1:3030` (requires POST)
**Slow benchmarks**
- Close other applications
- Use `--quick` flag for faster runs
- Reduce sample size in benchmark code
## Performance Expectations
| Test | Expected Time |
|------|---------------|
| supervisor_discovery | < 10ms |
| supervisor_get_info | < 5ms |
| job_full_lifecycle | < 100ms |
| concurrent_jobs (10) | < 500ms |
| stress_high_frequency (50) | < 2s |
## Next Steps
- See `benches/README.md` for detailed documentation
- Modify `benches/horus_stack.rs` to add custom tests
- Check `target/criterion/` for detailed reports

206
benches/README.md Normal file
View File

@@ -0,0 +1,206 @@
# Horus Stack Benchmarks
Comprehensive benchmark suite for the entire Horus stack, testing performance through the client APIs.
## Overview
These benchmarks test the full Horus system including:
- **Supervisor API** - Job management, runner coordination
- **Coordinator API** - Job routing and execution
- **Osiris API** - REST API for data queries
All benchmarks interact with the stack through the official client libraries in `/lib/clients`, which is the only supported way to interact with the system.
## Prerequisites
Before running benchmarks, you must have the Horus stack running:
```bash
# Start Redis
redis-server
# Start all Horus services
cd /Users/timurgordon/code/git.ourworld.tf/herocode/horus
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
```
The benchmarks expect:
- **Supervisor** running on `http://127.0.0.1:3030`
- **Coordinator** running on `http://127.0.0.1:9652` (HTTP) and `ws://127.0.0.1:9653` (WebSocket)
- **Osiris** running on `http://127.0.0.1:8081`
- **Redis** running on `127.0.0.1:6379`
- Admin secret: `SECRET`
## Running Benchmarks
### Run all benchmarks
```bash
cargo bench --bench horus_stack
```
### Run specific benchmark
```bash
cargo bench --bench horus_stack -- supervisor_discovery
```
### Run with specific filter
```bash
cargo bench --bench horus_stack -- concurrent
```
### Generate detailed reports
```bash
cargo bench --bench horus_stack -- --verbose
```
## Benchmark Categories
### 1. API Discovery & Metadata (`horus_stack`)
- `supervisor_discovery` - OpenRPC metadata retrieval
- `supervisor_get_info` - Supervisor information and stats
### 2. Runner Management (`horus_stack`)
- `supervisor_list_runners` - List all registered runners
- `get_all_runner_status` - Get status of all runners
### 3. Job Operations (`horus_stack`)
- `supervisor_job_create` - Create job without execution
- `supervisor_job_list` - List all jobs
- `job_full_lifecycle` - Complete job lifecycle (create → execute → result)
### 4. Concurrency Tests (`horus_stack`)
- `concurrent_jobs` - Submit multiple jobs concurrently (1, 5, 10, 20 jobs)
### 5. Health & Monitoring (`horus_stack`)
- `osiris_health_check` - Osiris server health endpoint
### 6. API Latency (`horus_stack`)
- `api_latency/supervisor_info` - Supervisor info latency
- `api_latency/runner_list` - Runner list latency
- `api_latency/job_list` - Job list latency
### 7. Stress Tests (`stress_test`)
- `stress_high_frequency_jobs` - High-frequency submissions (50-200 jobs)
- `stress_sustained_load` - Continuous load testing
- `stress_large_payloads` - Large payload handling (1KB-100KB)
- `stress_rapid_api_calls` - Rapid API calls (100 calls/iteration)
- `stress_mixed_workload` - Mixed operation scenarios
- `stress_connection_pool` - Connection pool exhaustion (10-100 clients)
### 8. Memory Usage (`memory_usage`)
- `memory_job_creation` - Memory per job object (10-200 jobs)
- `memory_client_creation` - Memory per client instance (1-100 clients)
- `memory_payload_sizes` - Memory vs payload size (1KB-1MB)
See [MEMORY_BENCHMARKS.md](./MEMORY_BENCHMARKS.md) for detailed memory profiling documentation.
## Interpreting Results
Criterion outputs detailed statistics including:
- **Mean time** - Average execution time
- **Std deviation** - Variability in measurements
- **Median** - Middle value (50th percentile)
- **MAD** - Median Absolute Deviation
- **Throughput** - Operations per second
Results are saved in `target/criterion/` with:
- HTML reports with graphs
- JSON data for further analysis
- Historical comparison with previous runs
## Performance Targets
Expected performance (on modern hardware):
| Benchmark | Target | Notes |
|-----------|--------|-------|
| supervisor_discovery | < 10ms | Metadata retrieval |
| supervisor_get_info | < 5ms | Simple info query |
| supervisor_list_runners | < 5ms | List operation |
| supervisor_job_create | < 10ms | Job creation only |
| job_full_lifecycle | < 100ms | Full execution cycle |
| osiris_health_check | < 2ms | Health endpoint |
| concurrent_jobs (10) | < 500ms | 10 parallel jobs |
## Customization
To modify benchmark parameters, edit `benches/horus_stack.rs`:
```rust
// Change URLs
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
// Change admin secret
const ADMIN_SECRET: &str = "SECRET";
// Adjust concurrent job counts
for num_jobs in [1, 5, 10, 20, 50].iter() {
// ...
}
```
## CI/CD Integration
To run benchmarks in CI without the full stack:
```bash
# Run only fast benchmarks
cargo bench --bench horus_stack -- --quick
# Save baseline for comparison
cargo bench --bench horus_stack -- --save-baseline main
# Compare against baseline
cargo bench --bench horus_stack -- --baseline main
```
## Troubleshooting
### "Connection refused" errors
- Ensure the Horus stack is running
- Check that all services are listening on expected ports
- Verify firewall settings
### "Job execution timeout" errors
- Increase timeout values in benchmark code
- Check that runners are properly registered
- Verify Redis is accessible
### Inconsistent results
- Close other applications to reduce system load
- Run benchmarks multiple times for statistical significance
- Use `--warm-up-time` flag to increase warm-up period
## Adding New Benchmarks
To add a new benchmark:
1. Create a new function in `benches/horus_stack.rs`:
```rust
fn bench_my_feature(c: &mut Criterion) {
let rt = create_runtime();
let client = /* create client */;
c.bench_function("my_feature", |b| {
b.to_async(&rt).iter(|| async {
// Your benchmark code
});
});
}
```
2. Add to the criterion_group:
```rust
criterion_group!(
benches,
// ... existing benchmarks
bench_my_feature,
);
```
## Resources
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [Horus Client Documentation](../lib/clients/)
- [Performance Tuning Guide](../docs/performance.md)

195
benches/SUMMARY.md Normal file
View File

@@ -0,0 +1,195 @@
# Horus Stack Benchmarks - Summary
## ✅ Created Comprehensive Benchmark Suite
Successfully created a complete benchmark suite for the Horus stack that tests the entire system through the official client APIs.
### Files Created
1. **`benches/horus_stack.rs`** - Main benchmark suite
- API discovery and metadata retrieval
- Runner management operations
- Job lifecycle testing
- Concurrent job submissions (1, 5, 10, 20 jobs)
- Health checks
- API latency measurements
2. **`benches/stress_test.rs`** - Stress and load testing
- High-frequency job submissions (50-200 jobs)
- Sustained load testing
- Large payload handling (1KB-100KB)
- Rapid API calls (100 calls/iteration)
- Mixed workload scenarios
- Connection pool exhaustion tests (10-100 clients)
3. **`benches/memory_usage.rs`** - Memory profiling
- Job object memory footprint (10-200 jobs)
- Client instance memory overhead (1-100 clients)
- Payload size impact on memory (1KB-1MB)
- Real-time memory delta reporting
4. **`benches/README.md`** - Comprehensive documentation
- Setup instructions
- Benchmark descriptions
- Performance targets
- Customization guide
- Troubleshooting tips
5. **`benches/QUICK_START.md`** - Quick reference guide
- Fast setup steps
- Common commands
- Expected performance metrics
6. **`benches/MEMORY_BENCHMARKS.md`** - Memory profiling guide
- Memory benchmark descriptions
- Platform-specific measurement details
- Advanced profiling tools
- Memory optimization tips
7. **`benches/run_benchmarks.sh`** - Helper script
- Automated prerequisite checking
- Service health verification
- One-command benchmark execution
### Architecture
The benchmarks interact with the Horus stack exclusively through the client libraries:
- **`hero-supervisor-openrpc-client`** - Supervisor API (job management, runner coordination)
- **`osiris-client`** - Osiris REST API (data queries)
- **`hero-job`** - Job model definitions
This ensures benchmarks test the real-world API surface that users interact with.
### Key Features
**Async/await support** - Uses Criterion's async_tokio feature
**Realistic workloads** - Tests actual job submission and execution
**Concurrent testing** - Measures performance under parallel load
**Stress testing** - Pushes system limits with high-frequency operations
**HTML reports** - Beautiful visualizations with historical comparison
**Automated checks** - Helper script verifies stack is running
### Benchmark Categories
#### Performance Benchmarks (`horus_stack`)
- `supervisor_discovery` - OpenRPC metadata (target: <10ms)
- `supervisor_get_info` - Info retrieval (target: <5ms)
- `supervisor_list_runners` - List operations (target: <5ms)
- `supervisor_job_create` - Job creation (target: <10ms)
- `supervisor_job_list` - Job listing (target: <10ms)
- `osiris_health_check` - Health endpoint (target: <2ms)
- `job_full_lifecycle` - Complete job cycle (target: <100ms)
- `concurrent_jobs` - Parallel submissions (target: <500ms for 10 jobs)
- `get_all_runner_status` - Status queries
- `api_latency/*` - Detailed latency measurements
#### Stress Tests (`stress_test`)
- `stress_high_frequency_jobs` - 50-200 concurrent jobs
- `stress_sustained_load` - Continuous submissions over time
- `stress_large_payloads` - 1KB-100KB payload handling
- `stress_rapid_api_calls` - 100 rapid calls per iteration
- `stress_mixed_workload` - Combined operations
- `stress_connection_pool` - 10-100 concurrent clients
#### Memory Profiling (`memory_usage`)
- `memory_job_creation` - Memory footprint per job (10-200 jobs)
- `memory_client_creation` - Memory per client instance (1-100 clients)
- `memory_payload_sizes` - Memory vs payload size (1KB-1MB)
- Reports memory deltas in real-time during execution
### Usage
```bash
# Quick start
./benches/run_benchmarks.sh
# Run specific suite
cargo bench --bench horus_stack
cargo bench --bench stress_test
cargo bench --bench memory_usage
# Run specific test
cargo bench -- supervisor_discovery
# Run memory benchmarks with verbose output (shows memory deltas)
cargo bench --bench memory_usage -- --verbose
# Save baseline
cargo bench -- --save-baseline main
# Compare against baseline
cargo bench -- --baseline main
```
### Prerequisites
The benchmarks require the full Horus stack to be running:
```bash
# Start Redis
redis-server
# Start Horus (with auto port cleanup)
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
```
### Configuration
All benchmarks use these defaults (configurable in source):
- Supervisor: `http://127.0.0.1:3030`
- Osiris: `http://127.0.0.1:8081`
- Coordinator HTTP: `http://127.0.0.1:9652`
- Coordinator WS: `ws://127.0.0.1:9653`
- Admin secret: `SECRET`
### Results
Results are saved to `target/criterion/` with:
- HTML reports with graphs and statistics
- JSON data for programmatic analysis
- Historical comparison with previous runs
- Detailed performance metrics (mean, median, std dev, throughput)
### Integration
The benchmarks are integrated into the workspace:
- Added to `Cargo.toml` with proper dependencies
- Uses workspace-level dependencies for consistency
- Configured with `harness = false` for Criterion
- Includes all necessary dev-dependencies
### Next Steps
1. Run benchmarks to establish baseline performance
2. Monitor performance over time as code changes
3. Use stress tests to identify bottlenecks
4. Customize benchmarks for specific use cases
5. Integrate into CI/CD for automated performance tracking
## Technical Details
### Dependencies Added
- `criterion` v0.5 with async_tokio and html_reports features
- `osiris-client` from workspace
- `reqwest` v0.12 with json feature
- `serde_json`, `uuid`, `chrono` from workspace
### Benchmark Harness
Uses Criterion.rs for:
- Statistical analysis
- Historical comparison
- HTML report generation
- Configurable sample sizes
- Warm-up periods
- Outlier detection
### Job Creation
Helper function `create_test_job()` creates properly structured Job instances:
- Unique UUIDs for each job
- Proper timestamps
- JSON-serialized payloads
- Empty signatures (for testing)
- Configurable runner and command
This ensures benchmarks test realistic job structures that match production usage.

324
benches/horus_stack.rs Normal file
View File

@@ -0,0 +1,324 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
use hero_supervisor_openrpc_client::SupervisorClientBuilder;
use hero_job::Job;
use tokio::runtime::Runtime;
use std::time::Duration;
use std::collections::HashMap;
use uuid::Uuid;
use chrono::Utc;
/// Benchmark configuration
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
const ADMIN_SECRET: &str = "SECRET";
/// Helper to create a tokio runtime for benchmarks
fn create_runtime() -> Runtime {
Runtime::new().unwrap()
}
/// Helper to create a test job
fn create_test_job(runner: &str, command: &str, args: Vec<String>) -> Job {
Job {
id: Uuid::new_v4().to_string(),
caller_id: "benchmark".to_string(),
context_id: "test".to_string(),
payload: serde_json::json!({
"command": command,
"args": args
}).to_string(),
runner: runner.to_string(),
timeout: 30,
env_vars: HashMap::new(),
created_at: Utc::now(),
updated_at: Utc::now(),
signatures: vec![],
}
}
/// Benchmark: Supervisor discovery (OpenRPC metadata)
fn bench_supervisor_discovery(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
c.bench_function("supervisor_discovery", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.discover().await.expect("Discovery failed"))
});
});
}
/// Benchmark: Supervisor info retrieval
fn bench_supervisor_info(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
c.bench_function("supervisor_get_info", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.get_supervisor_info().await.expect("Get info failed"))
});
});
}
/// Benchmark: List runners
fn bench_list_runners(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
c.bench_function("supervisor_list_runners", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.runner_list().await.expect("List runners failed"))
});
});
}
/// Benchmark: Job creation (without execution)
fn bench_job_create(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner exists
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
c.bench_function("supervisor_job_create", |b| {
b.to_async(&rt).iter(|| async {
let job = create_test_job("hero", "echo", vec!["hello".to_string()]);
black_box(client.job_create(job).await.expect("Job create failed"))
});
});
}
/// Benchmark: Job listing
fn bench_job_list(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
c.bench_function("supervisor_job_list", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.job_list().await.expect("Job list failed"))
});
});
}
/// Benchmark: Osiris health check
fn bench_osiris_health(c: &mut Criterion) {
let rt = create_runtime();
let client = reqwest::Client::new();
c.bench_function("osiris_health_check", |b| {
b.to_async(&rt).iter(|| async {
let url = format!("{}/health", OSIRIS_URL);
black_box(
client
.get(&url)
.send()
.await
.expect("Health check failed")
.json::<serde_json::Value>()
.await
.expect("JSON parse failed")
)
});
});
}
/// Benchmark: Full job lifecycle (create, start, wait for result)
fn bench_job_lifecycle(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(60))
.build()
.expect("Failed to create supervisor client")
});
// First ensure we have a runner registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
c.bench_function("job_full_lifecycle", |b| {
b.to_async(&rt).iter(|| async {
let job = create_test_job("hero", "echo", vec!["benchmark_test".to_string()]);
// Start job and wait for result
black_box(
client
.job_run(job, Some(30))
.await
.expect("Job run failed")
)
});
});
}
/// Benchmark: Concurrent job submissions
fn bench_concurrent_jobs(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(60))
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner is registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("concurrent_jobs");
for num_jobs in [1, 5, 10, 20].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(num_jobs),
num_jobs,
|b, &num_jobs| {
b.to_async(&rt).iter(|| async {
let mut handles = vec![];
for i in 0..num_jobs {
let client = client.clone();
let handle = tokio::spawn(async move {
let job = create_test_job("hero", "echo", vec![format!("job_{}", i)]);
client.job_create(job).await
});
handles.push(handle);
}
// Wait for all jobs to be submitted
for handle in handles {
black_box(handle.await.expect("Task failed").expect("Job start failed"));
}
});
},
);
}
group.finish();
}
/// Benchmark: Runner status checks
fn bench_runner_status(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
// Ensure we have runners
rt.block_on(async {
let _ = client.runner_create("hero").await;
let _ = client.runner_create("osiris").await;
});
c.bench_function("get_all_runner_status", |b| {
b.to_async(&rt).iter(|| async {
black_box(
client
.get_all_runner_status()
.await
.expect("Get status failed")
)
});
});
}
/// Benchmark: API response time under load
fn bench_api_latency(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
let mut group = c.benchmark_group("api_latency");
group.measurement_time(Duration::from_secs(10));
group.bench_function("supervisor_info", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.get_supervisor_info().await.expect("Failed"))
});
});
group.bench_function("runner_list", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.runner_list().await.expect("Failed"))
});
});
group.bench_function("job_list", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.job_list().await.expect("Failed"))
});
});
group.finish();
}
criterion_group!(
benches,
bench_supervisor_discovery,
bench_supervisor_info,
bench_list_runners,
bench_job_create,
bench_job_list,
bench_osiris_health,
bench_job_lifecycle,
bench_concurrent_jobs,
bench_runner_status,
bench_api_latency,
);
criterion_main!(benches);

210
benches/memory_usage.rs Normal file
View File

@@ -0,0 +1,210 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
use hero_supervisor_openrpc_client::SupervisorClientBuilder;
use hero_job::Job;
use tokio::runtime::Runtime;
use std::time::Duration;
use std::collections::HashMap;
use uuid::Uuid;
use chrono::Utc;
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const ADMIN_SECRET: &str = "SECRET";
fn create_runtime() -> Runtime {
Runtime::new().unwrap()
}
fn create_test_job(runner: &str, command: &str, args: Vec<String>) -> Job {
Job {
id: Uuid::new_v4().to_string(),
caller_id: "benchmark".to_string(),
context_id: "test".to_string(),
payload: serde_json::json!({
"command": command,
"args": args
}).to_string(),
runner: runner.to_string(),
timeout: 30,
env_vars: HashMap::new(),
created_at: Utc::now(),
updated_at: Utc::now(),
signatures: vec![],
}
}
#[cfg(target_os = "macos")]
fn get_memory_usage() -> Option<usize> {
use std::process::Command;
let output = Command::new("ps")
.args(&["-o", "rss=", "-p", &std::process::id().to_string()])
.output()
.ok()?;
String::from_utf8(output.stdout)
.ok()?
.trim()
.parse::<usize>()
.ok()
.map(|kb| kb * 1024)
}
#[cfg(target_os = "linux")]
fn get_memory_usage() -> Option<usize> {
use std::fs;
let status = fs::read_to_string("/proc/self/status").ok()?;
for line in status.lines() {
if line.starts_with("VmRSS:") {
let kb = line.split_whitespace().nth(1)?.parse::<usize>().ok()?;
return Some(kb * 1024);
}
}
None
}
fn memory_job_creation(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create client")
});
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("memory_job_creation");
for num_jobs in [10, 50, 100, 200].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(num_jobs),
num_jobs,
|b, &num_jobs| {
b.iter_custom(|iters| {
let mut total_duration = Duration::ZERO;
for _ in 0..iters {
let mem_before = get_memory_usage().unwrap_or(0);
let start = std::time::Instant::now();
rt.block_on(async {
let mut jobs = Vec::new();
for i in 0..num_jobs {
let job = create_test_job("hero", "echo", vec![format!("mem_test_{}", i)]);
jobs.push(job);
}
black_box(jobs);
});
total_duration += start.elapsed();
let mem_after = get_memory_usage().unwrap_or(0);
let mem_delta = mem_after.saturating_sub(mem_before);
if mem_delta > 0 {
eprintln!("Memory delta for {} jobs: {} KB", num_jobs, mem_delta / 1024);
}
}
total_duration
});
},
);
}
group.finish();
}
fn memory_client_creation(c: &mut Criterion) {
let rt = create_runtime();
let mut group = c.benchmark_group("memory_client_creation");
for num_clients in [1, 10, 50, 100].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(num_clients),
num_clients,
|b, &num_clients| {
b.iter_custom(|iters| {
let mut total_duration = Duration::ZERO;
for _ in 0..iters {
let mem_before = get_memory_usage().unwrap_or(0);
let start = std::time::Instant::now();
rt.block_on(async {
let mut clients = Vec::new();
for _ in 0..num_clients {
let client = SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create client");
clients.push(client);
}
black_box(clients);
});
total_duration += start.elapsed();
let mem_after = get_memory_usage().unwrap_or(0);
let mem_delta = mem_after.saturating_sub(mem_before);
if mem_delta > 0 {
eprintln!("Memory delta for {} clients: {} KB", num_clients, mem_delta / 1024);
}
}
total_duration
});
},
);
}
group.finish();
}
fn memory_payload_sizes(c: &mut Criterion) {
let mut group = c.benchmark_group("memory_payload_sizes");
for size_kb in [1, 10, 100, 1000].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(format!("{}KB", size_kb)),
size_kb,
|b, &size_kb| {
b.iter_custom(|iters| {
let mut total_duration = Duration::ZERO;
for _ in 0..iters {
let mem_before = get_memory_usage().unwrap_or(0);
let start = std::time::Instant::now();
let large_data = "x".repeat(size_kb * 1024);
let job = create_test_job("hero", "echo", vec![large_data]);
black_box(job);
total_duration += start.elapsed();
let mem_after = get_memory_usage().unwrap_or(0);
let mem_delta = mem_after.saturating_sub(mem_before);
if mem_delta > 0 {
eprintln!("Memory delta for {}KB payload: {} KB", size_kb, mem_delta / 1024);
}
}
total_duration
});
},
);
}
group.finish();
}
criterion_group!(
memory_benches,
memory_job_creation,
memory_client_creation,
memory_payload_sizes,
);
criterion_main!(memory_benches);

113
benches/run_benchmarks.sh Executable file
View File

@@ -0,0 +1,113 @@
#!/bin/bash
# Horus Stack Benchmark Runner
# This script ensures the Horus stack is running before executing benchmarks
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Configuration
SUPERVISOR_URL="http://127.0.0.1:3030"
OSIRIS_URL="http://127.0.0.1:8081"
REDIS_URL="127.0.0.1:6379"
echo -e "${GREEN}=== Horus Stack Benchmark Runner ===${NC}\n"
# Function to check if a service is running
check_service() {
local url=$1
local name=$2
if curl -s -f "$url/health" > /dev/null 2>&1 || curl -s -f "$url" > /dev/null 2>&1; then
echo -e "${GREEN}${NC} $name is running"
return 0
else
echo -e "${RED}${NC} $name is not running"
return 1
fi
}
# Function to check if Redis is running
check_redis() {
if redis-cli -h 127.0.0.1 -p 6379 ping > /dev/null 2>&1; then
echo -e "${GREEN}${NC} Redis is running"
return 0
else
echo -e "${RED}${NC} Redis is not running"
return 1
fi
}
# Check prerequisites
echo "Checking prerequisites..."
echo ""
REDIS_OK=false
OSIRIS_OK=false
SUPERVISOR_OK=false
if check_redis; then
REDIS_OK=true
fi
if check_service "$OSIRIS_URL" "Osiris"; then
OSIRIS_OK=true
fi
if check_service "$SUPERVISOR_URL" "Supervisor"; then
SUPERVISOR_OK=true
fi
echo ""
# If any service is not running, provide instructions
if [ "$REDIS_OK" = false ] || [ "$OSIRIS_OK" = false ] || [ "$SUPERVISOR_OK" = false ]; then
echo -e "${YELLOW}Some services are not running. Please start the Horus stack:${NC}"
echo ""
if [ "$REDIS_OK" = false ]; then
echo " 1. Start Redis:"
echo " redis-server"
echo ""
fi
echo " 2. Start Horus stack:"
echo " cd $PROJECT_ROOT"
echo " RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports"
echo ""
echo " Or run in the background:"
echo " RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports &"
echo ""
read -p "Do you want to continue anyway? (y/N) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
echo -e "${RED}Benchmark cancelled.${NC}"
exit 1
fi
fi
# Build the project first
echo -e "${GREEN}Building project...${NC}"
cd "$PROJECT_ROOT"
cargo build --release
echo ""
echo -e "${GREEN}Running benchmarks...${NC}"
echo ""
# Run benchmarks with any additional arguments passed to this script
cargo bench --bench horus_stack "$@"
echo ""
echo -e "${GREEN}=== Benchmark Complete ===${NC}"
echo ""
echo "Results saved to: target/criterion/"
echo "View HTML reports: open target/criterion/report/index.html"

300
benches/stress_test.rs Normal file
View File

@@ -0,0 +1,300 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
use hero_supervisor_openrpc_client::SupervisorClientBuilder;
use hero_job::Job;
use tokio::runtime::Runtime;
use std::time::Duration;
use std::collections::HashMap;
use uuid::Uuid;
use chrono::Utc;
/// Benchmark configuration
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const ADMIN_SECRET: &str = "SECRET";
/// Helper to create a tokio runtime for benchmarks
fn create_runtime() -> Runtime {
Runtime::new().unwrap()
}
/// Helper to create a test job
fn create_test_job(runner: &str, command: &str, args: Vec<String>) -> Job {
Job {
id: Uuid::new_v4().to_string(),
caller_id: "benchmark".to_string(),
context_id: "test".to_string(),
payload: serde_json::json!({
"command": command,
"args": args
}).to_string(),
runner: runner.to_string(),
timeout: 30,
env_vars: HashMap::new(),
created_at: Utc::now(),
updated_at: Utc::now(),
signatures: vec![],
}
}
/// Stress test: High-frequency job submissions
fn stress_high_frequency_jobs(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(120))
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner is registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("stress_high_frequency");
group.sample_size(10); // Fewer samples for stress tests
group.measurement_time(Duration::from_secs(20));
for num_jobs in [50, 100, 200].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(num_jobs),
num_jobs,
|b, &num_jobs| {
b.to_async(&rt).iter(|| async {
let mut handles = vec![];
for i in 0..num_jobs {
let client = client.clone();
let handle = tokio::spawn(async move {
let job = create_test_job("hero", "echo", vec![format!("stress_{}", i)]);
client.job_create(job).await
});
handles.push(handle);
}
// Wait for all jobs to be submitted
for handle in handles {
let _ = black_box(handle.await);
}
});
},
);
}
group.finish();
}
/// Stress test: Sustained load over time
fn stress_sustained_load(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(120))
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner is registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("stress_sustained_load");
group.sample_size(10);
group.measurement_time(Duration::from_secs(30));
group.bench_function("continuous_submissions", |b| {
b.to_async(&rt).iter(|| async {
// Submit jobs continuously for the measurement period
for i in 0..20 {
let job = create_test_job("hero", "echo", vec![format!("sustained_{}", i)]);
let _ = black_box(client.job_create(job).await);
}
});
});
group.finish();
}
/// Stress test: Large payload handling
fn stress_large_payloads(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(120))
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner is registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("stress_large_payloads");
group.sample_size(10);
for size_kb in [1, 10, 100].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(format!("{}KB", size_kb)),
size_kb,
|b, &size_kb| {
b.to_async(&rt).iter(|| async {
// Create a large payload
let large_data = "x".repeat(size_kb * 1024);
let job = create_test_job("hero", "echo", vec![large_data]);
black_box(client.job_create(job).await.expect("Job create failed"))
});
},
);
}
group.finish();
}
/// Stress test: Rapid API calls
fn stress_rapid_api_calls(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
let mut group = c.benchmark_group("stress_rapid_api");
group.sample_size(10);
group.measurement_time(Duration::from_secs(15));
group.bench_function("rapid_info_calls", |b| {
b.to_async(&rt).iter(|| async {
// Make 100 rapid API calls
for _ in 0..100 {
let _ = black_box(client.get_supervisor_info().await);
}
});
});
group.bench_function("rapid_list_calls", |b| {
b.to_async(&rt).iter(|| async {
// Make 100 rapid list calls
for _ in 0..100 {
let _ = black_box(client.runner_list().await);
}
});
});
group.finish();
}
/// Stress test: Mixed workload
fn stress_mixed_workload(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(120))
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner is registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("stress_mixed_workload");
group.sample_size(10);
group.measurement_time(Duration::from_secs(25));
group.bench_function("mixed_operations", |b| {
b.to_async(&rt).iter(|| async {
let mut handles = vec![];
// Mix of different operations
for i in 0..10 {
let client = client.clone();
// Job submission
let handle1 = tokio::spawn(async move {
let job = create_test_job("hero", "echo", vec![format!("mixed_{}", i)]);
client.job_create(job).await.map(|_| ())
});
handles.push(handle1);
}
// Wait for all operations
for handle in handles {
let _ = black_box(handle.await);
}
});
});
group.finish();
}
/// Stress test: Connection pool exhaustion
fn stress_connection_pool(c: &mut Criterion) {
let rt = create_runtime();
let mut group = c.benchmark_group("stress_connection_pool");
group.sample_size(10);
group.measurement_time(Duration::from_secs(20));
for num_clients in [10, 50, 100].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(num_clients),
num_clients,
|b, &num_clients| {
b.to_async(&rt).iter(|| async {
let mut handles = vec![];
// Create many clients and make concurrent requests
for _ in 0..num_clients {
let handle = tokio::spawn(async move {
let client = SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create client");
client.get_supervisor_info().await
});
handles.push(handle);
}
// Wait for all requests
for handle in handles {
let _ = black_box(handle.await);
}
});
},
);
}
group.finish();
}
criterion_group!(
stress_tests,
stress_high_frequency_jobs,
stress_sustained_load,
stress_large_payloads,
stress_rapid_api_calls,
stress_mixed_workload,
stress_connection_pool,
);
criterion_main!(stress_tests);

40
bin/horus/Cargo.toml Normal file
View File

@@ -0,0 +1,40 @@
[package]
name = "horus-mono"
version.workspace = true
edition.workspace = true
authors.workspace = true
license.workspace = true
repository.workspace = true
[[bin]]
name = "horus"
path = "src/main.rs"
[dependencies]
# Workspace dependencies
tokio = { workspace = true }
clap = { workspace = true }
log = { workspace = true }
env_logger = { workspace = true }
anyhow = { workspace = true }
# Internal dependencies - coordinator
hero-coordinator = { path = "../coordinator" }
hero-supervisor-openrpc-client = { path = "../../lib/clients/supervisor" }
# Internal dependencies - supervisor
hero-supervisor = { path = "../supervisor" }
# Internal dependencies - osiris server
osiris-core = { path = "../../lib/osiris/core" }
axum = "0.7"
tower = "0.4"
tower-http = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
tracing = { workspace = true }
tracing-subscriber = { workspace = true }
# Internal dependencies - runners
hero-runner = { path = "../../lib/runner" }
hero-job = { path = "../../lib/models/job" }

145
bin/horus/README.md Normal file
View File

@@ -0,0 +1,145 @@
# Horus - Hero System Mono Binary
A unified binary that runs all Hero system components: coordinator, supervisor, osiris server, and runners.
## Installation
Build the binary:
```bash
cargo build -p horus-mono --release
```
The binary will be available at `target/release/horus`.
## Usage
### Run Individual Services
#### Coordinator
Manages job execution across runners:
```bash
horus coordinator \
--mycelium-ip 127.0.0.1 \
--mycelium-port 8990 \
--redis-addr 127.0.0.1:6379 \
--api-http-ip 127.0.0.1 \
--api-http-port 9652 \
--api-ws-ip 127.0.0.1 \
--api-ws-port 9653
```
#### Supervisor
Manages actors and dispatches jobs:
```bash
horus supervisor \
--redis-url redis://127.0.0.1:6379 \
--admin-secret your-admin-secret \
--port 3030 \
--bind-address 127.0.0.1 \
--runners osiris,sal,hero
```
#### Osiris Server
REST API server for Osiris data structures:
```bash
horus osiris \
--bind-address 0.0.0.0 \
--port 8081
```
### Run All Services Together
Start all services with a single command:
```bash
horus all \
--redis-url redis://127.0.0.1:6379 \
--admin-secret your-admin-secret
```
**Kill existing processes on ports before starting:**
```bash
horus all \
--redis-url redis://127.0.0.1:6379 \
--admin-secret your-admin-secret \
--kill-ports
```
This will start:
- **Supervisor** on `http://127.0.0.1:3030`
- **Coordinator HTTP** on `http://127.0.0.1:9652`
- **Coordinator WebSocket** on `ws://127.0.0.1:9653`
- **Osiris Server** on `http://0.0.0.0:8081`
The `--kill-ports` flag will automatically kill any processes using ports 3030, 8081, 9652, and 9653 before starting the services.
## Environment Variables
You can also configure services using environment variables:
### Coordinator
- `MYCELIUM_IP` - Mycelium IP address (default: 127.0.0.1)
- `MYCELIUM_PORT` - Mycelium port (default: 8990)
- `REDIS_ADDR` - Redis address (default: 127.0.0.1:6379)
- `API_HTTP_IP` - HTTP API bind IP (default: 127.0.0.1)
- `API_HTTP_PORT` - HTTP API port (default: 9652)
- `API_WS_IP` - WebSocket API bind IP (default: 127.0.0.1)
- `API_WS_PORT` - WebSocket API port (default: 9653)
### Logging
Set the `RUST_LOG` environment variable to control logging:
```bash
RUST_LOG=info horus all --admin-secret your-secret
```
Available levels: `error`, `warn`, `info`, `debug`, `trace`
## Prerequisites
- Redis server running on localhost:6379 (or specify custom address)
- For coordinator: Mycelium service running (if using Mycelium transport)
## Architecture
The horus binary consolidates the following components:
1. **Coordinator** - Routes jobs between contexts and manages job execution
2. **Supervisor** - Manages runner registration and job dispatching
3. **Osiris Server** - Provides REST API for Osiris data structures
4. **Runners** (not included in mono binary, run separately):
- OSIRIS runner - Script execution with Osiris support
- SAL runner - Script execution with SAL support
- Hero runner - Command execution
## Examples
### Development Setup
```bash
# Start Redis
redis-server
# Run all services (kills any existing processes on required ports)
RUST_LOG=info horus all --admin-secret dev-secret --kill-ports
```
### Production Setup
```bash
# Build release binary
cargo build -p horus-mono --release
# Run with production settings
RUST_LOG=warn ./target/release/horus all \
--redis-url redis://prod-redis:6379 \
--admin-secret $ADMIN_SECRET
```
## Help
For detailed help on any command:
```bash
horus --help
horus coordinator --help
horus supervisor --help
horus osiris --help
horus all --help
```

569
bin/horus/src/main.rs Normal file
View File

@@ -0,0 +1,569 @@
//! Horus - Mono binary for running all Hero components
//!
//! This binary provides subcommands to run:
//! - coordinator: Job coordination service
//! - supervisor: Actor and job management
//! - osiris: REST API server
//! - runner-osiris: Osiris script runner
//! - runner-sal: SAL script runner
//! - runner-hero: Command execution runner
//! - all: Run all components together
use clap::{Parser, Subcommand};
#[derive(Parser)]
#[command(name = "horus")]
#[command(about = "Horus - Hero system mono binary", long_about = None)]
struct Cli {
#[command(subcommand)]
command: Commands,
}
#[derive(Subcommand)]
enum Commands {
/// Run the coordinator service
Coordinator {
#[arg(long, default_value = "127.0.0.1")]
mycelium_ip: String,
#[arg(long, default_value = "8990")]
mycelium_port: u16,
#[arg(long, default_value = "127.0.0.1:6379")]
redis_addr: String,
#[arg(long, default_value = "127.0.0.1")]
api_http_ip: String,
#[arg(long, default_value = "9652")]
api_http_port: u16,
#[arg(long, default_value = "127.0.0.1")]
api_ws_ip: String,
#[arg(long, default_value = "9653")]
api_ws_port: u16,
},
/// Run the supervisor service
Supervisor {
#[arg(long, default_value = "redis://127.0.0.1:6379")]
redis_url: String,
#[arg(long, default_value = "")]
namespace: String,
#[arg(long = "admin-secret", required = true)]
admin_secrets: Vec<String>,
#[arg(long = "user-secret")]
user_secrets: Vec<String>,
#[arg(long = "register-secret")]
register_secrets: Vec<String>,
#[arg(long, default_value = "3030")]
port: u16,
#[arg(long, default_value = "127.0.0.1")]
bind_address: String,
#[arg(long, value_delimiter = ',')]
runners: Vec<String>,
},
/// Run the Osiris REST API server
Osiris {
#[arg(long, default_value = "0.0.0.0")]
bind_address: String,
#[arg(long, default_value = "8081")]
port: u16,
},
/// Run all services together
All {
#[arg(long, default_value = "redis://127.0.0.1:6379")]
redis_url: String,
#[arg(long = "admin-secret", required = true)]
admin_secrets: Vec<String>,
#[arg(long, help = "Kill processes using required ports before starting")]
kill_ports: bool,
},
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let cli = Cli::parse();
match cli.command {
Commands::Coordinator {
mycelium_ip,
mycelium_port,
redis_addr,
api_http_ip,
api_http_port,
api_ws_ip,
api_ws_port,
} => {
run_coordinator(
mycelium_ip,
mycelium_port,
redis_addr,
api_http_ip,
api_http_port,
api_ws_ip,
api_ws_port,
false,
).await?;
}
Commands::Supervisor {
redis_url,
namespace,
admin_secrets,
user_secrets,
register_secrets,
port,
bind_address,
runners,
} => {
run_supervisor(
redis_url,
namespace,
admin_secrets,
user_secrets,
register_secrets,
port,
bind_address,
runners,
false,
).await?;
}
Commands::Osiris { bind_address, port } => {
run_osiris(bind_address, port, false).await?;
}
Commands::All {
redis_url,
admin_secrets,
kill_ports,
} => {
run_all(redis_url, admin_secrets, kill_ports).await?;
}
}
Ok(())
}
async fn run_coordinator(
mycelium_ip: String,
mycelium_port: u16,
redis_addr: String,
api_http_ip: String,
api_http_port: u16,
api_ws_ip: String,
api_ws_port: u16,
skip_logging_init: bool,
) -> Result<(), Box<dyn std::error::Error>> {
use std::net::{IpAddr, SocketAddr};
use std::sync::Arc;
use tracing::{error, info};
use tracing_subscriber::EnvFilter;
if !skip_logging_init {
let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));
tracing_subscriber::fmt()
.with_env_filter(filter)
.pretty()
.with_target(true)
.with_level(true)
.init();
}
let mycelium_ip: IpAddr = mycelium_ip.parse()?;
let api_http_ip: IpAddr = api_http_ip.parse()?;
let api_ws_ip: IpAddr = api_ws_ip.parse()?;
let redis_addr: SocketAddr = redis_addr.parse()?;
let http_addr = SocketAddr::new(api_http_ip, api_http_port);
let ws_addr = SocketAddr::new(api_ws_ip, api_ws_port);
let redis = hero_coordinator::storage::RedisDriver::new(redis_addr.to_string())
.await
.expect("Failed to connect to Redis");
let service = hero_coordinator::service::AppService::new(redis);
let service_for_router = service.clone();
let state = Arc::new(hero_coordinator::rpc::AppState::new(service));
// Only initialize router if not skipping logging (i.e., not in "all" mode)
// In "all" mode, we skip Mycelium since everything is local
if !skip_logging_init {
let base_url = format!("http://{}:{}", mycelium_ip, mycelium_port);
let mycelium = Arc::new(
hero_supervisor_openrpc_client::transports::MyceliumClient::new(&base_url)
.expect("Failed to create MyceliumClient")
);
let hub = hero_supervisor_openrpc_client::transports::SupervisorHub::new_with_client(
mycelium,
"supervisor.rpc".to_string(),
);
let cfg = hero_coordinator::router::RouterConfig {
context_ids: Vec::new(),
concurrency: 32,
base_url,
topic: "supervisor.rpc".to_string(),
sup_hub: hub.clone(),
transport_poll_interval_secs: 2,
transport_poll_timeout_secs: 300,
};
let _auto_handle = hero_coordinator::router::start_router_auto(service_for_router, cfg);
}
let http_module = hero_coordinator::rpc::build_module(state.clone());
let ws_module = hero_coordinator::rpc::build_module(state.clone());
info!(%http_addr, %ws_addr, %redis_addr, "Starting Coordinator JSON-RPC servers");
let _http_handle = match hero_coordinator::rpc::start_http(http_addr, http_module).await {
Ok(handle) => handle,
Err(e) => {
error!("Failed to start HTTP server on {}: {}", http_addr, e);
return Err(format!("Failed to start HTTP server: {}", e).into());
}
};
let _ws_handle = match hero_coordinator::rpc::start_ws(ws_addr, ws_module).await {
Ok(handle) => handle,
Err(e) => {
error!("Failed to start WS server on {}: {}", ws_addr, e);
return Err(format!("Failed to start WS server: {}", e).into());
}
};
if let Err(e) = tokio::signal::ctrl_c().await {
error!(error=%e, "Failed to listen for shutdown signal");
}
info!("Shutdown signal received, exiting.");
Ok(())
}
async fn run_supervisor(
_redis_url: String,
_namespace: String,
admin_secrets: Vec<String>,
user_secrets: Vec<String>,
register_secrets: Vec<String>,
port: u16,
bind_address: String,
runners: Vec<String>,
skip_logging_init: bool,
) -> Result<(), Box<dyn std::error::Error>> {
use hero_supervisor::SupervisorBuilder;
use log::{error, info};
if !skip_logging_init {
env_logger::init();
}
let mut builder = SupervisorBuilder::new()
.admin_secrets(admin_secrets);
if !user_secrets.is_empty() {
builder = builder.user_secrets(user_secrets);
}
if !register_secrets.is_empty() {
builder = builder.register_secrets(register_secrets);
}
let supervisor = builder.build().await?;
if !runners.is_empty() {
for runner_name in &runners {
match supervisor.runner_create(runner_name.clone()).await {
Ok(_) => {},
Err(e) => error!("Failed to register runner '{}': {}", runner_name, e),
}
}
}
use hero_supervisor::openrpc::start_http_openrpc_server;
let supervisor_clone = supervisor.clone();
let bind_addr = bind_address.clone();
tokio::spawn(async move {
match start_http_openrpc_server(supervisor_clone, &bind_addr, port).await {
Ok(handle) => {
handle.stopped().await;
error!("OpenRPC server stopped unexpectedly");
}
Err(e) => {
error!("OpenRPC server error: {}", e);
}
}
});
tokio::time::sleep(tokio::time::Duration::from_millis(500)).await;
println!("📡 Supervisor: http://{}:{}", bind_address, port);
info!("Hero Supervisor is running. Press Ctrl+C to shutdown.");
tokio::spawn(async move {
tokio::signal::ctrl_c().await.expect("Failed to listen for ctrl+c");
info!("Received shutdown signal");
std::process::exit(0);
});
loop {
tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
}
}
async fn run_osiris(
bind_address: String,
port: u16,
skip_logging_init: bool,
) -> Result<(), Box<dyn std::error::Error>> {
use axum::{
extract::{Path, Query, State},
http::StatusCode,
response::{IntoResponse, Json},
routing::get,
Router,
};
use serde_json::{json, Value};
use std::collections::HashMap;
use std::sync::Arc;
use tower_http::cors::{Any, CorsLayer};
use tracing::{info, warn};
if !skip_logging_init {
tracing_subscriber::fmt()
.with_target(false)
.compact()
.init();
}
#[derive(Clone)]
struct AppState {
store: Arc<tokio::sync::RwLock<HashMap<String, HashMap<String, Value>>>>,
}
impl AppState {
fn new() -> Self {
Self {
store: Arc::new(tokio::sync::RwLock::new(HashMap::new())),
}
}
}
async fn health_check() -> impl IntoResponse {
Json(json!({
"status": "healthy",
"service": "osiris-server",
"version": "0.1.0"
}))
}
async fn get_struct(
State(state): State<AppState>,
Path((struct_name, id)): Path<(String, String)>,
) -> Result<Json<Value>, (StatusCode, String)> {
info!("GET /api/{}/{}", struct_name, id);
let store = state.store.read().await;
if let Some(struct_store) = store.get(&struct_name) {
if let Some(data) = struct_store.get(&id) {
return Ok(Json(data.clone()));
}
}
warn!("Not found: {}/{}", struct_name, id);
Err((
StatusCode::NOT_FOUND,
format!("{}/{} not found", struct_name, id),
))
}
async fn list_structs(
State(state): State<AppState>,
Path(struct_name): Path<String>,
Query(params): Query<HashMap<String, String>>,
) -> Result<Json<Vec<Value>>, (StatusCode, String)> {
info!("GET /api/{} with params: {:?}", struct_name, params);
let store = state.store.read().await;
if let Some(struct_store) = store.get(&struct_name) {
let mut results: Vec<Value> = struct_store.values().cloned().collect();
if !params.is_empty() {
results.retain(|item| {
params.iter().all(|(key, value)| {
item.get(key)
.and_then(|v| v.as_str())
.map(|v| v == value)
.unwrap_or(false)
})
});
}
return Ok(Json(results));
}
Ok(Json(vec![]))
}
let state = AppState::new();
let app = Router::new()
.route("/health", get(health_check))
.route("/api/:struct_name", get(list_structs))
.route("/api/:struct_name/:id", get(get_struct))
.layer(
CorsLayer::new()
.allow_origin(Any)
.allow_methods(Any)
.allow_headers(Any),
)
.with_state(state);
let addr = format!("{}:{}", bind_address, port);
info!("🚀 Osiris Server starting on {}", addr);
let listener = tokio::net::TcpListener::bind(&addr)
.await
.expect("Failed to bind address");
axum::serve(listener, app)
.await
.expect("Server failed");
Ok(())
}
/// Kill any process using the specified port
async fn kill_port(port: u16) -> Result<(), Box<dyn std::error::Error>> {
use std::process::Command;
use log::info;
// Use lsof to find the process using the port
let output = Command::new("lsof")
.args(&["-ti", &format!(":{}", port)])
.output()?;
if !output.status.success() || output.stdout.is_empty() {
// No process found on this port
return Ok(());
}
let pid_str = String::from_utf8_lossy(&output.stdout);
let pids: Vec<&str> = pid_str.trim().lines().collect();
for pid in pids {
if let Ok(pid_num) = pid.trim().parse::<i32>() {
info!("Killing process {} on port {}", pid_num, port);
let _ = Command::new("kill")
.arg(pid)
.output();
}
}
Ok(())
}
async fn run_all(
redis_url: String,
admin_secrets: Vec<String>,
kill_ports: bool,
) -> Result<(), Box<dyn std::error::Error>> {
use log::{info, warn};
// Initialize logging once for all services
env_logger::init();
// Kill processes on required ports if requested
if kill_ports {
let ports = vec![3030, 8081, 9652, 9653];
info!("🔪 Killing processes on ports: {:?}", ports);
for port in ports {
if let Err(e) = kill_port(port).await {
warn!("Failed to kill port {}: {}", port, e);
}
}
// Give the OS a moment to release the ports
tokio::time::sleep(tokio::time::Duration::from_millis(500)).await;
}
info!("🚀 Starting all Horus services...");
// Start Osiris server
let osiris_handle = tokio::spawn(async move {
if let Err(e) = run_osiris("0.0.0.0".to_string(), 8081, true).await {
eprintln!("Osiris server error: {}", e);
}
});
// Start Supervisor
let redis_url_clone = redis_url.clone();
let admin_secrets_clone = admin_secrets.clone();
let supervisor_handle = tokio::spawn(async move {
if let Err(e) = run_supervisor(
redis_url_clone,
"".to_string(),
admin_secrets_clone,
vec![],
vec![],
3030,
"127.0.0.1".to_string(),
vec!["osiris".to_string(), "sal".to_string(), "hero".to_string()],
true,
).await {
eprintln!("Supervisor error: {}", e);
}
});
// Give supervisor time to start
tokio::time::sleep(tokio::time::Duration::from_secs(2)).await;
// Start Coordinator
let coordinator_handle = tokio::spawn(async move {
if let Err(e) = run_coordinator(
"127.0.0.1".to_string(),
8990,
"127.0.0.1:6379".to_string(),
"127.0.0.1".to_string(),
9652,
"127.0.0.1".to_string(),
9653,
true,
).await {
eprintln!("Coordinator error: {}", e);
}
});
info!("✅ All services started:");
info!(" 📡 Supervisor: http://127.0.0.1:3030");
info!(" 🔗 Coordinator HTTP: http://127.0.0.1:9652");
info!(" 🔗 Coordinator WS: ws://127.0.0.1:9653");
info!(" 🌐 Osiris: http://0.0.0.0:8081");
// Wait for all services
tokio::select! {
_ = osiris_handle => {},
_ = supervisor_handle => {},
_ = coordinator_handle => {},
}
Ok(())
}

View File

@@ -159,7 +159,7 @@ mod tests {
.payload("test payload")
.build()
.unwrap();
job.id = id.to_string(); // Set ID manually
// job.id = id.to_string(); // Set ID manually
job
}

46
scripts/configure.md Normal file
View File

@@ -0,0 +1,46 @@
# Horus Configuration Heroscript
## Configure Coordinator
!!coordinator.configure
name:'default'
binary_path:'/hero/var/bin/coordinator'
redis_addr:'127.0.0.1:6379'
http_port:8081
ws_port:9653
log_level:'info'
repo_path:'/root/code/git.ourworld.tf/herocode/horus'
## Configure Supervisor
!!supervisor.configure
name:'default'
binary_path:'/hero/var/bin/supervisor'
redis_addr:'127.0.0.1:6379'
http_port:8082
ws_port:9654
log_level:'info'
repo_path:'/root/code/git.ourworld.tf/herocode/horus'
## Configure Hero Runner
!!herorunner.configure
name:'default'
binary_path:'/hero/var/bin/herorunner'
redis_addr:'127.0.0.1:6379'
log_level:'info'
repo_path:'/root/code/git.ourworld.tf/herocode/horus'
## Configure Osiris Runner
!!osirisrunner.configure
name:'default'
binary_path:'/hero/var/bin/runner_osiris'
redis_addr:'127.0.0.1:6379'
log_level:'info'
repo_path:'/root/code/git.ourworld.tf/herocode/horus'
## Configure SAL Runner
!!salrunner.configure
name:'default'
binary_path:'/hero/var/bin/runner_sal'
redis_addr:'127.0.0.1:6379'
log_level:'info'
repo_path:'/root/code/git.ourworld.tf/herocode/horus'

View File

@@ -0,0 +1,6 @@
// Install all components
!!herocoordinator.install
!!supervisor.install
!!herorunner.install
!!osirisrunner.install
!!salrunner.install

12
scripts/start.md Normal file
View File

@@ -0,0 +1,12 @@
# Horus Start Script
Starts all horus binaries
!!include install.md
// Start all services
!!herocoordinator.start name:'default'
!!supervisor.start name:'default'
!!herorunner.start name:'default'
!!osirisrunner.start name:'default'
!!salrunner.start name:'default'