# Horus Stack Benchmarks
Comprehensive benchmark suite for the entire Horus stack, testing performance through the client APIs.
## Overview
These benchmarks test the full Horus system including:
- **Supervisor API** - Job management, runner coordination
- **Coordinator API** - Job routing and execution
- **Osiris API** - REST API for data queries
All benchmarks interact with the stack through the official client libraries in `/lib/clients`, which are the only supported way to interact with the system.
## Prerequisites
Before running benchmarks, you must have the Horus stack running:
```bash
# Start Redis
redis-server
# Start all Horus services
cd /Users/timurgordon/code/git.ourworld.tf/herocode/horus
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
```
The benchmarks expect:
- **Supervisor** running on `http://127.0.0.1:3030`
- **Coordinator** running on `http://127.0.0.1:9652` (HTTP) and `ws://127.0.0.1:9653` (WebSocket)
- **Osiris** running on `http://127.0.0.1:8081`
- **Redis** running on `127.0.0.1:6379`
- Admin secret: `SECRET`
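Before kicking off a run, it can help to confirm that each service is actually reachable. A quick sanity check along these lines works (the Osiris health path is an assumption; adjust to whatever your deployment exposes):
```bash
# Redis: should print PONG
redis-cli -h 127.0.0.1 -p 6379 ping
# Osiris health endpoint (path assumed; see the osiris_health_check benchmark)
curl -s http://127.0.0.1:8081/health
# Supervisor and Coordinator: confirm the ports are listening
lsof -iTCP:3030 -sTCP:LISTEN
lsof -iTCP:9652 -sTCP:LISTEN
```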
## Running Benchmarks
### Run all benchmarks
```bash
cargo bench --bench horus_stack
```
### Run a specific benchmark
```bash
cargo bench --bench horus_stack -- supervisor_discovery
```
### Run with a specific filter
```bash
cargo bench --bench horus_stack -- concurrent
```
### Run with verbose output
```bash
cargo bench --bench horus_stack -- --verbose
```
## Benchmark Categories
### 1. API Discovery & Metadata (`horus_stack`)
- `supervisor_discovery` - OpenRPC metadata retrieval
- `supervisor_get_info` - Supervisor information and stats
### 2. Runner Management (`horus_stack`)
- `supervisor_list_runners` - List all registered runners
- `get_all_runner_status` - Get status of all runners
### 3. Job Operations (`horus_stack`)
- `supervisor_job_create` - Create job without execution
- `supervisor_job_list` - List all jobs
- `job_full_lifecycle` - Complete job lifecycle (create → execute → result)
### 4. Concurrency Tests (`horus_stack`)
- `concurrent_jobs` - Submit multiple jobs concurrently (1, 5, 10, 20 jobs)
### 5. Health & Monitoring (`horus_stack`)
- `osiris_health_check` - Osiris server health endpoint
### 6. API Latency (`horus_stack`)
- `api_latency/supervisor_info` - Supervisor info latency
- `api_latency/runner_list` - Runner list latency
- `api_latency/job_list` - Job list latency
### 7. Stress Tests (`stress_test`)
- `stress_high_frequency_jobs` - High-frequency submissions (50-200 jobs)
- `stress_sustained_load` - Continuous load testing
- `stress_large_payloads` - Large payload handling (1KB-100KB)
- `stress_rapid_api_calls` - Rapid API calls (100 calls/iteration)
- `stress_mixed_workload` - Mixed operation scenarios
- `stress_connection_pool` - Connection pool exhaustion (10-100 clients)
### 8. Memory Usage (`memory_usage`)
- `memory_job_creation` - Memory per job object (10-200 jobs)
- `memory_client_creation` - Memory per client instance (1-100 clients)
- `memory_payload_sizes` - Memory vs payload size (1KB-1MB)
See [MEMORY_BENCHMARKS.md](./MEMORY_BENCHMARKS.md) for detailed memory profiling documentation.
## Interpreting Results
Criterion outputs detailed statistics including:
- **Mean time** - Average execution time
- **Std deviation** - Variability in measurements
- **Median** - Middle value (50th percentile)
- **MAD** - Median Absolute Deviation
- **Throughput** - Operations per second
Results are saved in `target/criterion/` with:
- HTML reports with graphs
- JSON data for further analysis
- Historical comparison with previous runs
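To open the aggregated HTML report after a run (Criterion's default output location, provided report generation is enabled):
```bash
open target/criterion/report/index.html        # macOS
# xdg-open target/criterion/report/index.html  # Linux
```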
## Performance Targets
Expected performance (on modern hardware):
| Benchmark | Target | Notes |
|-----------|--------|-------|
| supervisor_discovery | < 10ms | Metadata retrieval |
| supervisor_get_info | < 5ms | Simple info query |
| supervisor_list_runners | < 5ms | List operation |
| supervisor_job_create | < 10ms | Job creation only |
| job_full_lifecycle | < 100ms | Full execution cycle |
| osiris_health_check | < 2ms | Health endpoint |
| concurrent_jobs (10) | < 500ms | 10 parallel jobs |
## Customization
To modify benchmark parameters, edit `benches/horus_stack.rs`:
```rust
// Change URLs
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";

// Change admin secret
const ADMIN_SECRET: &str = "SECRET";

// Adjust concurrent job counts
for num_jobs in [1, 5, 10, 20, 50].iter() {
    // ...
}
```
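If you change the job counts, the loop typically feeds a Criterion benchmark group so that each count appears as its own data point. A minimal sketch of that pattern (the `submit_jobs` helper below is a hypothetical stand-in for the submission logic in `horus_stack.rs`):
```rust
use criterion::{BenchmarkId, Criterion};

// Hypothetical stand-in for the real job-submission helper in horus_stack.rs.
async fn submit_jobs(n: i32) {
    let _ = n; // ... submit `n` jobs through the supervisor client and await the results ...
}

fn bench_concurrent_jobs(c: &mut Criterion) {
    let rt = tokio::runtime::Runtime::new().expect("failed to build Tokio runtime");
    let mut group = c.benchmark_group("concurrent_jobs");
    for num_jobs in [1, 5, 10, 20, 50].iter() {
        group.bench_with_input(BenchmarkId::from_parameter(num_jobs), num_jobs, |b, &n| {
            b.to_async(&rt).iter(|| async move { submit_jobs(n).await });
        });
    }
    group.finish();
}
```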
## CI/CD Integration
To keep benchmark runs short in CI and compare results against saved baselines:
```bash
# Run only fast benchmarks
cargo bench --bench horus_stack -- --quick
# Save baseline for comparison
cargo bench --bench horus_stack -- --save-baseline main
# Compare against baseline
cargo bench --bench horus_stack -- --baseline main
```
## Troubleshooting
### "Connection refused" errors
- Ensure the Horus stack is running
- Check that all services are listening on expected ports
- Verify firewall settings
### "Job execution timeout" errors
- Increase timeout values in benchmark code
- Check that runners are properly registered
- Verify Redis is accessible
### Inconsistent results
- Close other applications to reduce system load
- Run benchmarks multiple times for statistical significance
- Use `--warm-up-time` flag to increase warm-up period
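The warm-up and measurement windows can also be baked into the benchmark configuration instead of being passed on the command line. A minimal sketch using Criterion's `criterion_group!` config form (the durations are arbitrary examples, not tuned values):
```rust
use std::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_example(c: &mut Criterion) {
    c.bench_function("example", |b| b.iter(|| 1 + 1));
}

criterion_group! {
    name = benches;
    // Longer warm-up and measurement windows smooth out noisy, network-bound runs.
    config = Criterion::default()
        .warm_up_time(Duration::from_secs(5))
        .measurement_time(Duration::from_secs(10));
    targets = bench_example
}
criterion_main!(benches);
```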
## Adding New Benchmarks
To add a new benchmark:
1. Create a new function in `benches/horus_stack.rs`:
```rust
fn bench_my_feature(c: &mut Criterion) {
    let rt = create_runtime();
    let client = /* create client */;
    c.bench_function("my_feature", |b| {
        b.to_async(&rt).iter(|| async {
            // Your benchmark code
        });
    });
}
```
2. Add it to the `criterion_group!` macro:
```rust
criterion_group!(
    benches,
    // ... existing benchmarks
    bench_my_feature,
);
```
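The `create_runtime()` call in step 1 refers to a helper already defined in the bench file; if you are starting from scratch, it is typically just a Tokio runtime constructor along these lines (a sketch, assuming the multi-threaded runtime):
```rust
use tokio::runtime::Runtime;

// Builds the runtime that `b.to_async(&rt)` uses to drive the async benchmark bodies.
fn create_runtime() -> Runtime {
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .expect("failed to build Tokio runtime")
}
```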
## Resources
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [Horus Client Documentation](../lib/clients/)
- [Performance Tuning Guide](../docs/performance.md)