# Horus Stack Benchmarks - Summary

## ✅ Created Comprehensive Benchmark Suite

Created a benchmark suite for the Horus stack that exercises the entire system through its official client APIs.

### Files Created

1. **`benches/horus_stack.rs`** - Main benchmark suite
   - API discovery and metadata retrieval
   - Runner management operations
   - Job lifecycle testing
   - Concurrent job submissions (1, 5, 10, 20 jobs)
   - Health checks
   - API latency measurements

2. **`benches/stress_test.rs`** - Stress and load testing
   - High-frequency job submissions (50-200 jobs)
   - Sustained load testing
   - Large payload handling (1KB-100KB)
   - Rapid API calls (100 calls/iteration)
   - Mixed workload scenarios
   - Connection pool exhaustion tests (10-100 clients)

3. **`benches/memory_usage.rs`** - Memory profiling
   - Job object memory footprint (10-200 jobs)
   - Client instance memory overhead (1-100 clients)
   - Payload size impact on memory (1KB-1MB)
   - Real-time memory delta reporting

4. **`benches/README.md`** - Comprehensive documentation
   - Setup instructions
   - Benchmark descriptions
   - Performance targets
   - Customization guide
   - Troubleshooting tips

5. **`benches/QUICK_START.md`** - Quick reference guide
   - Fast setup steps
   - Common commands
   - Expected performance metrics

6. **`benches/MEMORY_BENCHMARKS.md`** - Memory profiling guide
   - Memory benchmark descriptions
   - Platform-specific measurement details
   - Advanced profiling tools
   - Memory optimization tips

7. **`benches/run_benchmarks.sh`** - Helper script
   - Automated prerequisite checking
   - Service health verification
   - One-command benchmark execution

### Architecture

The benchmarks interact with the Horus stack exclusively through the client libraries:

- **`hero-supervisor-openrpc-client`** - Supervisor API (job management, runner coordination)
- **`osiris-client`** - Osiris REST API (data queries)
- **`hero-job`** - Job model definitions

This ensures benchmarks test the real-world API surface that users interact with.

### Key Features

- ✅ **Async/await support** - Uses Criterion's `async_tokio` feature
- ✅ **Realistic workloads** - Submits and executes real jobs against a running stack
- ✅ **Concurrent testing** - Measures performance under parallel load
- ✅ **Stress testing** - Pushes system limits with high-frequency operations
- ✅ **HTML reports** - Graphs and statistics, with comparison against previous runs
- ✅ **Automated checks** - Helper script verifies that the stack is running

### Benchmark Categories

#### Performance Benchmarks (`horus_stack`)
- `supervisor_discovery` - OpenRPC metadata (target: <10ms)
- `supervisor_get_info` - Info retrieval (target: <5ms)
- `supervisor_list_runners` - Runner listing (target: <5ms)
- `supervisor_job_create` - Job creation (target: <10ms)
- `supervisor_job_list` - Job listing (target: <10ms)
- `osiris_health_check` - Health endpoint (target: <2ms)
- `job_full_lifecycle` - Complete job cycle (target: <100ms)
- `concurrent_jobs` - Parallel submissions (target: <500ms for 10 jobs)
- `get_all_runner_status` - Status queries
- `api_latency/*` - Detailed latency measurements
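The `concurrent_jobs` target is a wall-clock measurement: a batch of parallel submissions is bounded by its slowest request, not by the sum of all requests. A std-only sketch of that measurement, assuming a blocking `submit` closure as a stand-in for the real client call (the suite itself is async via Criterion's `async_tokio`; `time_concurrent_batch` is a hypothetical name):

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `submit(i)` for i in 0..n on n scoped threads and
/// returns the wall-clock time until every submission has finished.
fn time_concurrent_batch<F>(n: usize, submit: F) -> Duration
where
    F: Fn(usize) + Sync,
{
    let start = Instant::now();
    thread::scope(|s| {
        for i in 0..n {
            let submit_ref = &submit;
            s.spawn(move || submit_ref(i));
        }
    }); // the scope joins every spawned thread before returning
    start.elapsed()
}

fn main() {
    // Stand-in for a client call with a ~20 ms round trip (not a real Horus call).
    let fake_submit = |_job: usize| thread::sleep(Duration::from_millis(20));
    for n in [1, 5, 10, 20] {
        let elapsed = time_concurrent_batch(n, fake_submit);
        println!("{n:>2} jobs: {elapsed:?}");
    }
}
```

With real requests, a batch time close to a single request's latency suggests the server handles submissions in parallel; a time that grows linearly with `n` points to serialization somewhere in the stack.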

#### Stress Tests (`stress_test`)
- `stress_high_frequency_jobs` - 50-200 concurrent jobs
- `stress_sustained_load` - Continuous submissions over time
- `stress_large_payloads` - 1KB-100KB payload handling
- `stress_rapid_api_calls` - 100 rapid calls per iteration
- `stress_mixed_workload` - Combined operations
- `stress_connection_pool` - 10-100 concurrent clients

#### Memory Profiling (`memory_usage`)
- `memory_job_creation` - Memory footprint per job (10-200 jobs)
- `memory_client_creation` - Memory per client instance (1-100 clients)
- `memory_payload_sizes` - Memory vs payload size (1KB-1MB)
- Reports memory deltas in real-time during execution
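One common way to produce such deltas on Linux is to read the resident set size (`VmRSS`) from `/proc/self/status` before and after a workload. The sketch below assumes that approach; it is not necessarily how `memory_usage.rs` measures memory (`MEMORY_BENCHMARKS.md` covers the platform-specific details), and `read_rss_kb` and `parse_vm_rss_kb` are hypothetical names.

```rust
use std::fs;

/// Extracts the VmRSS value (in kB) from the text of /proc/<pid>/status.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}

/// Current resident set size of this process in kB (Linux only; None elsewhere).
fn read_rss_kb() -> Option<u64> {
    fs::read_to_string("/proc/self/status")
        .ok()
        .and_then(|s| parse_vm_rss_kb(&s))
}

fn main() {
    let before = read_rss_kb();
    // Stand-in workload: 200 payloads of 1 KiB, filled with non-zero bytes so the pages are touched.
    let payloads: Vec<Vec<u8>> = (0..200).map(|_| vec![1u8; 1024]).collect();
    let after = read_rss_kb();
    // RSS is page-granular and allocator-dependent, so small workloads may show little or no delta.
    match (before, after) {
        (Some(b), Some(a)) => println!(
            "{} payloads: RSS {b} kB -> {a} kB (delta {} kB)",
            payloads.len(),
            a as i64 - b as i64
        ),
        _ => println!("VmRSS not available on this platform"),
    }
}
```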

### Usage

```bash
# Quick start
./benches/run_benchmarks.sh

# Run specific suite
cargo bench --bench horus_stack
cargo bench --bench stress_test
cargo bench --bench memory_usage

# Run specific test
cargo bench -- supervisor_discovery

# Run memory benchmarks with verbose output (shows memory deltas)
cargo bench --bench memory_usage -- --verbose

# Save baseline
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main
```

### Prerequisites

The benchmarks require the full Horus stack to be running:

```bash
# Start Redis
redis-server

# Start Horus (with auto port cleanup)
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
```
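`run_benchmarks.sh` checks that the stack is up before running anything. A minimal Rust sketch of the same idea, assuming a plain TCP connect is a good-enough liveness probe (it calls no Horus health endpoint and does not resolve hostnames); `is_reachable` is a hypothetical helper, the Horus ports are the defaults listed under Configuration, and 6379 is Redis's default port.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: true if something accepts TCP connections at `addr`.
/// A successful connect only proves a listener exists, not that the service is healthy.
fn is_reachable(addr: &str, timeout: Duration) -> bool {
    match addr.parse::<SocketAddr>() {
        Ok(sock) => TcpStream::connect_timeout(&sock, timeout).is_ok(),
        Err(_) => false, // not a literal ip:port
    }
}

fn main() {
    // Redis's default port plus the Horus defaults.
    let services = [
        ("redis", "127.0.0.1:6379"),
        ("supervisor", "127.0.0.1:3030"),
        ("osiris", "127.0.0.1:8081"),
        ("coordinator-http", "127.0.0.1:9652"),
        ("coordinator-ws", "127.0.0.1:9653"),
    ];
    let timeout = Duration::from_millis(500);
    let down: Vec<&str> = services
        .iter()
        .filter(|(_, addr)| !is_reachable(addr, timeout))
        .map(|(name, _)| *name)
        .collect();
    if down.is_empty() {
        println!("all services reachable; ready for `cargo bench`");
    } else {
        println!("not reachable: {}", down.join(", "));
    }
}
```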

### Configuration

All benchmarks use these defaults (configurable in source):
- Supervisor: `http://127.0.0.1:3030`
- Osiris: `http://127.0.0.1:8081`
- Coordinator HTTP: `http://127.0.0.1:9652`
- Coordinator WS: `ws://127.0.0.1:9653`
- Admin secret: `SECRET`
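Keeping these defaults in one struct means they only need changing in a single place. A minimal sketch, assuming a hypothetical `BenchConfig` type; the `HORUS_*` environment-variable overrides are an illustrative addition with hypothetical names, since the suite sets these values in source.

```rust
use std::env;

/// Hypothetical container for the benchmark endpoints and credentials listed above.
#[derive(Debug, Clone, PartialEq)]
struct BenchConfig {
    supervisor_url: String,
    osiris_url: String,
    coordinator_http_url: String,
    coordinator_ws_url: String,
    admin_secret: String,
}

impl Default for BenchConfig {
    fn default() -> Self {
        Self {
            supervisor_url: "http://127.0.0.1:3030".to_string(),
            osiris_url: "http://127.0.0.1:8081".to_string(),
            coordinator_http_url: "http://127.0.0.1:9652".to_string(),
            coordinator_ws_url: "ws://127.0.0.1:9653".to_string(),
            admin_secret: "SECRET".to_string(),
        }
    }
}

impl BenchConfig {
    /// Starts from the defaults and replaces any field whose key `lookup` resolves.
    /// The HORUS_* keys are hypothetical; the suite itself does not read them.
    fn with_overrides(lookup: impl Fn(&str) -> Option<String>) -> Self {
        let mut cfg = Self::default();
        let fields: [(&str, &mut String); 5] = [
            ("HORUS_SUPERVISOR_URL", &mut cfg.supervisor_url),
            ("HORUS_OSIRIS_URL", &mut cfg.osiris_url),
            ("HORUS_COORDINATOR_HTTP_URL", &mut cfg.coordinator_http_url),
            ("HORUS_COORDINATOR_WS_URL", &mut cfg.coordinator_ws_url),
            ("HORUS_ADMIN_SECRET", &mut cfg.admin_secret),
        ];
        for (key, field) in fields {
            if let Some(value) = lookup(key) {
                *field = value;
            }
        }
        cfg
    }

    /// Reads overrides from the process environment.
    fn from_env() -> Self {
        Self::with_overrides(|key| env::var(key).ok())
    }
}

fn main() {
    let cfg = BenchConfig::from_env();
    println!("{cfg:#?}");
}
```

Passing the lookup as a closure keeps `with_overrides` testable without touching the real environment.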

### Results

Results are saved to `target/criterion/` with:
- HTML reports with graphs and statistics
- JSON data for programmatic analysis
- Historical comparison with previous runs
- Detailed performance metrics (mean, median, std dev, throughput)
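The metrics above have standard definitions. A std-only sketch of them over per-iteration timings, assuming a hypothetical `summarize` helper; Criterion additionally bootstraps confidence intervals around its estimates, so its reported numbers do not come from this exact code.

```rust
/// Hypothetical summary of per-iteration timings, in milliseconds.
#[derive(Debug)]
struct Summary {
    mean_ms: f64,
    median_ms: f64,
    std_dev_ms: f64,       // sample standard deviation (n - 1 denominator)
    throughput_per_s: f64, // elements processed per second at the mean time
}

/// `elements` is how many items one iteration processes (e.g. 10 jobs per batch).
fn summarize(samples_ms: &[f64], elements: u64) -> Option<Summary> {
    if samples_ms.len() < 2 {
        return None; // a sample standard deviation needs at least two points
    }
    let n = samples_ms.len() as f64;
    let mean = samples_ms.iter().sum::<f64>() / n;

    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("timings must not be NaN"));
    let mid = sorted.len() / 2;
    let median = if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    };

    let variance = samples_ms.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some(Summary {
        mean_ms: mean,
        median_ms: median,
        std_dev_ms: variance.sqrt(),
        throughput_per_s: elements as f64 / (mean / 1000.0),
    })
}

fn main() {
    // Illustrative input values for a 10-job batch, not measured results.
    let samples = [41.0, 43.5, 40.2, 44.1, 42.0, 39.8, 45.3, 42.7, 41.9, 43.0];
    println!("{:#?}", summarize(&samples, 10));
}
```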

### Integration

The benchmarks are integrated into the workspace:
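- Registered in `Cargo.toml` along with their dependencies
- Use workspace-level dependencies for consistency
- Each benchmark target is declared with `harness = false`, so Criterion's `criterion_main!` supplies `main` instead of the built-in libtest harness
- Include all necessary dev-dependencies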

### Next Steps

1. Run benchmarks to establish baseline performance
2. Monitor performance over time as code changes
3. Use stress tests to identify bottlenecks
4. Customize benchmarks for specific use cases
5. Integrate into CI/CD for automated performance tracking

## Technical Details

### Dependencies Added
- `criterion` v0.5 with the `async_tokio` and `html_reports` features
- `osiris-client` from the workspace
- `reqwest` v0.12 with the `json` feature
- `serde_json`, `uuid`, `chrono` from the workspace

### Benchmark Harness
Uses Criterion.rs for:
- Statistical analysis
- Historical comparison
- HTML report generation
- Configurable sample sizes
- Warm-up periods
- Outlier detection
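Criterion.rs documents its outlier detection as a modified Tukey's method: samples more than 1.5 × IQR outside the quartiles count as mild outliers, and more than 3 × IQR as severe. A std-only sketch of that classification, assuming linear-interpolation quartiles (Criterion's own percentile computation may differ slightly); `classify` and `Class` are hypothetical names.

```rust
/// Outlier classes in the spirit of Tukey's fences.
#[derive(Debug, PartialEq)]
enum Class {
    Normal,
    Mild,
    Severe,
}

/// Percentile of an ascending slice, interpolating linearly between ranks (p in 0.0..=1.0).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p * (sorted.len() - 1) as f64;
    let (lo, hi) = (rank.floor() as usize, rank.ceil() as usize);
    sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo as f64)
}

/// Labels each sample: beyond 1.5 x IQR outside the quartiles is mild, beyond 3 x IQR severe.
fn classify(samples: &[f64]) -> Vec<Class> {
    if samples.is_empty() {
        return Vec::new();
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("timings must not be NaN"));
    let (q1, q3) = (percentile(&sorted, 0.25), percentile(&sorted, 0.75));
    let iqr = q3 - q1;
    samples
        .iter()
        .map(|&x| {
            if x < q1 - 3.0 * iqr || x > q3 + 3.0 * iqr {
                Class::Severe
            } else if x < q1 - 1.5 * iqr || x > q3 + 1.5 * iqr {
                Class::Mild
            } else {
                Class::Normal
            }
        })
        .collect()
}

fn main() {
    // Illustrative per-iteration times in ms, not measured results.
    let samples = [10.0, 10.0, 11.0, 11.0, 12.0, 12.0, 13.0, 13.0, 17.0, 40.0];
    for (x, class) in samples.iter().zip(classify(&samples)) {
        println!("{x:>5.1} ms  {class:?}");
    }
}
```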

### Job Creation
The helper function `create_test_job()` creates properly structured `Job` instances:
- Unique UUIDs for each job
- Proper timestamps
- JSON-serialized payloads
- Empty signatures (for testing)
- Configurable runner and command

This keeps the benchmarked jobs structurally close to production jobs, apart from the empty signatures.
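The real helper builds a `hero-job` `Job`, whose exact fields are not reproduced here. A std-only sketch of the same pattern, assuming a simplified hypothetical `TestJob` struct: a process-wide counter plus the current time stands in for the UUID the suite generates, and the payload is passed in already serialized instead of being JSON-serialized by the helper.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical, simplified stand-in for `hero-job`'s `Job`; field names are illustrative.
#[derive(Debug, Clone)]
struct TestJob {
    id: String,
    runner: String,
    command: String,
    payload_json: String,
    created_at_unix_ms: u128,
    signatures: Vec<String>, // left empty: benchmark jobs are unsigned
}

static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Builds a job with a unique id, a current timestamp, and no signatures.
fn create_test_job(runner: &str, command: &str, payload_json: &str) -> TestJob {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before 1970")
        .as_millis();
    let seq = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    TestJob {
        // Stand-in for a UUID: unique within this process, not globally.
        id: format!("bench-{now_ms}-{seq}"),
        runner: runner.to_string(),
        command: command.to_string(),
        payload_json: payload_json.to_string(),
        created_at_unix_ms: now_ms,
        signatures: Vec::new(),
    }
}

fn main() {
    // Runner and command names are placeholders, not real Horus runners.
    let job = create_test_job("bench_runner", "echo", r#"{"message":"hello"}"#);
    println!("{job:#?}");
}
```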