add complete binary and benchmarking

New file: benches/SUMMARY.md (195 lines)

# Horus Stack Benchmarks - Summary

## ✅ Created Comprehensive Benchmark Suite

Successfully created a complete benchmark suite for the Horus stack that tests the entire system through the official client APIs.

### Files Created

1. **`benches/horus_stack.rs`** - Main benchmark suite
   - API discovery and metadata retrieval
   - Runner management operations
   - Job lifecycle testing
   - Concurrent job submissions (1, 5, 10, 20 jobs)
   - Health checks
   - API latency measurements

2. **`benches/stress_test.rs`** - Stress and load testing
   - High-frequency job submissions (50-200 jobs)
   - Sustained load testing
   - Large payload handling (1KB-100KB)
   - Rapid API calls (100 calls/iteration)
   - Mixed workload scenarios
   - Connection pool exhaustion tests (10-100 clients)

3. **`benches/memory_usage.rs`** - Memory profiling
   - Job object memory footprint (10-200 jobs)
   - Client instance memory overhead (1-100 clients)
   - Payload size impact on memory (1KB-1MB)
   - Real-time memory delta reporting

4. **`benches/README.md`** - Comprehensive documentation
   - Setup instructions
   - Benchmark descriptions
   - Performance targets
   - Customization guide
   - Troubleshooting tips

5. **`benches/QUICK_START.md`** - Quick reference guide
   - Fast setup steps
   - Common commands
   - Expected performance metrics

6. **`benches/MEMORY_BENCHMARKS.md`** - Memory profiling guide
   - Memory benchmark descriptions
   - Platform-specific measurement details
   - Advanced profiling tools
   - Memory optimization tips

7. **`benches/run_benchmarks.sh`** - Helper script
   - Automated prerequisite checking
   - Service health verification
   - One-command benchmark execution

### Architecture

The benchmarks interact with the Horus stack exclusively through the client libraries:

- **`hero-supervisor-openrpc-client`** - Supervisor API (job management, runner coordination)
- **`osiris-client`** - Osiris REST API (data queries)
- **`hero-job`** - Job model definitions

This ensures the benchmarks test the real-world API surface that users interact with.

### Key Features

✅ **Async/await support** - Uses Criterion's async_tokio feature (see the sketch after this list)
✅ **Realistic workloads** - Tests actual job submission and execution
✅ **Concurrent testing** - Measures performance under parallel load
✅ **Stress testing** - Pushes system limits with high-frequency operations
✅ **HTML reports** - Beautiful visualizations with historical comparison
✅ **Automated checks** - Helper script verifies the stack is running

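A minimal sketch of the async Criterion pattern the suites rely on, using the default supervisor URL from the configuration section; the raw `rpc.discover` probe below is only a placeholder, not the actual `hero-supervisor-openrpc-client` call:

```rust
// Sketch of an async Criterion benchmark (needs criterion's `async_tokio` feature).
// The raw JSON-RPC request is a stand-in for the real OpenRPC client call.
use criterion::{criterion_group, criterion_main, Criterion};
use tokio::runtime::Runtime;

fn bench_supervisor_discovery(c: &mut Criterion) {
    let rt = Runtime::new().expect("tokio runtime");
    let client = reqwest::Client::new();

    c.bench_function("supervisor_discovery", |b| {
        // Drive each iteration on the Tokio runtime.
        b.to_async(&rt).iter(|| async {
            client
                .post("http://127.0.0.1:3030")
                .json(&serde_json::json!({
                    "jsonrpc": "2.0",
                    "id": 1,
                    "method": "rpc.discover",
                    "params": []
                }))
                .send()
                .await
        });
    });
}

criterion_group!(benches, bench_supervisor_discovery);
criterion_main!(benches);
```
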
### Benchmark Categories

#### Performance Benchmarks (`horus_stack`)
- `supervisor_discovery` - OpenRPC metadata (target: <10ms)
- `supervisor_get_info` - Info retrieval (target: <5ms)
- `supervisor_list_runners` - List operations (target: <5ms)
- `supervisor_job_create` - Job creation (target: <10ms)
- `supervisor_job_list` - Job listing (target: <10ms)
- `osiris_health_check` - Health endpoint (target: <2ms)
- `job_full_lifecycle` - Complete job cycle (target: <100ms)
- `concurrent_jobs` - Parallel submissions (target: <500ms for 10 jobs; see the sketch below)
- `get_all_runner_status` - Status queries
- `api_latency/*` - Detailed latency measurements

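The concurrent benchmarks fan submissions out as ordinary Tokio futures and await them together; a minimal sketch of that pattern, where `submit_job` is a hypothetical placeholder rather than the real client call:

```rust
// Sketch of submitting N jobs concurrently and counting successes.
// `submit_job` is a placeholder; the real suite uses the supervisor client.
use futures::future::join_all;

async fn submit_job(i: usize) -> Result<String, reqwest::Error> {
    // Placeholder round-trip standing in for a real job submission.
    let body = reqwest::get("http://127.0.0.1:3030").await?.text().await?;
    Ok(format!("job-{i}: {} bytes", body.len()))
}

async fn submit_concurrent(n: usize) -> usize {
    // Build all submission futures, then await them together.
    let results = join_all((0..n).map(submit_job)).await;
    results.into_iter().filter(|r| r.is_ok()).count()
}

#[tokio::main]
async fn main() {
    println!("succeeded: {}", submit_concurrent(10).await);
}
```
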
#### Stress Tests (`stress_test`)
- `stress_high_frequency_jobs` - 50-200 concurrent jobs
- `stress_sustained_load` - Continuous submissions over time
- `stress_large_payloads` - 1KB-100KB payload handling
- `stress_rapid_api_calls` - 100 rapid calls per iteration
- `stress_mixed_workload` - Combined operations
- `stress_connection_pool` - 10-100 concurrent clients

#### Memory Profiling (`memory_usage`)
- `memory_job_creation` - Memory footprint per job (10-200 jobs)
- `memory_client_creation` - Memory per client instance (1-100 clients)
- `memory_payload_sizes` - Memory vs payload size (1KB-1MB)
- Reports memory deltas in real time during execution (see the sketch after this list)

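What the delta reporting amounts to in practice - a Linux-only sketch that reads resident set size from `/proc/self/status`. The actual benchmarks use platform-specific measurement as covered in `MEMORY_BENCHMARKS.md`, so treat this as an illustration of the idea:

```rust
// Illustrative RSS-delta measurement (Linux only): read VmRSS before and after
// allocating a batch of payloads and report the difference in KiB.
use std::fs;

/// Resident set size of this process in KiB, parsed from /proc/self/status.
fn rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    let line = status.lines().find(|l| l.starts_with("VmRSS:"))?;
    line.split_whitespace().nth(1)?.parse().ok()
}

fn main() {
    let before = rss_kib().unwrap_or(0);
    // Stand-in for creating a batch of jobs with ~1 KB payloads each.
    let payloads: Vec<String> = (0..200).map(|_| "x".repeat(1024)).collect();
    let after = rss_kib().unwrap_or(0);
    println!(
        "{} payloads held, RSS delta ~ {} KiB",
        payloads.len(),
        after.saturating_sub(before)
    );
}
```
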
### Usage

```bash
# Quick start
./benches/run_benchmarks.sh

# Run specific suite
cargo bench --bench horus_stack
cargo bench --bench stress_test
cargo bench --bench memory_usage

# Run specific test
cargo bench -- supervisor_discovery

# Run memory benchmarks with verbose output (shows memory deltas)
cargo bench --bench memory_usage -- --verbose

# Save baseline
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main
```

### Prerequisites

The benchmarks require the full Horus stack to be running:

```bash
# Start Redis
redis-server

# Start Horus (with auto port cleanup)
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
```

### Configuration

All benchmarks use these defaults (configurable in source; see the constants sketch below):
- Supervisor: `http://127.0.0.1:3030`
- Osiris: `http://127.0.0.1:8081`
- Coordinator HTTP: `http://127.0.0.1:9652`
- Coordinator WS: `ws://127.0.0.1:9653`
- Admin secret: `SECRET`

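A sketch of how these defaults might appear as constants in the benchmark sources; the constant names are illustrative, only the values come from the list above:

```rust
// Illustrative constant names; values mirror the defaults listed above.
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
const COORDINATOR_HTTP_URL: &str = "http://127.0.0.1:9652";
const COORDINATOR_WS_URL: &str = "ws://127.0.0.1:9653";
const ADMIN_SECRET: &str = "SECRET";

fn main() {
    // Print the endpoints a benchmark run would target.
    println!("supervisor: {SUPERVISOR_URL}, osiris: {OSIRIS_URL}");
    println!("coordinator: {COORDINATOR_HTTP_URL} / {COORDINATOR_WS_URL}");
    println!("admin secret set: {}", !ADMIN_SECRET.is_empty());
}
```
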
### Results

Results are saved to `target/criterion/` with:
- HTML reports with graphs and statistics
- JSON data for programmatic analysis
- Historical comparison with previous runs
- Detailed performance metrics (mean, median, std dev, throughput)

### Integration

The benchmarks are integrated into the workspace:
- Added to `Cargo.toml` with proper dependencies
- Uses workspace-level dependencies for consistency
- Configured with `harness = false` for Criterion
- Includes all necessary dev-dependencies

### Next Steps

1. Run benchmarks to establish baseline performance
2. Monitor performance over time as code changes
3. Use stress tests to identify bottlenecks
4. Customize benchmarks for specific use cases
5. Integrate into CI/CD for automated performance tracking

## Technical Details

### Dependencies Added
- `criterion` v0.5 with the `async_tokio` and `html_reports` features
- `osiris-client` from the workspace
- `reqwest` v0.12 with the `json` feature
- `serde_json`, `uuid`, `chrono` from the workspace

### Benchmark Harness

Uses Criterion.rs for:
- Statistical analysis
- Historical comparison
- HTML report generation
- Configurable sample sizes (see the sketch after this list)
- Warm-up periods
- Outlier detection

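A minimal sketch of tuning sample size and warm-up through `criterion_group!`; the numbers shown are illustrative, not the suite's actual settings:

```rust
// Illustrative Criterion configuration: smaller sample size and a short
// warm-up, useful when each iteration performs a network round-trip.
use std::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_placeholder(c: &mut Criterion) {
    // Trivial placeholder benchmark body.
    c.bench_function("placeholder", |b| b.iter(|| 2 + 2));
}

criterion_group! {
    name = benches;
    config = Criterion::default()
        .sample_size(50)
        .warm_up_time(Duration::from_secs(2))
        .measurement_time(Duration::from_secs(10));
    targets = bench_placeholder
}
criterion_main!(benches);
```
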
### Job Creation

A helper function, `create_test_job()` (sketched below), creates properly structured `Job` instances with:
- Unique UUIDs for each job
- Proper timestamps
- JSON-serialized payloads
- Empty signatures (for testing)
- Configurable runner and command

This ensures the benchmarks test realistic job structures that match production usage.

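For illustration, a rough sketch of what such a helper can look like, using a local stand-in struct because the actual `hero-job` `Job` fields are not reproduced here:

```rust
// Illustrative only: a local stand-in for the hero-job Job type.
// The real create_test_job() in the benchmarks builds the actual Job struct.
use chrono::Utc;
use uuid::Uuid;

#[derive(Debug)]
struct TestJob {
    id: String,
    runner: String,
    command: String,
    payload: String,         // JSON-serialized payload
    signatures: Vec<String>, // empty for testing
    created_at: chrono::DateTime<Utc>,
}

fn create_test_job(runner: &str, command: &str) -> TestJob {
    TestJob {
        id: Uuid::new_v4().to_string(),
        runner: runner.to_string(),
        command: command.to_string(),
        payload: serde_json::json!({ "message": "benchmark payload" }).to_string(),
        signatures: Vec::new(),
        created_at: Utc::now(),
    }
}

fn main() {
    let job = create_test_job("osiris", "echo");
    println!("{job:?}");
}
```
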