add complete binary and benchmarking

New file: benches/SUMMARY.md (195 lines)

# Horus Stack Benchmarks - Summary

## ✅ Created Comprehensive Benchmark Suite

Successfully created a complete benchmark suite for the Horus stack that tests the entire system through the official client APIs.

### Files Created

1. **`benches/horus_stack.rs`** - Main benchmark suite
   - API discovery and metadata retrieval
   - Runner management operations
   - Job lifecycle testing
   - Concurrent job submissions (1, 5, 10, 20 jobs)
   - Health checks
   - API latency measurements

2. **`benches/stress_test.rs`** - Stress and load testing
   - High-frequency job submissions (50-200 jobs)
   - Sustained load testing
   - Large payload handling (1KB-100KB)
   - Rapid API calls (100 calls/iteration)
   - Mixed workload scenarios
   - Connection pool exhaustion tests (10-100 clients)

3. **`benches/memory_usage.rs`** - Memory profiling
   - Job object memory footprint (10-200 jobs)
   - Client instance memory overhead (1-100 clients)
   - Payload size impact on memory (1KB-1MB)
   - Real-time memory delta reporting

4. **`benches/README.md`** - Comprehensive documentation
   - Setup instructions
   - Benchmark descriptions
   - Performance targets
   - Customization guide
   - Troubleshooting tips

5. **`benches/QUICK_START.md`** - Quick reference guide
   - Fast setup steps
   - Common commands
   - Expected performance metrics

6. **`benches/MEMORY_BENCHMARKS.md`** - Memory profiling guide
   - Memory benchmark descriptions
   - Platform-specific measurement details
   - Advanced profiling tools
   - Memory optimization tips

7. **`benches/run_benchmarks.sh`** - Helper script
   - Automated prerequisite checking
   - Service health verification
   - One-command benchmark execution

### Architecture

The benchmarks interact with the Horus stack exclusively through the client libraries:

- **`hero-supervisor-openrpc-client`** - Supervisor API (job management, runner coordination)
- **`osiris-client`** - Osiris REST API (data queries)
- **`hero-job`** - Job model definitions

This ensures the benchmarks test the real-world API surface that users interact with.

### Key Features

✅ **Async/await support** - Uses Criterion's async_tokio feature (see the sketch after this list)
✅ **Realistic workloads** - Tests actual job submission and execution
✅ **Concurrent testing** - Measures performance under parallel load
✅ **Stress testing** - Pushes system limits with high-frequency operations
✅ **HTML reports** - Beautiful visualizations with historical comparison
✅ **Automated checks** - Helper script verifies the stack is running

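A minimal sketch of the async Criterion pattern the suites rely on, using the default supervisor URL from the configuration section; the raw `rpc.discover` probe below is only a placeholder, not the actual `hero-supervisor-openrpc-client` call:

```rust
// Sketch of an async Criterion benchmark (needs criterion's `async_tokio` feature).
// The raw JSON-RPC request is a stand-in for the real OpenRPC client call.
use criterion::{criterion_group, criterion_main, Criterion};
use tokio::runtime::Runtime;

fn bench_supervisor_discovery(c: &mut Criterion) {
    let rt = Runtime::new().expect("tokio runtime");
    let client = reqwest::Client::new();

    c.bench_function("supervisor_discovery", |b| {
        // Drive each iteration on the Tokio runtime.
        b.to_async(&rt).iter(|| async {
            client
                .post("http://127.0.0.1:3030")
                .json(&serde_json::json!({
                    "jsonrpc": "2.0",
                    "id": 1,
                    "method": "rpc.discover",
                    "params": []
                }))
                .send()
                .await
        });
    });
}

criterion_group!(benches, bench_supervisor_discovery);
criterion_main!(benches);
```
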
### Benchmark Categories

#### Performance Benchmarks (`horus_stack`)
- `supervisor_discovery` - OpenRPC metadata (target: <10ms)
- `supervisor_get_info` - Info retrieval (target: <5ms)
- `supervisor_list_runners` - List operations (target: <5ms)
- `supervisor_job_create` - Job creation (target: <10ms)
- `supervisor_job_list` - Job listing (target: <10ms)
- `osiris_health_check` - Health endpoint (target: <2ms)
- `job_full_lifecycle` - Complete job cycle (target: <100ms)
- `concurrent_jobs` - Parallel submissions (target: <500ms for 10 jobs; see the sketch below)
- `get_all_runner_status` - Status queries
- `api_latency/*` - Detailed latency measurements

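The concurrent benchmarks fan submissions out as ordinary Tokio futures and await them together; a minimal sketch of that pattern, where `submit_job` is a hypothetical placeholder rather than the real client call:

```rust
// Sketch of submitting N jobs concurrently and counting successes.
// `submit_job` is a placeholder; the real suite uses the supervisor client.
use futures::future::join_all;

async fn submit_job(i: usize) -> Result<String, reqwest::Error> {
    // Placeholder round-trip standing in for a real job submission.
    let body = reqwest::get("http://127.0.0.1:3030").await?.text().await?;
    Ok(format!("job-{i}: {} bytes", body.len()))
}

async fn submit_concurrent(n: usize) -> usize {
    // Build all submission futures, then await them together.
    let results = join_all((0..n).map(submit_job)).await;
    results.into_iter().filter(|r| r.is_ok()).count()
}

#[tokio::main]
async fn main() {
    println!("succeeded: {}", submit_concurrent(10).await);
}
```
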
#### Stress Tests (`stress_test`)
- `stress_high_frequency_jobs` - 50-200 concurrent jobs
- `stress_sustained_load` - Continuous submissions over time
- `stress_large_payloads` - 1KB-100KB payload handling
- `stress_rapid_api_calls` - 100 rapid calls per iteration
- `stress_mixed_workload` - Combined operations
- `stress_connection_pool` - 10-100 concurrent clients

#### Memory Profiling (`memory_usage`)
- `memory_job_creation` - Memory footprint per job (10-200 jobs)
- `memory_client_creation` - Memory per client instance (1-100 clients)
- `memory_payload_sizes` - Memory vs payload size (1KB-1MB)
- Reports memory deltas in real time during execution (see the sketch after this list)

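What the delta reporting amounts to in practice - a Linux-only sketch that reads resident set size from `/proc/self/status`. The actual benchmarks use platform-specific measurement as covered in `MEMORY_BENCHMARKS.md`, so treat this as an illustration of the idea:

```rust
// Illustrative RSS-delta measurement (Linux only): read VmRSS before and after
// allocating a batch of payloads and report the difference in KiB.
use std::fs;

/// Resident set size of this process in KiB, parsed from /proc/self/status.
fn rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    let line = status.lines().find(|l| l.starts_with("VmRSS:"))?;
    line.split_whitespace().nth(1)?.parse().ok()
}

fn main() {
    let before = rss_kib().unwrap_or(0);
    // Stand-in for creating a batch of jobs with ~1 KB payloads each.
    let payloads: Vec<String> = (0..200).map(|_| "x".repeat(1024)).collect();
    let after = rss_kib().unwrap_or(0);
    println!(
        "{} payloads held, RSS delta ~ {} KiB",
        payloads.len(),
        after.saturating_sub(before)
    );
}
```
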
### Usage

```bash
# Quick start
./benches/run_benchmarks.sh

# Run specific suite
cargo bench --bench horus_stack
cargo bench --bench stress_test
cargo bench --bench memory_usage

# Run specific test
cargo bench -- supervisor_discovery

# Run memory benchmarks with verbose output (shows memory deltas)
cargo bench --bench memory_usage -- --verbose

# Save baseline
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main
```

### Prerequisites

The benchmarks require the full Horus stack to be running:

```bash
# Start Redis
redis-server

# Start Horus (with auto port cleanup)
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
```

### Configuration

All benchmarks use these defaults (configurable in source; see the constants sketch below):
- Supervisor: `http://127.0.0.1:3030`
- Osiris: `http://127.0.0.1:8081`
- Coordinator HTTP: `http://127.0.0.1:9652`
- Coordinator WS: `ws://127.0.0.1:9653`
- Admin secret: `SECRET`

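A sketch of how these defaults might appear as constants in the benchmark sources; the constant names are illustrative, only the values come from the list above:

```rust
// Illustrative constant names; values mirror the defaults listed above.
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
const COORDINATOR_HTTP_URL: &str = "http://127.0.0.1:9652";
const COORDINATOR_WS_URL: &str = "ws://127.0.0.1:9653";
const ADMIN_SECRET: &str = "SECRET";

fn main() {
    // Print the endpoints a benchmark run would target.
    println!("supervisor: {SUPERVISOR_URL}, osiris: {OSIRIS_URL}");
    println!("coordinator: {COORDINATOR_HTTP_URL} / {COORDINATOR_WS_URL}");
    println!("admin secret set: {}", !ADMIN_SECRET.is_empty());
}
```
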
### Results

Results are saved to `target/criterion/` with:
- HTML reports with graphs and statistics
- JSON data for programmatic analysis
- Historical comparison with previous runs
- Detailed performance metrics (mean, median, std dev, throughput)

### Integration

The benchmarks are integrated into the workspace:
- Added to `Cargo.toml` with proper dependencies
- Uses workspace-level dependencies for consistency
- Configured with `harness = false` for Criterion
- Includes all necessary dev-dependencies

### Next Steps

1. Run benchmarks to establish baseline performance
2. Monitor performance over time as code changes
3. Use stress tests to identify bottlenecks
4. Customize benchmarks for specific use cases
5. Integrate into CI/CD for automated performance tracking

## Technical Details

### Dependencies Added
- `criterion` v0.5 with the `async_tokio` and `html_reports` features
- `osiris-client` from the workspace
- `reqwest` v0.12 with the `json` feature
- `serde_json`, `uuid`, `chrono` from the workspace

### Benchmark Harness

Uses Criterion.rs for:
- Statistical analysis
- Historical comparison
- HTML report generation
- Configurable sample sizes (see the sketch after this list)
- Warm-up periods
- Outlier detection

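A minimal sketch of tuning sample size and warm-up through `criterion_group!`; the numbers shown are illustrative, not the suite's actual settings:

```rust
// Illustrative Criterion configuration: smaller sample size and a short
// warm-up, useful when each iteration performs a network round-trip.
use std::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_placeholder(c: &mut Criterion) {
    // Trivial placeholder benchmark body.
    c.bench_function("placeholder", |b| b.iter(|| 2 + 2));
}

criterion_group! {
    name = benches;
    config = Criterion::default()
        .sample_size(50)
        .warm_up_time(Duration::from_secs(2))
        .measurement_time(Duration::from_secs(10));
    targets = bench_placeholder
}
criterion_main!(benches);
```
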
### Job Creation

A helper function, `create_test_job()` (sketched below), creates properly structured `Job` instances with:
- Unique UUIDs for each job
- Proper timestamps
- JSON-serialized payloads
- Empty signatures (for testing)
- Configurable runner and command

This ensures the benchmarks test realistic job structures that match production usage.

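For illustration, a rough sketch of what such a helper can look like, using a local stand-in struct because the actual `hero-job` `Job` fields are not reproduced here:

```rust
// Illustrative only: a local stand-in for the hero-job Job type.
// The real create_test_job() in the benchmarks builds the actual Job struct.
use chrono::Utc;
use uuid::Uuid;

#[derive(Debug)]
struct TestJob {
    id: String,
    runner: String,
    command: String,
    payload: String,         // JSON-serialized payload
    signatures: Vec<String>, // empty for testing
    created_at: chrono::DateTime<Utc>,
}

fn create_test_job(runner: &str, command: &str) -> TestJob {
    TestJob {
        id: Uuid::new_v4().to_string(),
        runner: runner.to_string(),
        command: command.to_string(),
        payload: serde_json::json!({ "message": "benchmark payload" }).to_string(),
        signatures: Vec::new(),
        created_at: Utc::now(),
    }
}

fn main() {
    let job = create_test_job("osiris", "echo");
    println!("{job:?}");
}
```
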