add complete binary and benchmarking

This commit is contained in:
Timur Gordon
2025-11-18 20:39:25 +01:00
parent f66edba1d3
commit 4142f62e54
17 changed files with 2559 additions and 2 deletions

View File

@@ -0,0 +1,217 @@
# Memory Usage Benchmarks
Benchmarks for measuring memory consumption of the Horus stack components.
## Overview
The memory benchmarks measure heap memory usage for various operations:
- Job creation and storage
- Client instantiation
- Payload size impact
- Memory growth under load
## Benchmarks
### 1. `memory_job_creation`
Measures memory usage when creating multiple Job objects in memory.
**Test sizes**: 10, 50, 100, 200 jobs
**What it measures**:
- Memory allocated per job object
- Heap growth with increasing job count
- Memory efficiency of Job structure
**Expected results**:
- Linear memory growth with job count
- ~1-2 KB per job object (depending on payload)
### 2. `memory_client_creation`
Measures memory overhead of creating multiple Supervisor client instances.
**Test sizes**: 1, 10, 50, 100 clients
**What it measures**:
- Memory per client instance
- Connection pool overhead
- HTTP client memory footprint
**Expected results**:
- ~10-50 KB per client instance
- Includes HTTP client, connection pools, and buffers
### 3. `memory_payload_sizes`
Measures memory usage with different payload sizes.
**Test sizes**: 1KB, 10KB, 100KB, 1MB payloads
**What it measures**:
- Memory overhead of JSON serialization
- String allocation costs
- Payload storage efficiency
**Expected results**:
- Memory usage should scale linearly with payload size
- Small overhead for JSON structure (~5-10%)
## Running Memory Benchmarks
```bash
# Run all memory benchmarks
cargo bench --bench memory_usage
# Run specific memory test
cargo bench --bench memory_usage -- memory_job_creation
# Run with verbose output to see memory deltas
cargo bench --bench memory_usage -- --verbose
```
## Interpreting Results
The benchmarks print memory deltas to stderr during execution:
```
Memory delta for 100 jobs: 156 KB
Memory delta for 50 clients: 2048 KB
Memory delta for 100KB payload: 105 KB
```
### Memory Delta Interpretation
- **Positive delta**: Memory was allocated during the operation
- **Zero delta**: No significant memory change (may be reusing existing allocations)
- **Negative delta**: Memory was freed (garbage collection, deallocations)
### Platform Differences
**macOS**: Uses `ps` command to read RSS (Resident Set Size)
**Linux**: Reads `/proc/self/status` for VmRSS
RSS includes:
- Heap allocations
- Stack memory
- Shared libraries (mapped into process)
- Memory-mapped files
## Limitations
1. **Granularity**: OS-level memory reporting may not capture small allocations
2. **Timing**: Memory measurements happen before/after operations, not continuously
3. **GC effects**: Rust's allocator may not immediately release memory to OS
4. **Shared memory**: RSS includes shared library memory
## Best Practices
### For Accurate Measurements
1. **Run multiple iterations**: Criterion handles this automatically
2. **Warm up**: First iterations may show higher memory due to lazy initialization
3. **Isolate tests**: Run memory benchmarks separately from performance benchmarks
4. **Monitor trends**: Compare results over time, not absolute values
### Memory Optimization Tips
If benchmarks show high memory usage:
1. **Check payload sizes**: Large payloads consume proportional memory
2. **Limit concurrent operations**: Too many simultaneous jobs/clients increase memory
3. **Review data structures**: Ensure efficient serialization
4. **Profile with tools**: Use `heaptrack` (Linux) or `instruments` (macOS) for detailed analysis
## Advanced Profiling
For detailed memory profiling beyond these benchmarks:
### macOS
```bash
# Use Instruments
instruments -t Allocations -D memory_trace.trace ./target/release/horus
# Use heap profiler
cargo install cargo-instruments
cargo instruments --bench memory_usage --template Allocations
```
### Linux
```bash
# Use Valgrind massif
valgrind --tool=massif --massif-out-file=massif.out \
./target/release/deps/memory_usage-*
# Visualize with massif-visualizer
massif-visualizer massif.out
# Use heaptrack
heaptrack ./target/release/deps/memory_usage-*
heaptrack_gui heaptrack.memory_usage.*.gz
```
### Cross-platform
```bash
# Use dhat (heap profiler)
cargo install dhat
# Add dhat to your benchmark and run
cargo bench --bench memory_usage --features dhat-heap
```
## Continuous Monitoring
Integrate memory benchmarks into CI/CD:
```bash
# Run and save baseline
cargo bench --bench memory_usage -- --save-baseline memory-main
# Compare in PR
cargo bench --bench memory_usage -- --baseline memory-main
# Fail if memory usage increases >10%
# (requires custom scripting to parse Criterion output)
```
## Troubleshooting
### "Memory delta is always 0"
- OS may not update RSS immediately
- Allocations might be too small to measure
- Try increasing iteration count or operation size
### "Memory keeps growing"
- Check for memory leaks
- Verify objects are being dropped
- Use `cargo clippy` to find potential issues
### "Results are inconsistent"
- Other processes may be affecting measurements
- Run benchmarks on idle system
- Increase sample size in benchmark code
## Example Output
```
memory_job_creation/10 time: [45.2 µs 46.1 µs 47.3 µs]
Memory delta for 10 jobs: 24 KB
memory_job_creation/50 time: [198.4 µs 201.2 µs 204.8 µs]
Memory delta for 50 jobs: 98 KB
memory_job_creation/100 time: [387.6 µs 392.1 µs 397.4 µs]
Memory delta for 100 jobs: 187 KB
memory_client_creation/1 time: [234.5 µs 238.2 µs 242.6 µs]
Memory delta for 1 clients: 45 KB
memory_payload_sizes/1KB time: [12.3 µs 12.6 µs 13.0 µs]
Memory delta for 1KB payload: 2 KB
memory_payload_sizes/100KB time: [156.7 µs 159.4 µs 162.8 µs]
Memory delta for 100KB payload: 105 KB
```
## Related Documentation
- [Performance Benchmarks](./README.md)
- [Stress Tests](./README.md#stress-tests)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)

129
benches/QUICK_START.md Normal file
View File

@@ -0,0 +1,129 @@
# Horus Benchmarks - Quick Start
## 1. Start the Stack
```bash
# Terminal 1: Start Redis
redis-server
# Terminal 2: Start Horus
cd /Users/timurgordon/code/git.ourworld.tf/herocode/horus
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
```
## 2. Run Benchmarks
### Option A: Use the helper script (recommended)
```bash
./benches/run_benchmarks.sh
```
### Option B: Run directly with cargo
```bash
# All benchmarks
cargo bench
# Specific benchmark suite
cargo bench --bench horus_stack
cargo bench --bench stress_test
# Specific test
cargo bench --bench horus_stack -- supervisor_discovery
# Quick run (fewer samples)
cargo bench -- --quick
```
## 3. View Results
```bash
# Open HTML report in browser
open target/criterion/report/index.html
# Or on Linux
xdg-open target/criterion/report/index.html
```
## Available Benchmark Suites
### `horus_stack` - Standard Performance Tests
- API discovery and metadata
- Runner management
- Job operations
- Concurrency tests
- Health checks
- API latency measurements
### `stress_test` - Load & Stress Tests
- High-frequency job submissions (50-200 jobs)
- Sustained load testing
- Large payload handling (1KB-100KB)
- Rapid API calls (100 calls/test)
- Mixed workload scenarios
- Connection pool exhaustion (10-100 clients)
### `memory_usage` - Memory Profiling
- Job object memory footprint (10-200 jobs)
- Client instance memory overhead (1-100 clients)
- Payload size impact on memory (1KB-1MB)
- Memory growth patterns under load
## Common Commands
```bash
# Run only fast benchmarks
cargo bench -- --quick
# Save baseline for comparison
cargo bench -- --save-baseline main
# Compare against baseline
cargo bench -- --baseline main
# Run with verbose output
cargo bench -- --verbose
# Filter by name
cargo bench -- concurrent
cargo bench -- stress
# Run specific benchmark group
cargo bench --bench horus_stack -- api_latency
# Run memory benchmarks
cargo bench --bench memory_usage
# Run memory benchmarks with verbose output (shows memory deltas)
cargo bench --bench memory_usage -- --verbose
```
## Troubleshooting
**"Connection refused"**
- Make sure Horus stack is running
- Check ports: 3030 (supervisor), 8081 (osiris), 9652/9653 (coordinator)
**"Job timeout"**
- Increase timeout in benchmark code
- Check that runners are registered: `curl http://127.0.0.1:3030` (requires POST)
**Slow benchmarks**
- Close other applications
- Use `--quick` flag for faster runs
- Reduce sample size in benchmark code
## Performance Expectations
| Test | Expected Time |
|------|---------------|
| supervisor_discovery | < 10ms |
| supervisor_get_info | < 5ms |
| job_full_lifecycle | < 100ms |
| concurrent_jobs (10) | < 500ms |
| stress_high_frequency (50) | < 2s |
## Next Steps
- See `benches/README.md` for detailed documentation
- Modify `benches/horus_stack.rs` to add custom tests
- Check `target/criterion/` for detailed reports

206
benches/README.md Normal file
View File

@@ -0,0 +1,206 @@
# Horus Stack Benchmarks
Comprehensive benchmark suite for the entire Horus stack, testing performance through the client APIs.
## Overview
These benchmarks test the full Horus system including:
- **Supervisor API** - Job management, runner coordination
- **Coordinator API** - Job routing and execution
- **Osiris API** - REST API for data queries
All benchmarks interact with the stack through the official client libraries in `/lib/clients`, which is the only supported way to interact with the system.
## Prerequisites
Before running benchmarks, you must have the Horus stack running:
```bash
# Start Redis
redis-server
# Start all Horus services
cd /Users/timurgordon/code/git.ourworld.tf/herocode/horus
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
```
The benchmarks expect:
- **Supervisor** running on `http://127.0.0.1:3030`
- **Coordinator** running on `http://127.0.0.1:9652` (HTTP) and `ws://127.0.0.1:9653` (WebSocket)
- **Osiris** running on `http://127.0.0.1:8081`
- **Redis** running on `127.0.0.1:6379`
- Admin secret: `SECRET`
## Running Benchmarks
### Run all benchmarks
```bash
cargo bench --bench horus_stack
```
### Run specific benchmark
```bash
cargo bench --bench horus_stack -- supervisor_discovery
```
### Run with specific filter
```bash
cargo bench --bench horus_stack -- concurrent
```
### Generate detailed reports
```bash
cargo bench --bench horus_stack -- --verbose
```
## Benchmark Categories
### 1. API Discovery & Metadata (`horus_stack`)
- `supervisor_discovery` - OpenRPC metadata retrieval
- `supervisor_get_info` - Supervisor information and stats
### 2. Runner Management (`horus_stack`)
- `supervisor_list_runners` - List all registered runners
- `get_all_runner_status` - Get status of all runners
### 3. Job Operations (`horus_stack`)
- `supervisor_job_create` - Create job without execution
- `supervisor_job_list` - List all jobs
- `job_full_lifecycle` - Complete job lifecycle (create → execute → result)
### 4. Concurrency Tests (`horus_stack`)
- `concurrent_jobs` - Submit multiple jobs concurrently (1, 5, 10, 20 jobs)
### 5. Health & Monitoring (`horus_stack`)
- `osiris_health_check` - Osiris server health endpoint
### 6. API Latency (`horus_stack`)
- `api_latency/supervisor_info` - Supervisor info latency
- `api_latency/runner_list` - Runner list latency
- `api_latency/job_list` - Job list latency
### 7. Stress Tests (`stress_test`)
- `stress_high_frequency_jobs` - High-frequency submissions (50-200 jobs)
- `stress_sustained_load` - Continuous load testing
- `stress_large_payloads` - Large payload handling (1KB-100KB)
- `stress_rapid_api_calls` - Rapid API calls (100 calls/iteration)
- `stress_mixed_workload` - Mixed operation scenarios
- `stress_connection_pool` - Connection pool exhaustion (10-100 clients)
### 8. Memory Usage (`memory_usage`)
- `memory_job_creation` - Memory per job object (10-200 jobs)
- `memory_client_creation` - Memory per client instance (1-100 clients)
- `memory_payload_sizes` - Memory vs payload size (1KB-1MB)
See [MEMORY_BENCHMARKS.md](./MEMORY_BENCHMARKS.md) for detailed memory profiling documentation.
## Interpreting Results
Criterion outputs detailed statistics including:
- **Mean time** - Average execution time
- **Std deviation** - Variability in measurements
- **Median** - Middle value (50th percentile)
- **MAD** - Median Absolute Deviation
- **Throughput** - Operations per second
Results are saved in `target/criterion/` with:
- HTML reports with graphs
- JSON data for further analysis
- Historical comparison with previous runs
## Performance Targets
Expected performance (on modern hardware):
| Benchmark | Target | Notes |
|-----------|--------|-------|
| supervisor_discovery | < 10ms | Metadata retrieval |
| supervisor_get_info | < 5ms | Simple info query |
| supervisor_list_runners | < 5ms | List operation |
| supervisor_job_create | < 10ms | Job creation only |
| job_full_lifecycle | < 100ms | Full execution cycle |
| osiris_health_check | < 2ms | Health endpoint |
| concurrent_jobs (10) | < 500ms | 10 parallel jobs |
## Customization
To modify benchmark parameters, edit `benches/horus_stack.rs`:
```rust
// Change URLs
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
// Change admin secret
const ADMIN_SECRET: &str = "SECRET";
// Adjust concurrent job counts
for num_jobs in [1, 5, 10, 20, 50].iter() {
// ...
}
```
## CI/CD Integration
To run benchmarks in CI without the full stack:
```bash
# Run only fast benchmarks
cargo bench --bench horus_stack -- --quick
# Save baseline for comparison
cargo bench --bench horus_stack -- --save-baseline main
# Compare against baseline
cargo bench --bench horus_stack -- --baseline main
```
## Troubleshooting
### "Connection refused" errors
- Ensure the Horus stack is running
- Check that all services are listening on expected ports
- Verify firewall settings
### "Job execution timeout" errors
- Increase timeout values in benchmark code
- Check that runners are properly registered
- Verify Redis is accessible
### Inconsistent results
- Close other applications to reduce system load
- Run benchmarks multiple times for statistical significance
- Use `--warm-up-time` flag to increase warm-up period
## Adding New Benchmarks
To add a new benchmark:
1. Create a new function in `benches/horus_stack.rs`:
```rust
fn bench_my_feature(c: &mut Criterion) {
let rt = create_runtime();
let client = /* create client */;
c.bench_function("my_feature", |b| {
b.to_async(&rt).iter(|| async {
// Your benchmark code
});
});
}
```
2. Add to the criterion_group:
```rust
criterion_group!(
benches,
// ... existing benchmarks
bench_my_feature,
);
```
## Resources
- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [Horus Client Documentation](../lib/clients/)
- [Performance Tuning Guide](../docs/performance.md)

195
benches/SUMMARY.md Normal file
View File

@@ -0,0 +1,195 @@
# Horus Stack Benchmarks - Summary
## ✅ Created Comprehensive Benchmark Suite
Successfully created a complete benchmark suite for the Horus stack that tests the entire system through the official client APIs.
### Files Created
1. **`benches/horus_stack.rs`** - Main benchmark suite
- API discovery and metadata retrieval
- Runner management operations
- Job lifecycle testing
- Concurrent job submissions (1, 5, 10, 20 jobs)
- Health checks
- API latency measurements
2. **`benches/stress_test.rs`** - Stress and load testing
- High-frequency job submissions (50-200 jobs)
- Sustained load testing
- Large payload handling (1KB-100KB)
- Rapid API calls (100 calls/iteration)
- Mixed workload scenarios
- Connection pool exhaustion tests (10-100 clients)
3. **`benches/memory_usage.rs`** - Memory profiling
- Job object memory footprint (10-200 jobs)
- Client instance memory overhead (1-100 clients)
- Payload size impact on memory (1KB-1MB)
- Real-time memory delta reporting
4. **`benches/README.md`** - Comprehensive documentation
- Setup instructions
- Benchmark descriptions
- Performance targets
- Customization guide
- Troubleshooting tips
5. **`benches/QUICK_START.md`** - Quick reference guide
- Fast setup steps
- Common commands
- Expected performance metrics
6. **`benches/MEMORY_BENCHMARKS.md`** - Memory profiling guide
- Memory benchmark descriptions
- Platform-specific measurement details
- Advanced profiling tools
- Memory optimization tips
7. **`benches/run_benchmarks.sh`** - Helper script
- Automated prerequisite checking
- Service health verification
- One-command benchmark execution
### Architecture
The benchmarks interact with the Horus stack exclusively through the client libraries:
- **`hero-supervisor-openrpc-client`** - Supervisor API (job management, runner coordination)
- **`osiris-client`** - Osiris REST API (data queries)
- **`hero-job`** - Job model definitions
This ensures benchmarks test the real-world API surface that users interact with.
### Key Features
**Async/await support** - Uses Criterion's async_tokio feature
**Realistic workloads** - Tests actual job submission and execution
**Concurrent testing** - Measures performance under parallel load
**Stress testing** - Pushes system limits with high-frequency operations
**HTML reports** - Beautiful visualizations with historical comparison
**Automated checks** - Helper script verifies stack is running
### Benchmark Categories
#### Performance Benchmarks (`horus_stack`)
- `supervisor_discovery` - OpenRPC metadata (target: <10ms)
- `supervisor_get_info` - Info retrieval (target: <5ms)
- `supervisor_list_runners` - List operations (target: <5ms)
- `supervisor_job_create` - Job creation (target: <10ms)
- `supervisor_job_list` - Job listing (target: <10ms)
- `osiris_health_check` - Health endpoint (target: <2ms)
- `job_full_lifecycle` - Complete job cycle (target: <100ms)
- `concurrent_jobs` - Parallel submissions (target: <500ms for 10 jobs)
- `get_all_runner_status` - Status queries
- `api_latency/*` - Detailed latency measurements
#### Stress Tests (`stress_test`)
- `stress_high_frequency_jobs` - 50-200 concurrent jobs
- `stress_sustained_load` - Continuous submissions over time
- `stress_large_payloads` - 1KB-100KB payload handling
- `stress_rapid_api_calls` - 100 rapid calls per iteration
- `stress_mixed_workload` - Combined operations
- `stress_connection_pool` - 10-100 concurrent clients
#### Memory Profiling (`memory_usage`)
- `memory_job_creation` - Memory footprint per job (10-200 jobs)
- `memory_client_creation` - Memory per client instance (1-100 clients)
- `memory_payload_sizes` - Memory vs payload size (1KB-1MB)
- Reports memory deltas in real-time during execution
### Usage
```bash
# Quick start
./benches/run_benchmarks.sh
# Run specific suite
cargo bench --bench horus_stack
cargo bench --bench stress_test
cargo bench --bench memory_usage
# Run specific test
cargo bench -- supervisor_discovery
# Run memory benchmarks with verbose output (shows memory deltas)
cargo bench --bench memory_usage -- --verbose
# Save baseline
cargo bench -- --save-baseline main
# Compare against baseline
cargo bench -- --baseline main
```
### Prerequisites
The benchmarks require the full Horus stack to be running:
```bash
# Start Redis
redis-server
# Start Horus (with auto port cleanup)
RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports
```
### Configuration
All benchmarks use these defaults (configurable in source):
- Supervisor: `http://127.0.0.1:3030`
- Osiris: `http://127.0.0.1:8081`
- Coordinator HTTP: `http://127.0.0.1:9652`
- Coordinator WS: `ws://127.0.0.1:9653`
- Admin secret: `SECRET`
### Results
Results are saved to `target/criterion/` with:
- HTML reports with graphs and statistics
- JSON data for programmatic analysis
- Historical comparison with previous runs
- Detailed performance metrics (mean, median, std dev, throughput)
### Integration
The benchmarks are integrated into the workspace:
- Added to `Cargo.toml` with proper dependencies
- Uses workspace-level dependencies for consistency
- Configured with `harness = false` for Criterion
- Includes all necessary dev-dependencies
### Next Steps
1. Run benchmarks to establish baseline performance
2. Monitor performance over time as code changes
3. Use stress tests to identify bottlenecks
4. Customize benchmarks for specific use cases
5. Integrate into CI/CD for automated performance tracking
## Technical Details
### Dependencies Added
- `criterion` v0.5 with async_tokio and html_reports features
- `osiris-client` from workspace
- `reqwest` v0.12 with json feature
- `serde_json`, `uuid`, `chrono` from workspace
### Benchmark Harness
Uses Criterion.rs for:
- Statistical analysis
- Historical comparison
- HTML report generation
- Configurable sample sizes
- Warm-up periods
- Outlier detection
### Job Creation
Helper function `create_test_job()` creates properly structured Job instances:
- Unique UUIDs for each job
- Proper timestamps
- JSON-serialized payloads
- Empty signatures (for testing)
- Configurable runner and command
This ensures benchmarks test realistic job structures that match production usage.

324
benches/horus_stack.rs Normal file
View File

@@ -0,0 +1,324 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
use hero_supervisor_openrpc_client::SupervisorClientBuilder;
use hero_job::Job;
use tokio::runtime::Runtime;
use std::time::Duration;
use std::collections::HashMap;
use uuid::Uuid;
use chrono::Utc;
/// Benchmark configuration
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const OSIRIS_URL: &str = "http://127.0.0.1:8081";
const ADMIN_SECRET: &str = "SECRET";
/// Helper to create a tokio runtime for benchmarks
fn create_runtime() -> Runtime {
Runtime::new().unwrap()
}
/// Helper to create a test job
fn create_test_job(runner: &str, command: &str, args: Vec<String>) -> Job {
Job {
id: Uuid::new_v4().to_string(),
caller_id: "benchmark".to_string(),
context_id: "test".to_string(),
payload: serde_json::json!({
"command": command,
"args": args
}).to_string(),
runner: runner.to_string(),
timeout: 30,
env_vars: HashMap::new(),
created_at: Utc::now(),
updated_at: Utc::now(),
signatures: vec![],
}
}
/// Benchmark: Supervisor discovery (OpenRPC metadata)
fn bench_supervisor_discovery(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
c.bench_function("supervisor_discovery", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.discover().await.expect("Discovery failed"))
});
});
}
/// Benchmark: Supervisor info retrieval
fn bench_supervisor_info(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
c.bench_function("supervisor_get_info", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.get_supervisor_info().await.expect("Get info failed"))
});
});
}
/// Benchmark: List runners
fn bench_list_runners(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
c.bench_function("supervisor_list_runners", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.runner_list().await.expect("List runners failed"))
});
});
}
/// Benchmark: Job creation (without execution)
fn bench_job_create(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner exists
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
c.bench_function("supervisor_job_create", |b| {
b.to_async(&rt).iter(|| async {
let job = create_test_job("hero", "echo", vec!["hello".to_string()]);
black_box(client.job_create(job).await.expect("Job create failed"))
});
});
}
/// Benchmark: Job listing
fn bench_job_list(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
c.bench_function("supervisor_job_list", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.job_list().await.expect("Job list failed"))
});
});
}
/// Benchmark: Osiris health check
fn bench_osiris_health(c: &mut Criterion) {
let rt = create_runtime();
let client = reqwest::Client::new();
c.bench_function("osiris_health_check", |b| {
b.to_async(&rt).iter(|| async {
let url = format!("{}/health", OSIRIS_URL);
black_box(
client
.get(&url)
.send()
.await
.expect("Health check failed")
.json::<serde_json::Value>()
.await
.expect("JSON parse failed")
)
});
});
}
/// Benchmark: Full job lifecycle (create, start, wait for result)
fn bench_job_lifecycle(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(60))
.build()
.expect("Failed to create supervisor client")
});
// First ensure we have a runner registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
c.bench_function("job_full_lifecycle", |b| {
b.to_async(&rt).iter(|| async {
let job = create_test_job("hero", "echo", vec!["benchmark_test".to_string()]);
// Start job and wait for result
black_box(
client
.job_run(job, Some(30))
.await
.expect("Job run failed")
)
});
});
}
/// Benchmark: Concurrent job submissions
fn bench_concurrent_jobs(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(60))
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner is registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("concurrent_jobs");
for num_jobs in [1, 5, 10, 20].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(num_jobs),
num_jobs,
|b, &num_jobs| {
b.to_async(&rt).iter(|| async {
let mut handles = vec![];
for i in 0..num_jobs {
let client = client.clone();
let handle = tokio::spawn(async move {
let job = create_test_job("hero", "echo", vec![format!("job_{}", i)]);
client.job_create(job).await
});
handles.push(handle);
}
// Wait for all jobs to be submitted
for handle in handles {
black_box(handle.await.expect("Task failed").expect("Job start failed"));
}
});
},
);
}
group.finish();
}
/// Benchmark: Runner status checks
fn bench_runner_status(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
// Ensure we have runners
rt.block_on(async {
let _ = client.runner_create("hero").await;
let _ = client.runner_create("osiris").await;
});
c.bench_function("get_all_runner_status", |b| {
b.to_async(&rt).iter(|| async {
black_box(
client
.get_all_runner_status()
.await
.expect("Get status failed")
)
});
});
}
/// Benchmark: API response time under load
fn bench_api_latency(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
let mut group = c.benchmark_group("api_latency");
group.measurement_time(Duration::from_secs(10));
group.bench_function("supervisor_info", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.get_supervisor_info().await.expect("Failed"))
});
});
group.bench_function("runner_list", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.runner_list().await.expect("Failed"))
});
});
group.bench_function("job_list", |b| {
b.to_async(&rt).iter(|| async {
black_box(client.job_list().await.expect("Failed"))
});
});
group.finish();
}
criterion_group!(
benches,
bench_supervisor_discovery,
bench_supervisor_info,
bench_list_runners,
bench_job_create,
bench_job_list,
bench_osiris_health,
bench_job_lifecycle,
bench_concurrent_jobs,
bench_runner_status,
bench_api_latency,
);
criterion_main!(benches);

210
benches/memory_usage.rs Normal file
View File

@@ -0,0 +1,210 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
use hero_supervisor_openrpc_client::SupervisorClientBuilder;
use hero_job::Job;
use tokio::runtime::Runtime;
use std::time::Duration;
use std::collections::HashMap;
use uuid::Uuid;
use chrono::Utc;
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const ADMIN_SECRET: &str = "SECRET";
fn create_runtime() -> Runtime {
Runtime::new().unwrap()
}
fn create_test_job(runner: &str, command: &str, args: Vec<String>) -> Job {
Job {
id: Uuid::new_v4().to_string(),
caller_id: "benchmark".to_string(),
context_id: "test".to_string(),
payload: serde_json::json!({
"command": command,
"args": args
}).to_string(),
runner: runner.to_string(),
timeout: 30,
env_vars: HashMap::new(),
created_at: Utc::now(),
updated_at: Utc::now(),
signatures: vec![],
}
}
#[cfg(target_os = "macos")]
fn get_memory_usage() -> Option<usize> {
use std::process::Command;
let output = Command::new("ps")
.args(&["-o", "rss=", "-p", &std::process::id().to_string()])
.output()
.ok()?;
String::from_utf8(output.stdout)
.ok()?
.trim()
.parse::<usize>()
.ok()
.map(|kb| kb * 1024)
}
#[cfg(target_os = "linux")]
fn get_memory_usage() -> Option<usize> {
use std::fs;
let status = fs::read_to_string("/proc/self/status").ok()?;
for line in status.lines() {
if line.starts_with("VmRSS:") {
let kb = line.split_whitespace().nth(1)?.parse::<usize>().ok()?;
return Some(kb * 1024);
}
}
None
}
fn memory_job_creation(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create client")
});
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("memory_job_creation");
for num_jobs in [10, 50, 100, 200].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(num_jobs),
num_jobs,
|b, &num_jobs| {
b.iter_custom(|iters| {
let mut total_duration = Duration::ZERO;
for _ in 0..iters {
let mem_before = get_memory_usage().unwrap_or(0);
let start = std::time::Instant::now();
rt.block_on(async {
let mut jobs = Vec::new();
for i in 0..num_jobs {
let job = create_test_job("hero", "echo", vec![format!("mem_test_{}", i)]);
jobs.push(job);
}
black_box(jobs);
});
total_duration += start.elapsed();
let mem_after = get_memory_usage().unwrap_or(0);
let mem_delta = mem_after.saturating_sub(mem_before);
if mem_delta > 0 {
eprintln!("Memory delta for {} jobs: {} KB", num_jobs, mem_delta / 1024);
}
}
total_duration
});
},
);
}
group.finish();
}
fn memory_client_creation(c: &mut Criterion) {
let rt = create_runtime();
let mut group = c.benchmark_group("memory_client_creation");
for num_clients in [1, 10, 50, 100].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(num_clients),
num_clients,
|b, &num_clients| {
b.iter_custom(|iters| {
let mut total_duration = Duration::ZERO;
for _ in 0..iters {
let mem_before = get_memory_usage().unwrap_or(0);
let start = std::time::Instant::now();
rt.block_on(async {
let mut clients = Vec::new();
for _ in 0..num_clients {
let client = SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create client");
clients.push(client);
}
black_box(clients);
});
total_duration += start.elapsed();
let mem_after = get_memory_usage().unwrap_or(0);
let mem_delta = mem_after.saturating_sub(mem_before);
if mem_delta > 0 {
eprintln!("Memory delta for {} clients: {} KB", num_clients, mem_delta / 1024);
}
}
total_duration
});
},
);
}
group.finish();
}
fn memory_payload_sizes(c: &mut Criterion) {
let mut group = c.benchmark_group("memory_payload_sizes");
for size_kb in [1, 10, 100, 1000].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(format!("{}KB", size_kb)),
size_kb,
|b, &size_kb| {
b.iter_custom(|iters| {
let mut total_duration = Duration::ZERO;
for _ in 0..iters {
let mem_before = get_memory_usage().unwrap_or(0);
let start = std::time::Instant::now();
let large_data = "x".repeat(size_kb * 1024);
let job = create_test_job("hero", "echo", vec![large_data]);
black_box(job);
total_duration += start.elapsed();
let mem_after = get_memory_usage().unwrap_or(0);
let mem_delta = mem_after.saturating_sub(mem_before);
if mem_delta > 0 {
eprintln!("Memory delta for {}KB payload: {} KB", size_kb, mem_delta / 1024);
}
}
total_duration
});
},
);
}
group.finish();
}
criterion_group!(
memory_benches,
memory_job_creation,
memory_client_creation,
memory_payload_sizes,
);
criterion_main!(memory_benches);

113
benches/run_benchmarks.sh Executable file
View File

@@ -0,0 +1,113 @@
#!/bin/bash
# Horus Stack Benchmark Runner
# This script ensures the Horus stack is running before executing benchmarks
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Configuration
SUPERVISOR_URL="http://127.0.0.1:3030"
OSIRIS_URL="http://127.0.0.1:8081"
REDIS_URL="127.0.0.1:6379"
echo -e "${GREEN}=== Horus Stack Benchmark Runner ===${NC}\n"
# Function to check if a service is running
check_service() {
local url=$1
local name=$2
if curl -s -f "$url/health" > /dev/null 2>&1 || curl -s -f "$url" > /dev/null 2>&1; then
echo -e "${GREEN}${NC} $name is running"
return 0
else
echo -e "${RED}${NC} $name is not running"
return 1
fi
}
# Function to check if Redis is running
check_redis() {
if redis-cli -h 127.0.0.1 -p 6379 ping > /dev/null 2>&1; then
echo -e "${GREEN}${NC} Redis is running"
return 0
else
echo -e "${RED}${NC} Redis is not running"
return 1
fi
}
# Check prerequisites
echo "Checking prerequisites..."
echo ""
REDIS_OK=false
OSIRIS_OK=false
SUPERVISOR_OK=false
if check_redis; then
REDIS_OK=true
fi
if check_service "$OSIRIS_URL" "Osiris"; then
OSIRIS_OK=true
fi
if check_service "$SUPERVISOR_URL" "Supervisor"; then
SUPERVISOR_OK=true
fi
echo ""
# If any service is not running, provide instructions
if [ "$REDIS_OK" = false ] || [ "$OSIRIS_OK" = false ] || [ "$SUPERVISOR_OK" = false ]; then
echo -e "${YELLOW}Some services are not running. Please start the Horus stack:${NC}"
echo ""
if [ "$REDIS_OK" = false ]; then
echo " 1. Start Redis:"
echo " redis-server"
echo ""
fi
echo " 2. Start Horus stack:"
echo " cd $PROJECT_ROOT"
echo " RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports"
echo ""
echo " Or run in the background:"
echo " RUST_LOG=info ./target/release/horus all --admin-secret SECRET --kill-ports &"
echo ""
read -p "Do you want to continue anyway? (y/N) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
echo -e "${RED}Benchmark cancelled.${NC}"
exit 1
fi
fi
# Build the project first
echo -e "${GREEN}Building project...${NC}"
cd "$PROJECT_ROOT"
cargo build --release
echo ""
echo -e "${GREEN}Running benchmarks...${NC}"
echo ""
# Run benchmarks with any additional arguments passed to this script
cargo bench --bench horus_stack "$@"
echo ""
echo -e "${GREEN}=== Benchmark Complete ===${NC}"
echo ""
echo "Results saved to: target/criterion/"
echo "View HTML reports: open target/criterion/report/index.html"

300
benches/stress_test.rs Normal file
View File

@@ -0,0 +1,300 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
use hero_supervisor_openrpc_client::SupervisorClientBuilder;
use hero_job::Job;
use tokio::runtime::Runtime;
use std::time::Duration;
use std::collections::HashMap;
use uuid::Uuid;
use chrono::Utc;
/// Benchmark configuration
const SUPERVISOR_URL: &str = "http://127.0.0.1:3030";
const ADMIN_SECRET: &str = "SECRET";
/// Helper to create a tokio runtime for benchmarks
fn create_runtime() -> Runtime {
Runtime::new().unwrap()
}
/// Helper to create a test job
fn create_test_job(runner: &str, command: &str, args: Vec<String>) -> Job {
Job {
id: Uuid::new_v4().to_string(),
caller_id: "benchmark".to_string(),
context_id: "test".to_string(),
payload: serde_json::json!({
"command": command,
"args": args
}).to_string(),
runner: runner.to_string(),
timeout: 30,
env_vars: HashMap::new(),
created_at: Utc::now(),
updated_at: Utc::now(),
signatures: vec![],
}
}
/// Stress test: High-frequency job submissions
fn stress_high_frequency_jobs(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(120))
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner is registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("stress_high_frequency");
group.sample_size(10); // Fewer samples for stress tests
group.measurement_time(Duration::from_secs(20));
for num_jobs in [50, 100, 200].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(num_jobs),
num_jobs,
|b, &num_jobs| {
b.to_async(&rt).iter(|| async {
let mut handles = vec![];
for i in 0..num_jobs {
let client = client.clone();
let handle = tokio::spawn(async move {
let job = create_test_job("hero", "echo", vec![format!("stress_{}", i)]);
client.job_create(job).await
});
handles.push(handle);
}
// Wait for all jobs to be submitted
for handle in handles {
let _ = black_box(handle.await);
}
});
},
);
}
group.finish();
}
/// Stress test: Sustained load over time
fn stress_sustained_load(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(120))
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner is registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("stress_sustained_load");
group.sample_size(10);
group.measurement_time(Duration::from_secs(30));
group.bench_function("continuous_submissions", |b| {
b.to_async(&rt).iter(|| async {
// Submit jobs continuously for the measurement period
for i in 0..20 {
let job = create_test_job("hero", "echo", vec![format!("sustained_{}", i)]);
let _ = black_box(client.job_create(job).await);
}
});
});
group.finish();
}
/// Stress test: Large payload handling
fn stress_large_payloads(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(120))
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner is registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("stress_large_payloads");
group.sample_size(10);
for size_kb in [1, 10, 100].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(format!("{}KB", size_kb)),
size_kb,
|b, &size_kb| {
b.to_async(&rt).iter(|| async {
// Create a large payload
let large_data = "x".repeat(size_kb * 1024);
let job = create_test_job("hero", "echo", vec![large_data]);
black_box(client.job_create(job).await.expect("Job create failed"))
});
},
);
}
group.finish();
}
/// Stress test: Rapid API calls
fn stress_rapid_api_calls(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create supervisor client")
});
let mut group = c.benchmark_group("stress_rapid_api");
group.sample_size(10);
group.measurement_time(Duration::from_secs(15));
group.bench_function("rapid_info_calls", |b| {
b.to_async(&rt).iter(|| async {
// Make 100 rapid API calls
for _ in 0..100 {
let _ = black_box(client.get_supervisor_info().await);
}
});
});
group.bench_function("rapid_list_calls", |b| {
b.to_async(&rt).iter(|| async {
// Make 100 rapid list calls
for _ in 0..100 {
let _ = black_box(client.runner_list().await);
}
});
});
group.finish();
}
/// Stress test: Mixed workload
fn stress_mixed_workload(c: &mut Criterion) {
let rt = create_runtime();
let client = rt.block_on(async {
SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.timeout(Duration::from_secs(120))
.build()
.expect("Failed to create supervisor client")
});
// Ensure runner is registered
rt.block_on(async {
let _ = client.runner_create("hero").await;
});
let mut group = c.benchmark_group("stress_mixed_workload");
group.sample_size(10);
group.measurement_time(Duration::from_secs(25));
group.bench_function("mixed_operations", |b| {
b.to_async(&rt).iter(|| async {
let mut handles = vec![];
// Mix of different operations
for i in 0..10 {
let client = client.clone();
// Job submission
let handle1 = tokio::spawn(async move {
let job = create_test_job("hero", "echo", vec![format!("mixed_{}", i)]);
client.job_create(job).await.map(|_| ())
});
handles.push(handle1);
}
// Wait for all operations
for handle in handles {
let _ = black_box(handle.await);
}
});
});
group.finish();
}
/// Stress test: Connection pool exhaustion
fn stress_connection_pool(c: &mut Criterion) {
let rt = create_runtime();
let mut group = c.benchmark_group("stress_connection_pool");
group.sample_size(10);
group.measurement_time(Duration::from_secs(20));
for num_clients in [10, 50, 100].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(num_clients),
num_clients,
|b, &num_clients| {
b.to_async(&rt).iter(|| async {
let mut handles = vec![];
// Create many clients and make concurrent requests
for _ in 0..num_clients {
let handle = tokio::spawn(async move {
let client = SupervisorClientBuilder::new()
.url(SUPERVISOR_URL)
.secret(ADMIN_SECRET)
.build()
.expect("Failed to create client");
client.get_supervisor_info().await
});
handles.push(handle);
}
// Wait for all requests
for handle in handles {
let _ = black_box(handle.await);
}
});
},
);
}
group.finish();
}
criterion_group!(
stress_tests,
stress_high_frequency_jobs,
stress_sustained_load,
stress_large_payloads,
stress_rapid_api_calls,
stress_mixed_workload,
stress_connection_pool,
);
criterion_main!(stress_tests);